Neural nets with implicit layers

Also, declarative networks, bi-level optimization and other ingenious uses of the implicit function theorem



A unifying framework for various networks, including neural ODEs, where our layers are not simple forward operations but whose evaluation is instead defined as the solution of some optimisation or equilibrium problem.

NB: This is different from the implicit representation method. Since implicit layers and implicit representation layers also occur in the same problems (such as ML for PDEs), this avoidable terminological confusion will haunt us.

To learn: the connection to fixed-point theory (Granas and Dugundji 2003).

The implicit function theorem in learning

A beautiful explanation of what is special about differentiating systems at equilibrium is given by Blondel et al. (2021).

For further tutorial-form background, see the NeurIPS 2020 tutorial, Deep Implicit Layers - Neural ODEs, Deep Equilibrium Models, and Beyond, by Zico Kolter, David Duvenaud, and Matt Johnson, or ADCME: Automatic Differentiation for Implicit Operators.
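
The workhorse fact, stated loosely: if z*(θ) is defined implicitly by an equilibrium or optimality condition F(z; θ) = 0, and ∂F/∂z is invertible at the solution, then

\[ \frac{\partial z^{\star}}{\partial \theta} = -\left(\frac{\partial F}{\partial z}\right)^{-1} \frac{\partial F}{\partial \theta}, \]

evaluated at z = z*(θ). Gradients of a downstream loss with respect to θ therefore cost one linear solve at the equilibrium, rather than backpropagation through however many solver iterations produced that equilibrium.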

Optimization layers

Differentiable Convex Optimization Layers introduces cvxpylayers:

Optimization layers add domain-specific knowledge or learnable hard constraints to machine learning models. Many of these layers solve convex and constrained optimization problems of the form

\[ \begin{array}{rl} x^{\star}(\theta) = \operatorname{arg\,min}_{x} & f(x; \theta) \\ \text{subject to} & g(x; \theta) \leq 0 \\ & h(x; \theta) = 0 \end{array} \]

with parameters θ, objective f, and constraint functions g, h, and do end-to-end learning through them with respect to θ.

In this tutorial we introduce our new library cvxpylayers for easily creating new differentiable convex optimization layers. This lets you express your layer in the CVXPY domain-specific language as usual and then export the CVXPY object to an efficient, batched, differentiable layer with a single line of code. This project turns every convex optimization problem expressed in CVXPY into a differentiable layer.
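
To make that concrete, here is roughly what the PyTorch bindings look like, adapted from the cvxpylayers README; the toy problem, dimensions and variable names are arbitrary placeholders of mine.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

# A toy layer: nonnegative least-absolute-deviations fit, parameterised by (A, b).
n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(cp.pnorm(A @ x - b, p=1)), [x >= 0])
assert problem.is_dpp()  # cvxpylayers requires DPP-compliant problems

layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])

# Forward pass solves the problem; backward differentiates through the solution map.
A_t = torch.randn(m, n, requires_grad=True)
b_t = torch.randn(m, requires_grad=True)
(x_star,) = layer(A_t, b_t)
x_star.sum().backward()  # gradients w.r.t. A_t and b_t via implicit differentiation
```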

Unrolling algorithms

The classic one is Gregor and LeCun (2010), and a number of other works related to this idea appear intermittently (Adler and Öktem 2018; Borgerding and Schniter 2016; Gregor and LeCun 2010; Sulam et al. 2020).
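
In unrolling, rather than differentiating implicitly at a fixed point, we truncate the iterative algorithm at a fixed number of steps, treat its internal matrices as learnable parameters, and backpropagate straight through the unrolled computation. A minimal sketch in the spirit of LISTA (my own toy rendering, not code from the paper):

```python
import torch
import torch.nn as nn

class LISTA(nn.Module):
    """Sketch of Learned ISTA: unroll a few ISTA iterations for the sparse code z
    of x under a dictionary, and make the iteration matrices and thresholds learnable."""
    def __init__(self, input_dim, code_dim, n_iter=5):
        super().__init__()
        self.We = nn.Linear(input_dim, code_dim, bias=False)    # plays the role of (1/L) D^T
        self.S = nn.Linear(code_dim, code_dim, bias=False)      # plays the role of I - (1/L) D^T D
        self.theta = nn.Parameter(0.1 * torch.ones(code_dim))   # per-coordinate soft thresholds
        self.n_iter = n_iter

    def forward(self, x):
        b = self.We(x)
        z = torch.zeros_like(b)
        for _ in range(self.n_iter):
            u = b + self.S(z)
            z = torch.sign(u) * torch.relu(u.abs() - self.theta)  # soft thresholding
        return z

# Train by backpropagating an end-to-end loss through the unrolled iterations.
lista = LISTA(input_dim=64, code_dim=128)
codes = lista(torch.randn(32, 64))
```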

Deep declarative networks

A different terminology, though AFAICT closely related technology, is used by Stephen Gould in Gould, Hartley, and Campbell (2019), under the banner of Deep Declarative Networks. Fun applications he highlights: robust losses in pooling layers, projection onto shapes, convex programming and warping, matching problems, (relaxed) graph alignment, noisy point-cloud surface reconstruction… (I am sitting in his seminar as I write this.) They have implemented a ddn library (PyTorch).

To follow up from that presentation: learning basis decompositions, hyperparameter optimisation… Stephen relates these to deep declarative nets by framing both problems as “bi-level optimisation problems”. He also relates some minimax-like optimisations to “Stackelberg games”, which are optimisation problems embedded in game theory.
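
The gradient identity underneath these, in the unconstrained case treated in Gould, Hartley, and Campbell (2019): if a node outputs y(x) = arg min_u f(x, u), then the stationarity condition ∂f/∂u = 0 is the implicit equation, and the implicit function theorem gives

\[ \mathrm{D} y(x) = -\left(\frac{\partial^{2} f}{\partial u \, \partial u}\right)^{-1} \frac{\partial^{2} f}{\partial u \, \partial x}, \]

evaluated at u = y(x); the constrained case swaps in the KKT conditions. Backpropagating through a declarative node thus costs a linear solve against the Hessian of the objective at the optimum, independently of which solver found that optimum.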

Deep equilibrium networks

Related: Deep equilibrium networks (Bai, Kolter, and Koltun 2019; Bai, Koltun, and Kolter 2020). Here we assume that the network has a single layer which is iterated, and then solve for a fixed point of that iterated layer; this turns out to be memory efficient and in fact powerful (you need to scale up the width of that magic layer to make it match the effective depth of a non-iterative layer stack, but not by very much).

Example code: locuslab/deq.
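
For intuition, a minimal sketch of the mechanism, loosely following the structure of the NeurIPS implicit-layers tutorial code: the equilibrium is found by a throwaway solver outside the autograd tape, and a backward hook solves the adjoint fixed-point equation so gradients never see the solver iterations. The naive solver and the toy weight-tied layer are stand-ins of my own.

```python
import torch
import torch.nn as nn

def forward_iteration(g, z0, tol=1e-4, max_iter=50):
    """Naive fixed-point solver: iterate z <- g(z) until the update is small."""
    z = z0
    for _ in range(max_iter):
        z_next = g(z)
        if (z_next - z).norm() < tol:
            return z_next
        z = z_next
    return z

class DEQFixedPoint(nn.Module):
    """Deep equilibrium layer: find z* = f(z*, x) and differentiate implicitly."""
    def __init__(self, f, solver=forward_iteration):
        super().__init__()
        self.f, self.solver = f, solver

    def forward(self, x):
        # Solve for the equilibrium outside the autograd tape (no per-iteration memory).
        with torch.no_grad():
            z_star = self.solver(lambda z: self.f(z, x), torch.zeros_like(x))
        # One extra application of f attaches z* to the graph w.r.t. f's parameters and x.
        z_star = self.f(z_star, x)

        # Backward hook: replace the incoming gradient v = dL/dz* with the solution u of
        # u = (df/dz)^T u + v, i.e. the implicit-function-theorem correction.
        z0 = z_star.clone().detach().requires_grad_()
        f0 = self.f(z0, x)
        def backward_hook(grad):
            return self.solver(
                lambda u: torch.autograd.grad(f0, z0, u, retain_graph=True)[0] + grad,
                grad)
        z_star.register_hook(backward_hook)
        return z_star

# Toy usage: a single weight-tied layer iterated to equilibrium (f should contract in z).
lin = nn.Linear(16, 16)
deq = DEQFixedPoint(lambda z, x: torch.tanh(lin(z) + x))
x = torch.randn(8, 16)
deq(x).sum().backward()  # gradients reach lin.weight and lin.bias implicitly
```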

Deep Ritz method

Does the Deep Ritz method fit here (E, Han, and Jentzen 2017; E and Yu 2018; Müller and Zeinhofer 2020), or is it more of an NNs-for-PDEs thing?

In practice

In general we use autodiff to find the gradients of our systems. How, in practice, do we write custom gradients that exploit the efficiencies of implicit differentiation?

Overriding autodiff is surprisingly easy in JAX: see Custom derivative rules for JAX-transformable Python functions, which covers implicit functions among other things. Blondel et al. (2021) adds some extra conveniences in the form of google/jaxopt: hardware accelerated, batchable and differentiable optimizers in JAX.
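
To make that concrete, a minimal sketch of the custom_vjp pattern for a fixed point, following the structure of the fixed-point example in the JAX documentation; the toy contraction, the crude 100-iteration solver, and the names are my own placeholders.

```python
from functools import partial
import jax
import jax.numpy as jnp

# Differentiate through a fixed point z* = f(theta, z*): the forward pass runs an
# ordinary solver; the custom backward pass solves the adjoint fixed-point equation
# w = v + (df/dz)^T w instead of unrolling the solver under autodiff.

@partial(jax.custom_vjp, nondiff_argnums=(0,))
def fixed_point(f, theta, z_init):
    z = z_init
    for _ in range(100):  # crude solver; a tolerance check is omitted for brevity
        z = f(theta, z)
    return z

def fixed_point_fwd(f, theta, z_init):
    z_star = fixed_point(f, theta, z_init)
    return z_star, (theta, z_star)

def fixed_point_bwd(f, res, v):
    theta, z_star = res
    _, vjp_theta = jax.vjp(lambda t: f(t, z_star), theta)
    _, vjp_z = jax.vjp(lambda z: f(theta, z), z_star)
    # Solve w = v + (df/dz)^T w by fixed-point iteration (assumes f contracts in z).
    w = v
    for _ in range(100):
        w = v + vjp_z(w)[0]
    return vjp_theta(w)[0], jnp.zeros_like(z_star)  # no gradient w.r.t. the initial guess

fixed_point.defvjp(fixed_point_fwd, fixed_point_bwd)

# Toy usage: z* solves z = tanh(W z + 1); differentiate a loss in z* w.r.t. W.
W = 0.1 * jnp.eye(3)
f = lambda t, z: jnp.tanh(t @ z + 1.0)
loss = lambda t: jnp.sum(fixed_point(f, t, jnp.zeros(3)) ** 2)
print(jax.grad(loss)(W))
```

As I read it, google/jaxopt packages up essentially this pattern, so in many cases you do not need to write the custom rule by hand.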

Julia autodiff also allows convenient overrides, and in fact the community discourse around them is full of helpful tips.

References

Adler, Jonas, and Ozan Öktem. 2018. “Learned Primal-Dual Reconstruction.” IEEE Transactions on Medical Imaging 37 (6): 1322–32.
Agrawal, Akshay, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, and Zico Kolter. 2019. “Differentiable Convex Optimization Layers.” In Advances in Neural Information Processing Systems.
Amos, Brandon, and J. Zico Kolter. 2017. “OptNet: Differentiable Optimization as a Layer in Neural Networks,” March.
Amos, Brandon, Ivan Dario Jimenez Rodriguez, Jacob Sacks, Byron Boots, and J. Zico Kolter. 2018. “Differentiable MPC for End-to-End Planning and Control,” October.
Andersson, Joel A. E., Joris Gillis, Greg Horn, James B. Rawlings, and Moritz Diehl. 2019. “CasADi: A Software Framework for Nonlinear Optimization and Optimal Control.” Mathematical Programming Computation 11 (1): 1–36.
Arora, Sanjeev, Rong Ge, Tengyu Ma, and Ankur Moitra. 2015. “Simple, Efficient, and Neural Algorithms for Sparse Coding.” In Proceedings of the 28th Conference on Learning Theory, 40:113–49. Paris, France: PMLR.
Bai, Shaojie, J. Zico Kolter, and Vladlen Koltun. 2019. “Deep Equilibrium Models.” In Advances in Neural Information Processing Systems, 32:12.
Bai, Shaojie, Vladlen Koltun, and J. Zico Kolter. 2020. “Multiscale Deep Equilibrium Models.” In Advances in Neural Information Processing Systems. Vol. 33.
———. 2021. “Stabilizing Equilibrium Models by Jacobian Regularization.” arXiv:2106.14342 [Cs, Stat], June.
Banert, Sebastian, Jevgenija Rudzusika, Ozan Öktem, and Jonas Adler. 2021. “Accelerated Forward-Backward Optimization Using Deep Learning.” arXiv:2105.05210 [Math], May.
Barratt, Shane. 2018. “On the Differentiability of the Solution to Convex Optimization Problems,” April.
Blondel, Mathieu, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, and Jean-Philippe Vert. 2021. “Efficient and Modular Implicit Differentiation.” arXiv:2105.15183 [Cs, Math, Stat], October.
Border, K. C. 2019. “Notes on the Implicit Function Theorem.”
Borgerding, Mark, and Philip Schniter. 2016. “Onsager-Corrected Deep Networks for Sparse Linear Inverse Problems.” arXiv:1612.01183 [Cs, Math], December.
Djolonga, Josip, and Andreas Krause. 2017. “Differentiable Learning of Submodular Models.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 1014–24. NIPS'17. Red Hook, NY, USA: Curran Associates Inc.
Domke, Justin. 2012. “Generic Methods for Optimization-Based Modeling.” In International Conference on Artificial Intelligence and Statistics, 318–26.
Donti, Priya L., Brandon Amos, and J. Zico Kolter. 2017. “Task-Based End-to-End Model Learning in Stochastic Optimization,” March.
E, Weinan, Jiequn Han, and Arnulf Jentzen. 2017. “Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations.” Communications in Mathematics and Statistics 5 (4): 349–80.
E, Weinan, and Bing Yu. 2018. “The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems.” Communications in Mathematics and Statistics 6 (1): 1–12.
Gould, Stephen, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, and Edison Guo. 2016. “On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-Level Optimization,” July.
Gould, Stephen, Richard Hartley, and Dylan Campbell. 2019. “Deep Declarative Networks: A New Hope,” September.
Granas, Andrzej, and James Dugundji. 2003. Fixed Point Theory. Springer Monographs in Mathematics. New York, NY: Springer New York.
Gregor, Karol, and Yann LeCun. 2010. “Learning Fast Approximations of Sparse Coding.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 399–406.
———. 2011. “Efficient Learning of Sparse Invariant Representations.” arXiv:1105.5307 [Cs], May.
Haber, Eldad, and Lars Ruthotto. 2018. “Stable Architectures for Deep Neural Networks.” Inverse Problems 34 (1): 014004.
Huang, Zhichun, Shaojie Bai, and J. Zico Kolter. 2021. “(Implicit)\(^2\): Implicit Layers for Implicit Representations.” In.
Krantz, Steven G., and Harold R. Parks. 2002. The Implicit Function Theorem. Springer.
Landry, Benoit, Joseph Lorenzetti, Zachary Manchester, and Marco Pavone. 2019. “Bilevel Optimization for Planning Through Contact: A Semidirect Method,” June.
Lee, Kwonjoon, Subhransu Maji, Avinash Ravichandran, and Stefano Soatto. 2019. “Meta-Learning with Differentiable Convex Optimization,” April.
Mena, Gonzalo, David Belanger, Scott Linderman, and Jasper Snoek. 2018. “Learning Latent Permutations with Gumbel-Sinkhorn Networks,” February.
Müller, Johannes, and Marius Zeinhofer. 2020. “Deep Ritz Revisited.” arXiv.
Poli, Michael, Stefano Massaroli, Atsushi Yamashita, Hajime Asama, and Jinkyoo Park. 2020. “Hypersolvers: Toward Fast Continuous-Depth Models.” In Advances in Neural Information Processing Systems. Vol. 33.
Rajeswaran, Aravind, Chelsea Finn, Sham Kakade, and Sergey Levine. 2019. “Meta-Learning with Implicit Gradients,” September.
Sulam, Jeremias, Aviad Aberdam, Amir Beck, and Michael Elad. 2020. “On Multi-Layer Basis Pursuit, Efficient Algorithms and Convolutional Neural Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (8): 1968–80.
Wang, Po-Wei, Priya L. Donti, Bryan Wilder, and Zico Kolter. 2019. “SATNet: Bridging Deep Learning and Logical Reasoning Using a Differentiable Satisfiability Solver,” May.
