Deep learning as a dynamical system

August 13, 2018 — October 30, 2022

calculus
classification
dynamical systems
geometry
Hilbert space
how do science
Lévy processes
machine learning
neural nets
PDEs
physics
regression
sciml
SDEs
signal processing
statistics
statmech
stochastic processes

A thread of neural network learning research tries to render the learning of prediction functions tractable, or at least comprehensible, by considering the networks themselves as dynamical systems. This leads to some nice insights and lets us bring the various tools of dynamical systems theory to bear on neural networks, to good effect.

I’ve been interested in this since seeing the Haber and Ruthotto (2018) paper, but it got a kick when T. Q. Chen et al. (2018) won a best-paper award at NeurIPS 2018 for directly learning the ODEs themselves by related methods, which makes the whole enterprise look more practically useful.
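A minimal sketch of what “learning the ODE directly” means in that setting: the network specifies a vector field \(f_\theta(t, x)\), and the forward pass is an ODE solve from the input state to some terminal time. The two-layer vector field and the use of scipy here are my own illustrative choices, not the reference implementation; in practice gradients are obtained via the adjoint method or by differentiating through the solver.

```python
# Toy neural-ODE forward pass: "depth" is continuous, and the output is the
# state of a learned ODE at the terminal time. Illustration only.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
dim, hidden = 4, 16

# Parameters of a small MLP vector field f_theta(t, x); random here,
# but they would be trained in practice.
W1, b1 = rng.normal(scale=0.3, size=(hidden, dim + 1)), np.zeros(hidden)
W2, b2 = rng.normal(scale=0.3, size=(dim, hidden)), np.zeros(dim)

def f_theta(t, x):
    """Learned dynamics dx/dt = f_theta(t, x); time enters as an extra input."""
    h = np.tanh(W1 @ np.concatenate([x, [t]]) + b1)
    return W2 @ h + b2

def neural_ode_forward(x0, t1=1.0):
    """Map input x0 to the ODE state at time t1 -- the network's 'output'."""
    sol = solve_ivp(f_theta, (0.0, t1), x0, rtol=1e-6, atol=1e-8)
    return sol.y[:, -1]

x0 = rng.normal(size=dim)
print(neural_ode_forward(x0))
```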

There are interesting connections here: we can also think about the relationships between stability, ergodicity, and criticality. A different notion again is input-stability in learning.

1 Convnets/Resnets as discrete PDE approximations

Arguing that neural networks are, in the limit, approximants to quadrature solutions of certain ODEs gives a new perspective on how these things work, and also suggests that certain ODE tricks might be imported. This is mostly what Haber, Ruthotto, and collaborators do. “Stability of training” is a useful outcome here: gradient signals are guaranteed to be available by ensuring the network preserves energy as it propagates through the layers (Haber and Ruthotto 2018; Haber et al. 2017; Chang et al. 2018; Ruthotto and Haber 2020), which we can interpret as stability of the implied PDE approximator itself. Stability here is meant in the sense of energy-preserving operators, or of stability of linear systems.
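Concretely (my paraphrase of that framing, not the exact notation of any one paper): a residual block is a forward-Euler step of an ODE,

\[
x_{\ell+1} = x_\ell + h\,\sigma\!\left(K_\ell x_\ell + b_\ell\right)
\quad\longleftrightarrow\quad
\dot{x}(t) = \sigma\!\left(K(t)\,x(t) + b(t)\right),
\]

and the forward dynamics are well behaved when the eigenvalues of the Jacobian of the right-hand side have (approximately) non-positive real parts. One way this line of work enforces that by construction is an antisymmetric parametrisation,

\[
K(t) = \tfrac{1}{2}\left(W(t) - W(t)^{\top}\right),
\]

whose eigenvalues are purely imaginary, so the forward map neither amplifies nor dissipates energy as it passes through layers, and the backward (gradient) dynamics roughly inherit the same property.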

Another fun trick from this toolbox is the ability to interpolate and re-discretize resnets, re-sampling the layers and weights themselves, by working out a net which discretizes the same underlying ODE (see the sketch below). Essentially, AFAICT, this allows one to upscale and downscale nets and/or the training data through their infinite-resolution limits. That sounds cool, but I have not seen it used much. Is the complexity worth it in practice?
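Here is a minimal sketch of the idea as I understand it, not the actual multiscale algorithm of Haber et al. (2017): treat each layer’s weights as samples of a continuous parameter path \(\theta(t)\) on a coarse depth grid, interpolate that path, and re-sample it on a finer grid with a proportionally smaller step size, giving a deeper net that discretizes the same ODE.

```python
# Sketch: "upscaling" a resnet by interpolating its parameter path theta(t)
# onto a finer depth grid. Toy illustration of the continuous-depth view,
# not the actual multiscale training scheme.
import numpy as np

def upscale_layers(thetas, factor):
    """thetas: array of shape (n_layers, ...) of per-layer parameters,
    viewed as samples of theta(t) on a uniform grid over [0, 1].
    Returns parameters on a grid with factor-times more layers."""
    n = thetas.shape[0]
    t_coarse = np.linspace(0.0, 1.0, n)
    t_fine = np.linspace(0.0, 1.0, factor * n)
    flat = thetas.reshape(n, -1)
    fine = np.stack(
        [np.interp(t_fine, t_coarse, flat[:, j]) for j in range(flat.shape[1])],
        axis=1,
    )
    return fine.reshape((factor * n,) + thetas.shape[1:])

# A 4-layer resnet with step h becomes an 8-layer resnet with step h/2
# that (approximately) discretizes the same underlying ODE.
coarse_weights = np.random.default_rng(1).normal(size=(4, 8, 8))
fine_weights = upscale_layers(coarse_weights, factor=2)
h_coarse, h_fine = 1.0 / 4, 1.0 / 8
print(coarse_weights.shape, h_coarse, "->", fine_weights.shape, h_fine)
```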

Combining depth and time, we get Gu et al. (2021).
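In rough outline (my summary, with their notation simplified), the linear state space layer of Gu et al. (2021) is a learned linear ODE driven by the input signal,

\[
\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t),
\]

discretized with a step size \(\Delta\), e.g. via the bilinear transform,

\[
x_k = \bar{A} x_{k-1} + \bar{B} u_k, \qquad
\bar{A} = \left(I - \tfrac{\Delta}{2}A\right)^{-1}\!\left(I + \tfrac{\Delta}{2}A\right), \qquad
\bar{B} = \left(I - \tfrac{\Delta}{2}A\right)^{-1}\!\Delta B,
\]

so the same model can be run as a recurrence over time steps (depth-in-time) or unrolled as a convolution.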

2 How much energy do I lose in other networks?

Wiatowski, Grohs, and Bölcskei (2018) also consider energy conservation, but without a PDE structure being implied. Or maybe a PDE structure is still implied and I am too dense to see it.
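As a crude numerical illustration of the quantity at stake, and emphatically not their scattering-type analysis, one can simply watch how much of the input’s energy (squared norm) survives each layer of a random network:

```python
# Track the fraction of input "energy" (squared norm) remaining after each
# layer of a random tanh network. Illustrates the quantity studied in
# energy-propagation analyses; not the scattering-network construction itself.
import numpy as np

rng = np.random.default_rng(2)
dim, depth = 64, 20
x = rng.normal(size=dim)
e0 = np.sum(x**2)

energy_fraction = []
for _ in range(depth):
    W = rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, dim))
    x = np.tanh(W @ x)
    energy_fraction.append(np.sum(x**2) / e0)

# If energy decays too fast, later layers see (and propagate) almost nothing.
print([f"{e:.3f}" for e in energy_fraction])
```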

3 Learning forward predictions with energy conservation

Slightly different again, AFAICT, because now we are thinking about predicting dynamics, rather than the dynamics of the neural network. The problem looks like it might be closely related, though, because we are still demanding an energy conservation of sorts between input and output.
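A minimal sketch of that flavour of model, in the Hamiltonian spirit of several papers in the reference list (e.g. Course, Evans, and Nair 2020; Huh et al. 2020; Saemundsson et al. 2020): parametrize a scalar energy \(H(q, p)\), derive the dynamics from its symplectic gradient, and integrate with a leapfrog step so the learned energy is approximately conserved along predicted trajectories. Here \(H\) is a fixed toy function standing in for a trained network.

```python
# Predicting dynamics that conserve a (learned) energy: the model is a scalar
# H(q, p); trajectories follow dq/dt = dH/dp, dp/dt = -dH/dq, integrated with
# a leapfrog step. H is a toy stand-in for a neural network here.
import numpy as np

def H(q, p):
    return 0.5 * p**2 + 0.5 * q**2  # toy energy; a net would be trained for this

def grad_H(q, p, eps=1e-5):
    dHdq = (H(q + eps, p) - H(q - eps, p)) / (2 * eps)
    dHdp = (H(q, p + eps) - H(q, p - eps)) / (2 * eps)
    return dHdq, dHdp

def leapfrog(q, p, dt=0.01, steps=1000):
    traj = []
    for _ in range(steps):
        dHdq, _ = grad_H(q, p)
        p -= 0.5 * dt * dHdq          # half step in momentum
        _, dHdp = grad_H(q, p)
        q += dt * dHdp                # full step in position
        dHdq, _ = grad_H(q, p)
        p -= 0.5 * dt * dHdq          # half step in momentum
        traj.append((q, p, H(q, p)))
    return traj

traj = leapfrog(q=1.0, p=0.0)
energies = [e for _, _, e in traj]
print(min(energies), max(energies))   # stays near H(1, 0) = 0.5
```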

4 This sounds like chaos theory

Measure-preserving systems? Complex dynamics? I cannot guarantee that chaotic dynamics are not involved; we should probably check that out. See edge of chaos.
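One concrete thing to check, sketched here from general principles rather than from any particular paper: whether the layer-to-layer map expands or contracts small perturbations, i.e. a finite-depth analogue of a Lyapunov exponent, which is essentially how the edge-of-chaos analyses (e.g. Schoenholz et al. 2017) separate the ordered from the chaotic phase.

```python
# Does a small input perturbation grow or shrink as it passes through the
# layers? The average log growth rate is a crude, finite-depth analogue of a
# Lyapunov exponent; growth ~ chaotic/unstable phase, decay ~ ordered phase.
import numpy as np

rng = np.random.default_rng(3)
dim, depth, sigma_w = 128, 50, 1.5      # sigma_w tunes ordered vs chaotic phase

x = rng.normal(size=dim)
delta = rng.normal(size=dim)
delta *= 1e-6 / np.linalg.norm(delta)   # small perturbation of fixed size

log_growth = []
for _ in range(depth):
    W = rng.normal(scale=sigma_w / np.sqrt(dim), size=(dim, dim))
    x_pert = np.tanh(W @ (x + delta))
    x = np.tanh(W @ x)
    delta = x_pert - x
    log_growth.append(np.log(np.linalg.norm(delta) / 1e-6))
    delta *= 1e-6 / np.linalg.norm(delta)   # renormalize to avoid over/underflow

# Positive mean log growth suggests perturbation-amplifying (chaotic) dynamics.
print("mean log growth per layer:", np.mean(log_growth))
```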

5 Learning parameters by filtering

See Data assimilation for neural net training.

6 In reservoir computing

See reservoir computing.

7 Incoming

8 References

Andersson, Gillis, Horn, et al. 2019. “CasADi: A Software Framework for Nonlinear Optimization and Optimal Control.” Mathematical Programming Computation.
Anil, Lucas, and Grosse. 2018. “Sorting Out Lipschitz Function Approximation.”
Arjovsky, Shah, and Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48. ICML’16.
Babtie, Kirk, and Stumpf. 2014. “Topological Sensitivity Analysis for Systems Biology.” Proceedings of the National Academy of Sciences.
Brandstetter, Berg, Welling, et al. 2022. “Clifford Neural Layers for PDE Modeling.” In.
Brandstetter, Worrall, and Welling. 2022. “Message Passing Neural PDE Solvers.” In International Conference on Learning Representations.
Bronstein. 2022. “Beyond Message Passing: A Physics-Inspired Paradigm for Graph Neural Networks.” The Gradient.
Chandramoorthy, Loukas, Gatmiry, et al. 2022. “On the Generalization of Learning Algorithms That Do Not Converge.”
Chang, Meng, Haber, et al. 2018. “Reversible Architectures for Arbitrarily Deep Residual Neural Networks.” In arXiv:1709.03698 [Cs, Stat].
Chen, Chong, Dou, Chen, et al. 2022. “A Novel Neural Network Training Framework with Data Assimilation.” The Journal of Supercomputing.
Chen, Tianqi, Goodfellow, and Shlens. 2015. “Net2Net: Accelerating Learning via Knowledge Transfer.” arXiv:1511.05641 [Cs].
Chen, Tian Qi, Rubanova, Bettencourt, et al. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31.
Choromanski, Davis, Likhosherstov, et al. 2020. “An Ode to an ODE.” In Advances in Neural Information Processing Systems.
Chou, Rauhut, and Ward. 2023. “Robust Implicit Regularization via Weight Normalization.”
Course, Evans, and Nair. 2020. “Weak Form Generalized Hamiltonian Learning.” In Advances in Neural Information Processing Systems.
E. 2017. “A Proposal on Machine Learning via Dynamical Systems.” Communications in Mathematics and Statistics.
———. 2021. “The Dawning of a New Era in Applied Mathematics.” Notices of the American Mathematical Society.
E, Han, and Li. 2018. “A Mean-Field Optimal Control Formulation of Deep Learning.” arXiv:1807.01083 [Cs, Math].
E, Ma, and Wu. 2020. “Machine Learning from a Continuous Viewpoint, I.” Science China Mathematics.
Galimberti, Furieri, Xu, et al. 2021. “Non Vanishing Gradients for Arbitrarily Deep Neural Networks: A Hamiltonian System Approach.” In.
Głuch, and Urbanke. 2021. “Noether: The More Things Change, the More Stay the Same.” arXiv:2104.05508 [Cs, Stat].
Grohs, and Herrmann. 2022. “Deep Neural Network Approximation for High-Dimensional Elliptic PDEs with Boundary Conditions.” IMA Journal of Numerical Analysis.
Gu, Johnson, Goel, et al. 2021. “Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers.” In Advances in Neural Information Processing Systems.
Haber, Lensink, Treister, et al. 2019. “IMEXnet A Forward Stable Deep Neural Network.” In International Conference on Machine Learning.
Haber, Lucka, and Ruthotto. 2018. “Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.” arXiv:1805.08034 [Cs, Math].
Haber, and Ruthotto. 2018. “Stable Architectures for Deep Neural Networks.” Inverse Problems.
Haber, Ruthotto, Holtham, et al. 2017. “Learning Across Scales - A Multiscale Method for Convolution Neural Networks.” arXiv:1703.02009 [Cs].
Han, Jentzen, and E. 2018. “Solving High-Dimensional Partial Differential Equations Using Deep Learning.” Proceedings of the National Academy of Sciences.
Hardt, Recht, and Singer. 2015. “Train Faster, Generalize Better: Stability of Stochastic Gradient Descent.” arXiv:1509.01240 [Cs, Math, Stat].
Haro. 2008. “Automatic Differentiation Methods in Computational Dynamical Systems: Invariant Manifolds and Normal Forms of Vector Fields at Fixed Points.” IMA Note.
Hayou, Doucet, and Rousseau. 2019. “On the Impact of the Activation Function on Deep Neural Networks Training.” In Proceedings of the 36th International Conference on Machine Learning.
Hayou, Ton, Doucet, et al. 2020. “Pruning Untrained Neural Networks: Principles and Analysis.” arXiv:2002.08797 [Cs, Stat].
He, Spokoyny, Neubig, et al. 2019. “Lagging Inference Networks and Posterior Collapse in Variational Autoencoders.” In Proceedings of ICLR.
Huh, Yang, Hwang, et al. 2020. “Time-Reversal Symmetric ODE Network.” In Advances in Neural Information Processing Systems.
Jing, Shen, Dubcek, et al. 2017. “Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs.” In PMLR.
Kidger. 2022. “On Neural Differential Equations.”
Kolter, and Manek. 2019. “Learning Stable Deep Dynamics Models.” In Advances in Neural Information Processing Systems.
Kovachki, and Stuart. 2019. “Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.” Inverse Problems.
Lawrence, Loewen, Forbes, et al. 2020. “Almost Surely Stable Deep Dynamics.” In Advances in Neural Information Processing Systems.
Long, Lu, Ma, et al. 2018. “PDE-Net: Learning PDEs from Data.” In Proceedings of the 35th International Conference on Machine Learning.
Massaroli, Poli, Bin, et al. 2020. “Stable Neural Flows.” arXiv:2003.08063 [Cs, Math, Stat].
Meng, Wang, Chen, et al. 2016. “Generalization Error Bounds for Optimization Algorithms via Stability.” In arXiv:1609.08397 [Stat].
Mhammedi, Hellicar, Rahman, et al. 2017. “Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections.” In PMLR.
Nguyen, and Malinsky. 2020. “Exploration and Implementation of Neural Ordinary Differential Equations.”
Niu, Horesh, and Chuang. 2019. “Recurrent Neural Networks in the Eye of Differential Equations.” arXiv:1904.12933 [Quant-Ph, Stat].
Opschoor, Petersen, and Schwab. 2020. “Deep ReLU Networks and High-Order Finite Element Methods.” Analysis and Applications.
Ott, Katiyar, Hennig, et al. 2020. “ResNet After All: Neural ODEs and Their Numerical Solution.” In.
Poli, Massaroli, Yamashita, et al. 2020. “Hypersolvers: Toward Fast Continuous-Depth Models.” In Advances in Neural Information Processing Systems.
Rackauckas. 2019. “The Essential Tools of Scientific Machine Learning (Scientific ML).”
Rackauckas, Ma, Dixit, et al. 2018. “A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions.” arXiv:1812.01892 [Cs].
Ray, Pinti, and Oberai. 2023. “Deep Learning and Computational Physics (Lecture Notes).”
Roberts, Yaida, and Hanin. 2021. “The Principles of Deep Learning Theory.” arXiv:2106.10165 [Hep-Th, Stat].
Roeder, Grant, Phillips, et al. 2019. “Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems.” arXiv:1905.12090 [Cs, Stat].
Ruhe, Gupta, de Keninck, et al. 2023. “Geometric Clifford Algebra Networks.” In arXiv Preprint arXiv:2302.06594.
Ruthotto, and Haber. 2020. “Deep Neural Networks Motivated by Partial Differential Equations.” Journal of Mathematical Imaging and Vision.
Saemundsson, Terenin, Hofmann, et al. 2020. “Variational Integrator Networks for Physically Structured Embeddings.” arXiv:1910.09349 [Cs, Stat].
Schoenholz, Gilmer, Ganguli, et al. 2017. “Deep Information Propagation.” In.
Şimşekli, Sener, Deligiannidis, et al. 2020. “Hausdorff Dimension, Stochastic Differential Equations, and Generalization in Neural Networks.” CoRR.
Venturi, and Li. 2022. “The Mori-Zwanzig Formulation of Deep Learning.”
Vorontsov, Trabelsi, Kadoury, et al. 2017. “On Orthogonality and Learning Recurrent Networks with Long Term Dependencies.” In PMLR.
Wang, Hu, and Lu. 2019. “A Solvable High-Dimensional Model of GAN.” arXiv:1805.08349 [Cond-Mat, Stat].
Wiatowski, and Bölcskei. 2015. “A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction.” In Proceedings of IEEE International Symposium on Information Theory.
Wiatowski, Grohs, and Bölcskei. 2018. “Energy Propagation in Deep Convolutional Neural Networks.” IEEE Transactions on Information Theory.
Yegenoglu, Krajsek, Pier, et al. 2020. “Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-Performing Gradient Descent.” In Machine Learning, Optimization, and Data Science.
Yıldız, Heinonen, and Lähdesmäki. 2019. “ODE\(^2\)VAE: Deep Generative Second Order ODEs with Bayesian Neural Networks.” arXiv:1905.10994 [Cs, Stat].
Zammit-Mangion, and Wikle. 2020. “Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting.” Spatial Statistics.
Zhang, Gao, Unterman, et al. 2020. “Approximation Capabilities of Neural ODEs and Invertible Residual Networks.” arXiv:1907.12998 [Cs, Stat].
Zhi, Lai, Ott, et al. 2022. “Learning Efficient and Robust Ordinary Differential Equations via Invertible Neural Networks.” In International Conference on Machine Learning.