Deep learning as a dynamical system

A thread of thought within neural network research tries to render the learning of prediction functions tractable, or at least comprehensible, by considering the networks themselves as dynamical systems. This viewpoint yields some nice insights, and it lets us bring the toolkit of dynamical systems theory to bear on the analysis of neural networks, to useful effect.

I’ve been interested in this since seeing the Haber and Ruthotto (2018) paper, but my interest got a kick when T. Q. Chen et al. (2018) won a best-paper award at NeurIPS for directly learning the ODEs themselves by related methods, which makes the whole enterprise look more practical.

There are interesting connections here: we can also think about the relationships between stability, ergodicity, and criticality. Input-stability in learning is a different, though related, notion.

Convnets/Resnets as discrete PDE approximations

Arguing that neural networks are, in the limit, approximants to the numerical solutions of certain ODEs gives us a new perspective on how these things work, and also suggests that certain ODE tricks might be imported. This is mostly what Haber, Ruthotto and collaborators do. “Stability of training” is a useful outcome here: we guarantee that gradient signals are available by ensuring the network preserves energy as it propagates through the layers (Haber and Ruthotto 2018; Haber et al. 2017; Chang et al. 2018; Ruthotto and Haber 2020), which we can interpret as stability of the implied PDE approximator itself. They mean stability in the sense of energy-preserving operators, or of stable linear systems.
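A minimal numerical sketch of this view (my toy, in the spirit of the stable architectures of Haber and Ruthotto 2018, not their code): a residual block x_{l+1} = x_l + h f(x_l) is one forward-Euler step of the ODE dx/dt = f(x), so the stability of the deep net mirrors the stability of that ODE. Choosing an antisymmetric weight matrix makes the linearized dynamics purely rotational, so signal energy is approximately preserved through depth.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)
A = W - W.T  # antisymmetric: eigenvalues are purely imaginary

def f(x):
    return np.tanh(A @ x)

def resnet_forward(x, depth, h=0.1):
    # Each "layer" is one explicit Euler step of dx/dt = f(x).
    for _ in range(depth):
        x = x + h * f(x)
    return x

x0 = rng.normal(size=d)
x_out = resnet_forward(x0, depth=100)
# The activation norm stays the same order of magnitude with depth,
# rather than exploding or vanishing.
print(np.linalg.norm(x0), np.linalg.norm(x_out))
```

With a generic (non-antisymmetric) weight matrix, the same loop can blow up or die out exponentially in depth, which is exactly the vanishing/exploding-gradient pathology in ODE clothing.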

Another fun trick from this toolbox is the ability to interpolate and re-discretize resnets, re-sampling the layers and weights themselves by working out a net that solves the same underlying ODE at a different discretization. As far as I can tell, this allows one to upscale and downscale nets, and/or the training data, through their infinite-resolution limits. That sounds cool, but I have not seen it used much. Is the complexity worth it in practice?
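A hypothetical sketch of the re-sampling idea (mine, not the cited papers' algorithm): treat the layer weights W_l as samples of a weight function W(t) on [0, 1], then re-discretize the ODE dx/dt = tanh(W(t) x) with more Euler steps. The smoothly varying weights below stand in for a trained net whose weights drift gradually with depth.

```python
import numpy as np

rng = np.random.default_rng(1)
d, depth_coarse, depth_fine = 4, 20, 80

# Smooth synthetic "trained" weights: W(t) = W0 + sin(pi t) * W1.
W0 = 0.3 * rng.normal(size=(d, d)) / np.sqrt(d)
W1 = 0.3 * rng.normal(size=(d, d)) / np.sqrt(d)
t_coarse = np.linspace(0.0, 1.0, depth_coarse)
W_coarse = W0[None] + np.sin(np.pi * t_coarse)[:, None, None] * W1[None]

def resample_weights(W, new_depth):
    # Linearly interpolate each weight entry along the depth ("time") axis.
    t_old = np.linspace(0.0, 1.0, W.shape[0])
    t_new = np.linspace(0.0, 1.0, new_depth)
    W_new = np.empty((new_depth,) + W.shape[1:])
    for i in range(W.shape[1]):
        for j in range(W.shape[2]):
            W_new[:, i, j] = np.interp(t_new, t_old, W[:, i, j])
    return W_new

def forward(x, W):
    h = 1.0 / W.shape[0]  # step size shrinks as the net deepens
    for Wl in W:
        x = x + h * np.tanh(Wl @ x)
    return x

x0 = rng.normal(size=d)
y_coarse = forward(x0, W_coarse)
y_fine = forward(x0, resample_weights(W_coarse, depth_fine))
# Both nets discretize the same ODE, so their outputs should be close.
print(np.linalg.norm(y_coarse - y_fine))
```

The deeper net is not retrained; it inherits its weights from the shallow one via interpolation, which is the sense in which both nets approximate the same infinite-resolution limit.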

Combining depth and time, we get Gu et al. (2021).
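A minimal sketch in the spirit of a linear state space layer (my gloss, not the implementation of Gu et al. 2021): discretize the continuous system x'(t) = A x(t) + B u(t), y(t) = C x(t) with step dt and run the resulting recurrence over a sequence. Time comes from the recurrence itself; depth would come from stacking such layers.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 4, 50                                    # state size, sequence length
A = -np.eye(n) + 0.1 * rng.normal(size=(n, n))  # roughly stable continuous A
B = rng.normal(size=(n, 1))
C = rng.normal(size=(1, n))
dt = 0.1

# Bilinear (Tustin) discretization of the continuous-time system.
I = np.eye(n)
Ad = np.linalg.solve(I - dt / 2 * A, I + dt / 2 * A)
Bd = np.linalg.solve(I - dt / 2 * A, dt * B)

u = rng.normal(size=T)
x = np.zeros((n, 1))
ys = []
for t in range(T):
    x = Ad @ x + Bd * u[t]     # one step of the discretized recurrence
    ys.append((C @ x).item())  # scalar read-out
```

Because the recurrence is linear, the same layer can also be unrolled as a long convolution, which is the trick that makes these models fast in practice.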

Image: Donnie Darko

How much energy do I lose in other networks?

Wiatowski, Grohs, and Bölcskei (2018) also consider energy conservation, but without an implied PDE structure. Or maybe a PDE structure is still implied and I am too dense to see it.
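A rough numerical illustration (mine, not Wiatowski et al.'s scattering construction): track the fraction of the input's "energy" ||x||^2 that survives each layer of a random ReLU network. With Gaussian weights of variance 2/fan_in (He scaling), ReLU discards about half of the pre-activation energy on average and the scaling restores it, so the energy ratio stays the same order of magnitude instead of decaying geometrically with depth.

```python
import numpy as np

rng = np.random.default_rng(2)
d, depth = 256, 20
x = rng.normal(size=d)
e0 = np.sum(x**2)

ratios = []
for _ in range(depth):
    W = rng.normal(size=(d, d)) * np.sqrt(2.0 / d)  # He-scaled weights
    x = np.maximum(W @ x, 0.0)                      # ReLU layer
    ratios.append(np.sum(x**2) / e0)                # energy surviving so far

print(ratios[-1])
```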

Learning forward predictions with energy conservation

Slightly different again, AFAICT, because now we are thinking about predicting dynamics, rather than about the dynamics of the neural network itself. The problem looks closely related, though, because we are still demanding an energy conservation of sorts between input and output.
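A toy illustration (not from the cited papers) of why baking energy conservation into a forward-dynamics predictor matters. For a harmonic oscillator with H(q, p) = (q² + p²)/2, a plain forward-Euler predictor inflates the energy at every step, while a symplectic-Euler predictor keeps it bounded; energy-conserving architectures build the second behaviour into the model class.

```python
def euler_step(q, p, h):
    # Naive forward Euler: both updates use the old state.
    return q + h * p, p - h * q

def symplectic_step(q, p, h):
    p = p - h * q  # update the momentum with the current position...
    q = q + h * p  # ...then the position with the *new* momentum
    return q, p

def energy(q, p):
    return 0.5 * (q * q + p * p)

h, n = 0.1, 1000
qe, pe = 1.0, 0.0  # forward-Euler trajectory
qs, ps = 1.0, 0.0  # symplectic trajectory
for _ in range(n):
    qe, pe = euler_step(qe, pe, h)
    qs, ps = symplectic_step(qs, ps, h)

print(energy(qe, pe))  # grows far beyond the initial energy of 0.5
print(energy(qs, ps))  # stays close to 0.5
```

A learned predictor with an unconstrained architecture behaves like the first integrator on long rollouts; structure-preserving ones behave like the second.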

This sounds like chaos theory

Measure-preserving systems? Complex dynamics? I cannot guarantee that chaotic dynamics are not involved; we should probably check that out. See edge of chaos.

Learning parameters by filtering

See Data assimilation for neural net training.

In reservoir computing

See reservoir computing.


Andersson, Joel A. E., Joris Gillis, Greg Horn, James B. Rawlings, and Moritz Diehl. 2019. “CasADi: A Software Framework for Nonlinear Optimization and Optimal Control.” Mathematical Programming Computation 11 (1): 1–36.
Anil, Cem, James Lucas, and Roger Grosse. 2018. “Sorting Out Lipschitz Function Approximation,” November.
Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, 1120–28. ICML’16. New York, NY, USA.
Babtie, Ann C., Paul Kirk, and Michael P. H. Stumpf. 2014. “Topological Sensitivity Analysis for Systems Biology.” Proceedings of the National Academy of Sciences 111 (52): 18507–12.
Brandstetter, Johannes, Rianne van den Berg, Max Welling, and Jayesh K. Gupta. 2022. “Clifford Neural Layers for PDE Modeling.” In.
Chandramoorthy, Nisha, Andreas Loukas, Khashayar Gatmiry, and Stefanie Jegelka. 2022. “On the Generalization of Learning Algorithms That Do Not Converge.” arXiv.
Chang, Bo, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, and Elliot Holtham. 2018. “Reversible Architectures for Arbitrarily Deep Residual Neural Networks.” In arXiv:1709.03698 [Cs, Stat].
Chen, Chong, Yixuan Dou, Jie Chen, and Yaru Xue. 2022. “A Novel Neural Network Training Framework with Data Assimilation.” The Journal of Supercomputing, June.
Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc.
Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. 2015. “Net2Net: Accelerating Learning via Knowledge Transfer.” arXiv:1511.05641 [Cs], November.
Choromanski, Krzysztof, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, and Vikas Sindhwani. 2020. “An Ode to an ODE.” In Advances in Neural Information Processing Systems. Vol. 33.
Chou, Hung-Hsu, Holger Rauhut, and Rachel Ward. 2023. “Robust Implicit Regularization via Weight Normalization.” arXiv.
Course, Kevin, Trefor Evans, and Prasanth Nair. 2020. “Weak Form Generalized Hamiltonian Learning.” In Advances in Neural Information Processing Systems. Vol. 33.
E, Weinan. 2017. “A Proposal on Machine Learning via Dynamical Systems.” Communications in Mathematics and Statistics 5 (1): 1–11.
———. 2021. “The Dawning of a New Era in Applied Mathematics.” Notices of the American Mathematical Society 68 (04): 1.
E, Weinan, Jiequn Han, and Qianxiao Li. 2018. “A Mean-Field Optimal Control Formulation of Deep Learning.” arXiv:1807.01083 [Cs, Math], July.
E, Weinan, Chao Ma, and Lei Wu. 2020. “Machine Learning from a Continuous Viewpoint, I.” Science China Mathematics 63 (11): 2233–66.
Galimberti, Clara, Luca Furieri, Liang Xu, and Giancarlo Ferrari-Trecate. 2021. “Non Vanishing Gradients for Arbitrarily Deep Neural Networks: A Hamiltonian System Approach.” In.
Głuch, Grzegorz, and Rüdiger Urbanke. 2021. “Noether: The More Things Change, the More Stay the Same.” arXiv:2104.05508 [Cs, Stat], April.
Grohs, Philipp, and Lukas Herrmann. 2022. “Deep Neural Network Approximation for High-Dimensional Elliptic PDEs with Boundary Conditions.” IMA Journal of Numerical Analysis 42 (3): 2055–82.
Gu, Albert, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré. 2021. “Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers.” In Advances in Neural Information Processing Systems, 34:572–85. Curran Associates, Inc.
Haber, Eldad, Keegan Lensink, Eran Treister, and Lars Ruthotto. 2019. “IMEXnet A Forward Stable Deep Neural Network.” In International Conference on Machine Learning, 2525–34. PMLR.
Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018. “Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.” arXiv:1805.08034 [Cs, Math], May.
Haber, Eldad, and Lars Ruthotto. 2018. “Stable Architectures for Deep Neural Networks.” Inverse Problems 34 (1): 014004.
Haber, Eldad, Lars Ruthotto, Elliot Holtham, and Seong-Hwan Jun. 2017. “Learning Across Scales - A Multiscale Method for Convolution Neural Networks.” arXiv:1703.02009 [Cs], March.
Han, Jiequn, Arnulf Jentzen, and Weinan E. 2018. “Solving High-Dimensional Partial Differential Equations Using Deep Learning.” Proceedings of the National Academy of Sciences 115 (34): 8505–10.
Hardt, Moritz, Benjamin Recht, and Yoram Singer. 2015. “Train Faster, Generalize Better: Stability of Stochastic Gradient Descent.” arXiv:1509.01240 [Cs, Math, Stat], September.
Haro, A. 2008. “Automatic Differentiation Methods in Computational Dynamical Systems: Invariant Manifolds and Normal Forms of Vector Fields at Fixed Points.” IMA Note.
Hayou, Soufiane, Arnaud Doucet, and Judith Rousseau. 2019. “On the Impact of the Activation Function on Deep Neural Networks Training.” In Proceedings of the 36th International Conference on Machine Learning, 2672–80. PMLR.
Hayou, Soufiane, Jean-Francois Ton, Arnaud Doucet, and Yee Whye Teh. 2020. “Pruning Untrained Neural Networks: Principles and Analysis.” arXiv:2002.08797 [Cs, Stat], June.
He, Junxian, Daniel Spokoyny, Graham Neubig, and Taylor Berg-Kirkpatrick. 2019. “Lagging Inference Networks and Posterior Collapse in Variational Autoencoders.” In Proceedings of ICLR.
Huh, In, Eunho Yang, Sung Ju Hwang, and Jinwoo Shin. 2020. “Time-Reversal Symmetric ODE Network.” In Advances in Neural Information Processing Systems. Vol. 33.
Jing, Li, Yichen Shen, Tena Dubcek, John Peurifoy, Scott Skirlo, Yann LeCun, Max Tegmark, and Marin Soljačić. 2017. “Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs.” In PMLR, 1733–41.
Kidger, Patrick. 2022. “On Neural Differential Equations.” Oxford.
Kolter, J Zico, and Gaurav Manek. 2019. “Learning Stable Deep Dynamics Models.” In Advances in Neural Information Processing Systems, 9.
Kovachki, Nikola B., and Andrew M. Stuart. 2019. “Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.” Inverse Problems 35 (9): 095005.
Lawrence, Nathan, Philip Loewen, Michael Forbes, Johan Backstrom, and Bhushan Gopaluni. 2020. “Almost Surely Stable Deep Dynamics.” In Advances in Neural Information Processing Systems. Vol. 33.
Long, Zichao, Yiping Lu, Xianzhong Ma, and Bin Dong. 2018. “PDE-Net: Learning PDEs from Data.” In Proceedings of the 35th International Conference on Machine Learning, 3208–16. PMLR.
Massaroli, Stefano, Michael Poli, Michelangelo Bin, Jinkyoo Park, Atsushi Yamashita, and Hajime Asama. 2020. “Stable Neural Flows.” arXiv:2003.08063 [Cs, Math, Stat], March.
Meng, Qi, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, and Tie-Yan Liu. 2016. “Generalization Error Bounds for Optimization Algorithms via Stability.” In arXiv:1609.08397 [Stat], 10:441–74.
Mhammedi, Zakaria, Andrew Hellicar, Ashfaqur Rahman, and James Bailey. 2017. “Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections.” In PMLR, 2401–9.
Nguyen, Long, and Andy Malinsky. 2020. “Exploration and Implementation of Neural Ordinary Differential Equations,” 34.
Niu, Murphy Yuezhen, Lior Horesh, and Isaac Chuang. 2019. “Recurrent Neural Networks in the Eye of Differential Equations.” arXiv:1904.12933 [Quant-Ph, Stat], April.
Opschoor, Joost A. A., Philipp C. Petersen, and Christoph Schwab. 2020. “Deep ReLU Networks and High-Order Finite Element Methods.” Analysis and Applications 18 (05): 715–70.
Poli, Michael, Stefano Massaroli, Atsushi Yamashita, Hajime Asama, and Jinkyoo Park. 2020. “Hypersolvers: Toward Fast Continuous-Depth Models.” In Advances in Neural Information Processing Systems. Vol. 33.
Rackauckas, Christopher. 2019. “The Essential Tools of Scientific Machine Learning (Scientific ML).”
Rackauckas, Christopher, Yingbo Ma, Vaibhav Dixit, Xingjian Guo, Mike Innes, Jarrett Revels, Joakim Nyberg, and Vijay Ivaturi. 2018. “A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions.” arXiv:1812.01892 [Cs], December.
Ray, Deep, Orazio Pinti, and Assad A. Oberai. 2023. “Deep Learning and Computational Physics (Lecture Notes).”
Roberts, Daniel A., Sho Yaida, and Boris Hanin. 2021. “The Principles of Deep Learning Theory.” arXiv:2106.10165 [Hep-Th, Stat], August.
Roeder, Geoffrey, Paul K. Grant, Andrew Phillips, Neil Dalchau, and Edward Meeds. 2019. “Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems.” arXiv:1905.12090 [Cs, Stat], May.
Ruhe, David, Jayesh K Gupta, Steven de Keninck, Max Welling, and Johannes Brandstetter. 2023. “Geometric Clifford Algebra Networks.” In arXiv Preprint arXiv:2302.06594.
Ruthotto, Lars, and Eldad Haber. 2020. “Deep Neural Networks Motivated by Partial Differential Equations.” Journal of Mathematical Imaging and Vision 62 (3): 352–64.
Saemundsson, Steindor, Alexander Terenin, Katja Hofmann, and Marc Peter Deisenroth. 2020. “Variational Integrator Networks for Physically Structured Embeddings.” arXiv:1910.09349 [Cs, Stat], March.
Schoenholz, Samuel S., Justin Gilmer, Surya Ganguli, and Jascha Sohl-Dickstein. 2017. “Deep Information Propagation.” In.
Şimşekli, Umut, Ozan Sener, George Deligiannidis, and Murat A. Erdogdu. 2020. “Hausdorff Dimension, Stochastic Differential Equations, and Generalization in Neural Networks.” CoRR abs/2006.09313.
Venturi, Daniele, and Xiantao Li. 2022. “The Mori-Zwanzig Formulation of Deep Learning.” arXiv.
Vorontsov, Eugene, Chiheb Trabelsi, Samuel Kadoury, and Chris Pal. 2017. “On Orthogonality and Learning Recurrent Networks with Long Term Dependencies.” In PMLR, 3570–78.
Wang, Chuang, Hong Hu, and Yue M. Lu. 2019. “A Solvable High-Dimensional Model of GAN.” arXiv:1805.08349 [Cond-Mat, Stat], October.
Wiatowski, Thomas, and Helmut Bölcskei. 2015. “A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction.” In Proceedings of IEEE International Symposium on Information Theory.
Wiatowski, Thomas, Philipp Grohs, and Helmut Bölcskei. 2018. “Energy Propagation in Deep Convolutional Neural Networks.” IEEE Transactions on Information Theory 64 (7): 1–1.
Yegenoglu, Alper, Kai Krajsek, Sandra Diaz Pier, and Michael Herty. 2020. “Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-Performing Gradient Descent.” In Machine Learning, Optimization, and Data Science, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, and Renato Umeton, 12566:78–92. Cham: Springer International Publishing.
Yıldız, Çağatay, Markus Heinonen, and Harri Lähdesmäki. 2019. “ODE²VAE: Deep Generative Second Order ODEs with Bayesian Neural Networks.” arXiv:1905.10994 [Cs, Stat], October.
Zammit-Mangion, Andrew, and Christopher K. Wikle. 2020. “Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting.” Spatial Statistics 37 (June): 100408.
Zhang, Han, Xi Gao, Jacob Unterman, and Tom Arodz. 2020. “Approximation Capabilities of Neural ODEs and Invertible Residual Networks.” arXiv:1907.12998 [Cs, Stat], February.
Zhi, Weiming, Tin Lai, Lionel Ott, Edwin V. Bonilla, and Fabio Ramos. 2022. “Learning Efficient and Robust Ordinary Differential Equations via Invertible Neural Networks.” In International Conference on Machine Learning, 27060–74. PMLR.
