# Deep learning as a dynamical system

August 13, 2018 — October 30, 2022

A thread of thought within neural network research tries to render the learning of prediction functions tractable, or at least comprehensible, by treating the networks themselves as dynamical systems. This yields some nice insights, and lets us bring the toolkit of dynamical systems theory to bear on neural networks, to good effect.

I’ve been interested in this since seeing the Haber and Ruthotto (2018) paper, but it got a kick when T. Q. Chen et al. (2018) won the best-paper prize at NeurIPS for directly learning the ODEs themselves, through related methods, which makes the whole enterprise look more useful.

Interesting connections here — we can also think about the relationships between stability and ergodicity, and criticality. Different: input-stability in learning.

## 1 Convnets/Resnets as discrete PDE approximations

Arguing that neural networks are, in the limit, approximants to quadrature solutions of certain ODEs can give a new perspective on how these things work, and also suggests certain ODE tricks might be imported. This is mostly what Haber, Ruthotto et al. do. “Stability of training” is a useful outcome here, guaranteeing that gradient signals are available by ensuring the network preserves energy as it propagates through layers (Haber and Ruthotto 2018; Haber et al. 2017; Chang et al. 2018; Ruthotto and Haber 2020), which we can interpret as stability of the implied PDE approximator itself. They mean stability in the sense of energy-preserving operators, or of stability in linear systems.
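The basic correspondence can be sketched in a few lines. This is a toy illustration under my own assumptions (names and parameterization are mine, not from any of the cited papers): a residual block $x_{k+1} = x_k + h\,f(x_k; W_k)$ is exactly one forward-Euler step of the ODE $\dot{x} = f(x; W(t))$, with the layer index playing the role of time.

```python
import numpy as np

rng = np.random.default_rng(0)

def block(x, W, b, h):
    """One residual block = one explicit Euler step of dx/dt = tanh(Wx + b)."""
    return x + h * np.tanh(W @ x + b)

def resnet(x, weights, biases, h):
    """Depth-many Euler steps: the network integrates the ODE from t=0 to t=T."""
    for W, b in zip(weights, biases):
        x = block(x, W, b, h)
    return x

d, depth, T = 4, 8, 1.0
h = T / depth  # step size: total integration "time" T spread over the layers
Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]
bs = [np.zeros(d) for _ in range(depth)]
x0 = rng.standard_normal(d)
out = resnet(x0, Ws, bs, h)
```

On this view, the stability conditions mentioned above become conditions on the vector field `f` (e.g. on the spectrum of its Jacobian), not on the layers per se.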

Another fun trick from this toolbox is the ability to interpolate and re-discretize resnets, re-sampling the layers and weights themselves, by working out a net which solves the same discretized SDE. This essentially, AFAICT, allows one to upscale and downscale nets and/or the training data through their infinite-resolution limits. That sounds cool, but I have not seen much of it in practice. Is the added complexity worth it?
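A minimal sketch of the re-discretization idea, in the deterministic (ODE) case and under my own simplifying assumptions: treat the weights as samples of a function $W(t)$, resample them to twice the depth (here by naive nearest-neighbour interpolation, reusing each weight for two half-steps), and the deeper net discretizes the same flow more finely, so the two outputs should roughly agree.

```python
import numpy as np

rng = np.random.default_rng(1)
d, depth, T = 4, 8, 1.0

def run(x, Ws, h):
    """Integrate dx/dt = tanh(W(t) x) by explicit Euler with step h."""
    for W in Ws:
        x = x + h * np.tanh(W @ x)
    return x

Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]
x0 = rng.standard_normal(d)

coarse = run(x0, Ws, T / depth)

# "Upscale" the net: nearest-neighbour resampling of W(t) to double depth,
# each original layer now covering two half-sized Euler steps.
Ws_fine = [W for W in Ws for _ in range(2)]
fine = run(x0, Ws_fine, T / (2 * depth))
# Both nets approximate the same flow, so coarse and fine should be close,
# up to the O(h) Euler discretization error.
```

Smarter interpolation of $W(t)$ (linear, spline) and higher-order integrators are where the schemes in the literature actually earn their keep; this just shows the mechanism.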

Combining depth and time, we get Gu et al. (2021).

## 2 How much energy do I lose in other networks?

Also energy-conservation, but without the PDE structure implied, Wiatowski, Grohs, and Bölcskei (2018). Or maybe a PDE structure is *still* implied but I am too dense to see it.

## 3 Learning forward predictions with energy conservation

Slightly different again, AFAICT, because now we are thinking about predicting dynamics, rather than the dynamics of the neural network. The problem looks like it might be closely related, though, because we are still demanding an energy conservation of sorts between input and output.
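To make the flavour of that constraint concrete, here is a hand-coded stand-in for a learned predictor (my own illustration, not from any cited paper): if the forward map is constrained to be symplectic, as in symplectic Euler on a harmonic oscillator, the predicted energy stays near the true energy over long rollouts, whereas an unconstrained explicit Euler map lets it drift.

```python
# Symplectic Euler for H(q, p) = p^2/2 + k q^2/2: update p from the force,
# then q from the *updated* p. This asymmetry is what makes the map symplectic.

def symplectic_euler(q, p, h, k=1.0):
    p = p - h * k * q   # kick:  dp/dt = -dH/dq
    q = q + h * p       # drift: dq/dt = +dH/dp (using the new p)
    return q, p

def energy(q, p, k=1.0):
    return 0.5 * p**2 + 0.5 * k * q**2

q, p, h = 1.0, 0.0, 0.01
E0 = energy(q, p)
for _ in range(10_000):  # roll out to t = 100
    q, p = symplectic_euler(q, p, h)
# energy error stays O(h) forever, rather than accumulating each step
```

A learned forward model with the analogous structural constraint inherits the same long-horizon stability, which is the selling point of this line of work.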

## 4 This sounds like chaos theory

Measure-preserving systems? Complex dynamics? I cannot guarantee that chaotic dynamics are *not* involved; we should probably check that out. See edge of chaos.

## 5 Learning parameters by filtering

## 6 In reservoir computing

See reservoir computing.

## 7 Incoming

## 8 References
