A thread of thought within neural network learning research that tries to make the learning of prediction functions tractable, or at least comprehensible, by treating the networks as dynamical systems. This leads to some nice insights, and lets us bring the tools of dynamical systems theory to bear on the analysis of neural networks.

I've been interested in this since seeing the Haber and Ruthotto (2018) paper, but it got a kick when T. Q. Chen et al. (2018) won a best paper award at NeurIPS for directly learning the ODEs themselves, through related methods, which makes the whole thing look more useful.

Interesting connections here: we can also think about the relationships between stability, ergodicity, and criticality. A different topic: input-stability in learning.

## Convnets/Resnets as discrete PDE approximations

Arguing that neural networks are, in the limit, approximants to quadrature solutions of certain ODEs gives a new perspective on how these things work, and also suggests that certain ODE tricks might be imported. This is mostly what Haber, Ruthotto and coauthors do. “Stability of training” is a useful outcome here: by ensuring that the network preserves energy as signals propagate through the layers, we guarantee that gradient signals remain available (Haber and Ruthotto 2018; Haber et al. 2017; Chang et al. 2018; Ruthotto and Haber 2020). We can interpret this as stability of the implied PDE approximator itself. “Stability” here means stability in the sense of energy-preserving operators, or stability in linear systems.
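A minimal sketch of the ResNet-as-ODE reading, in plain Python. It assumes a *linear* residual block with no activation and a single hypothetical weight matrix shared by all layers, which is much simpler than the nonlinear architectures in the papers above. Each residual update $x_{k+1} = x_k + h K x_k$ is one forward-Euler step for $\dot x = K x$. If $K = W - W^\top$ is antisymmetric then $x^\top K x = 0$, so the continuous flow preserves $\|x\|$ exactly, and each Euler step inflates $\|x\|^2$ only by $h^2 \|K x\|^2$. Holding the total depth $T = nh$ fixed and adding layers should therefore push the energy ratio towards 1.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def antisymmetrize(W):
    """Return K = W - W^T, so that x^T K x = 0 for every x."""
    n = len(W)
    return [[W[i][j] - W[j][i] for j in range(n)] for i in range(n)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def linear_resnet(x, K, h, n_layers):
    """n_layers residual blocks x <- x + h*K x, i.e. forward Euler for dx/dt = K x."""
    for _ in range(n_layers):
        Kx = matvec(K, x)
        x = [xi + h * kxi for xi, kxi in zip(x, Kx)]
    return x


# Hypothetical weights and input, chosen only for illustration.
W = [[0.3, -1.2, 0.5],
     [0.8, 0.1, -0.4],
     [-0.6, 0.9, 0.2]]
K = antisymmetrize(W)
x0 = [1.0, 0.5, -0.25]

T = 1.0  # total depth, read as integration time
ratios = {}
for n_layers in (10, 100, 1000):
    xT = linear_resnet(x0, K, T / n_layers, n_layers)
    ratios[n_layers] = norm(xT) / norm(x0)
print(ratios)
```

The growth visible at 10 layers is a discretization artefact of forward Euler, not a property of the underlying ODE; it shrinks as the same depth $T$ is split into more, thinner layers.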

Another fun trick from this toolbox is the ability to interpolate and re-discretize resnets, re-sampling the layers and weights themselves, by working out a net which solves the same discretized SDE. AFAICT this essentially allows one to upscale and downscale nets and/or the training data through their infinite-resolution limits. That sounds cool, but I have not seen much of it. Is the complexity worth it in practice?
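A minimal sketch of re-discretizing a network in depth, assuming scalar residual blocks $x \leftarrow x + h \tanh(w_k x + b_k)$ whose parameters are samples $w_k = w(t_k)$, $b_k = b(t_k)$ of a hypothetical smooth schedule. Doubling the depth here means linearly interpolating parameters between adjacent layers (extrapolating for the last one) and halving the step $h$. This is one simple scheme chosen for illustration, not necessarily the one used in the papers above. Both nets are forward-Euler solvers for the same ODE, so their outputs should agree up to discretization error.

```python
import math


def resnet_1d(x, layers, h):
    """Scalar residual net: x <- x + h * tanh(w*x + b) for each (w, b) layer."""
    for w, b in layers:
        x = x + h * math.tanh(w * x + b)
    return x


def refine(layers):
    """Double the depth by linear interpolation of (w, b) between adjacent layers.

    The final inserted layer is linearly extrapolated from the last two layers,
    so at least two layers are required.
    """
    out = []
    for k, (w, b) in enumerate(layers):
        out.append((w, b))
        if k + 1 < len(layers):
            w_next, b_next = layers[k + 1]
            out.append(((w + w_next) / 2, (b + b_next) / 2))
        else:
            w_prev, b_prev = layers[k - 1]
            out.append((w + (w - w_prev) / 2, b + (b - b_prev) / 2))
    return out


def schedule(t):
    """Hypothetical smooth parameter schedule (w(t), b(t)) over depth t in [0, 1]."""
    return 0.5 + 0.5 * t, 0.2 - 0.4 * t


T, n = 1.0, 8
h = T / n
coarse_layers = [schedule(k * h) for k in range(n)]
fine_layers = refine(coarse_layers)

x0 = 0.5
coarse = resnet_1d(x0, coarse_layers, h)          # n layers, step h
refined = resnet_1d(x0, fine_layers, h / 2)       # 2n layers, step h/2
print(coarse, refined)
```

Because this schedule is linear in depth, `refine` recovers it exactly at the new layer positions; for a trained network there is no such guarantee, and the interpolation rule itself becomes a modelling choice.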

Combining depth and time, we get Gu et al. (2021).

## How much energy do I lose in other networks?

Also about energy conservation, but without an implied PDE structure: Wiatowski, Grohs, and Bölcskei (2018).
Or maybe a PDE structure is *still* implied, but I am too dense to see it.

## Learning forward predictions with energy conservation

Slightly different again, AFAICT, because now we are thinking about predicting dynamics rather than about the dynamics of the neural network. The problem looks like it might be closely related, though, because we still demand energy conservation of sorts between input and output.
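To see why energy conservation has to be designed into a forward predictor rather than hoped for, here is a minimal sketch assuming the dynamics are a *known* harmonic oscillator with energy $H(q, p) = (q^2 + p^2)/2$; in a learned predictor the vector field or $H$ would be a network, which is omitted here. Rolling out a naive explicit-Euler predictor multiplies the energy by exactly $1 + h^2$ per step. Symplectic Euler (update $p$ first, then $q$ using the new $p$) instead exactly conserves the modified quantity $q^2 + p^2 - h\,pq$, which confines $H$ to a band of relative width roughly $h$ for all time.

```python
def explicit_euler(q, p, h):
    """Naive one-step predictor for dq/dt = p, dp/dt = -q."""
    return q + h * p, p - h * q


def symplectic_euler(q, p, h):
    """Symplectic Euler: update p, then update q using the new p."""
    p_new = p - h * q
    return q + h * p_new, p_new


def energy(q, p):
    return 0.5 * (q * q + p * p)


def rollout(step, q, p, h, n_steps):
    """Iterate a one-step predictor and record the energy after every step."""
    energies = [energy(q, p)]
    for _ in range(n_steps):
        q, p = step(q, p, h)
        energies.append(energy(q, p))
    return energies


h, n_steps = 0.1, 1000
e_explicit = rollout(explicit_euler, 1.0, 0.0, h, n_steps)
e_symplectic = rollout(symplectic_euler, 1.0, 0.0, h, n_steps)
print(e_explicit[-1], min(e_symplectic), max(e_symplectic))
```

Both schemes are first-order accurate per step; the difference lies entirely in what they preserve over long rollouts, which is the property this section cares about.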

## This sounds like chaos theory

Measure-preserving systems? Complex dynamics?
I cannot guarantee that chaotic dynamics are *not* involved; we should probably check that out.
See edge of chaos.

## Learning parameters by filtering

## In reservoir computing

See reservoir computing.

## References

*Mathematical Programming Computation* 11 (1): 1–36.

*Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48*, 1120–28. ICML'16. New York, NY, USA: JMLR.org.

*Proceedings of the National Academy of Sciences* 111 (52): 18507–12.

*arXiv:1709.03698 [Cs, Stat]*.

*The Journal of Supercomputing*, June.

*Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc.

*arXiv:1511.05641 [Cs]*, November.

*Advances in Neural Information Processing Systems*. Vol. 33.

*Advances in Neural Information Processing Systems*. Vol. 33.

*Communications in Mathematics and Statistics* 5 (1): 1–11.

*Notices of the American Mathematical Society* 68 (04): 1.

*arXiv:1807.01083 [Cs, Math]*, July.

*Science China Mathematics* 63 (11): 2233–66.

*arXiv:2104.05508 [Cs, Stat]*, April.

*IMA Journal of Numerical Analysis* 42 (3): 2055–82.

*Advances in Neural Information Processing Systems*, 34:572–85. Curran Associates, Inc.

*International Conference on Machine Learning*, 2525–34. PMLR.

*arXiv:1805.08034 [Cs, Math]*, May.

*Inverse Problems* 34 (1): 014004.

*arXiv:1703.02009 [Cs]*, March.

*Proceedings of the National Academy of Sciences* 115 (34): 8505–10.

*arXiv:1509.01240 [Cs, Math, Stat]*, September.

*IMA Note*.

*Proceedings of the 36th International Conference on Machine Learning*, 2672–80. PMLR.

*arXiv:2002.08797 [Cs, Stat]*, June.

*Proceedings of ICLR*.

*Advances in Neural Information Processing Systems*. Vol. 33.

*PMLR*, 1733–41.

*Advances in Neural Information Processing Systems*, 9.

*Inverse Problems* 35 (9): 095005.

*Advances in Neural Information Processing Systems*. Vol. 33.

*Proceedings of the 35th International Conference on Machine Learning*, 3208–16. PMLR.

*arXiv:2003.08063 [Cs, Math, Stat]*, March.

*arXiv:1609.08397 [Stat]*, 10:441–74.

*PMLR*, 2401–9.

*arXiv:1904.12933 [Quant-Ph, Stat]*, April.

*Analysis and Applications* 18 (05): 715–70.

*Advances in Neural Information Processing Systems*. Vol. 33.

*arXiv:1812.01892 [Cs]*, December.

*arXiv:2106.10165 [Hep-Th, Stat]*, August.

*arXiv:1905.12090 [Cs, Stat]*, May.

*arXiv Preprint arXiv:2302.06594*.

*Journal of Mathematical Imaging and Vision* 62 (3): 352–64.

*arXiv:1910.09349 [Cs, Stat]*, March.

*CoRR* abs/2006.09313.

*PMLR*, 3570–78.

*arXiv:1805.08349 [Cond-Mat, Stat]*, October.

*Proceedings of IEEE International Symposium on Information Theory*.

*IEEE Transactions on Information Theory* 64 (7): 1–1.

*Machine Learning, Optimization, and Data Science*, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, and Renato Umeton, 12566:78–92. Cham: Springer International Publishing.

*arXiv:1905.10994 [Cs, Stat]*, October.

*Spatial Statistics* 37 (June): 100408.

*arXiv:1907.12998 [Cs, Stat]*, February.

*International Conference on Machine Learning*, 27060–74. PMLR.
