A thread of thought within neural network learning research which tries to make the learning of prediction functions tractable, or at least comprehensible, by treating the networks as dynamical systems, and borrowing stability theory from Hamiltonian dynamics, optimal control and/or differential equation solvers to make it all work.

I’ve been interested in this since seeing the Haber and Ruthotto (2018) paper, but it got a kick when T. Q. Chen et al. (2018) won a best paper award at NeurIPS for directly learning the ODEs themselves, through related methods, which makes the whole thing look more useful.

There are interesting connections here: we can also think about the relationships between stability, ergodicity, and criticality. A different topic: input-stability in learning.

## Convnets/Resnets as discrete PDE approximations

Arguing that neural networks are, in the limit, approximants to quadrature solutions of certain ODEs gives a new perspective on how these things work, and also suggests that certain ODE tricks might be imported. This is mostly what Haber, Ruthotto, et al. do. “Stability of training” is a useful outcome here: gradient signals are guaranteed to remain available if the network preserves energy as signals propagate through the layers (Haber and Ruthotto 2018; Haber et al. 2017; Chang et al. 2018; Ruthotto and Haber 2018), which we can interpret as stability of the implied PDE approximator itself. They mean stability in the sense of energy-preserving operators, or stability in linear systems.
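To make the ODE reading concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a toy residual net with a `tanh` layer and a single weight matrix shared across layers, so that each residual block `x + h * tanh(W @ x)` is one forward Euler step for `dx/dt = tanh(W x)`. The names `antisymmetric` and `resnet_forward` are hypothetical, and the antisymmetric parametrization `A - A.T - gamma * I` is used in the spirit of the stability constructions in this literature: it pins the real part of every eigenvalue of `W` at `-gamma`.

```python
import numpy as np

def antisymmetric(A, gamma=0.0):
    """Return A - A^T - gamma*I; every eigenvalue has real part -gamma."""
    return A - A.T - gamma * np.eye(A.shape[0])

def resnet_forward(x, W, depth, T=1.0):
    """Toy residual net with shared weights W (an assumption for brevity),
    read as forward Euler on dx/dt = tanh(W x) with step h = T / depth."""
    h = T / depth
    for _ in range(depth):
        x = x + h * np.tanh(W @ x)  # one residual block == one Euler step
    return x

rng = np.random.default_rng(0)
dim = 8
A = rng.normal(size=(dim, dim)) / np.sqrt(dim)
W = antisymmetric(A, gamma=0.1)
x0 = rng.normal(size=dim)

# Real parts of the spectrum are fixed by gamma, not by the random draw.
real_parts = np.linalg.eigvals(W).real

# Deeper nets with proportionally smaller steps approach the same ODE flow.
reference = resnet_forward(x0, W, depth=20000)
err_100 = np.linalg.norm(resnet_forward(x0, W, depth=100) - reference)
err_200 = np.linalg.norm(resnet_forward(x0, W, depth=200) - reference)
print(real_parts, err_100, err_200)
```

Doubling the depth while halving the step size should shrink the gap to the very deep reference net, which is the sense in which depth plays the role of time resolution here.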

Another fun trick from this toolbox is the ability to interpolate and discretize resnets, re-sampling the layers and weights themselves, by working out a net which solves the same discretized differential equation. This essentially, AFAICT, allows one to upscale and downscale nets and/or the training data through their infinite-resolution limits. That sounds cool, but I have not seen much of it in use. Is the complexity worth it in practice?
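A rough sketch of what re-sampling layers could look like, under strong simplifying assumptions: layer `k` of a depth-`n` net is treated as a sample of a continuous weight path at time `k * T / n`, and a deeper net is obtained by linear interpolation of each weight entry along that time axis. The helpers `resample_layers` and `forward` are hypothetical names for illustration, and linear interpolation is just the simplest choice, not necessarily the scheme used in the cited papers.

```python
import numpy as np

def resample_layers(weights, new_depth, T=1.0):
    """Linearly interpolate a (depth, d, d) weight stack onto new_depth layers.

    Layer k is assumed to sit at time k * T / depth. np.interp holds the last
    layer's value constant for times past the final coarse layer.
    """
    depth = weights.shape[0]
    t_old = np.arange(depth) * (T / depth)
    t_new = np.arange(new_depth) * (T / new_depth)
    flat = weights.reshape(depth, -1)
    cols = [np.interp(t_new, t_old, flat[:, j]) for j in range(flat.shape[1])]
    return np.stack(cols, axis=1).reshape((new_depth,) + weights.shape[1:])

def forward(x, weights, T=1.0):
    """Residual net as forward Euler; the step size shrinks as depth grows."""
    h = T / len(weights)
    for W in weights:
        x = x + h * np.tanh(W @ x)
    return x

rng = np.random.default_rng(0)
d, depth = 4, 10
B0, B1 = rng.normal(size=(2, d, d)) / np.sqrt(d)
# Stand-in for trained weights: a path that varies smoothly with depth.
coarse = np.stack([(1 - s) * B0 + s * B1 for s in np.arange(depth) / depth])
fine = resample_layers(coarse, 40)

x0 = rng.normal(size=d)
gap = np.linalg.norm(forward(x0, coarse) - forward(x0, fine))
print(fine.shape, gap)
```

The key detail is that the step size `h = T / depth` changes together with the number of layers, so both nets discretize the same time interval; the two outputs should then differ only by discretization error.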

## How much energy do I lose in other networks?

Wiatowski, Grohs, and Bölcskei (2018) also study energy conservation, but without the implied PDE structure.
Or maybe a PDE structure is *still* implied but I am too thick to see it.

## Learning forward predictions with energy conservation

Slightly different again, AFAICT, because now we are thinking about predicting dynamics, rather than the dynamics of the neural network. The problem looks like it might be closely related, though, because we are still demanding an energy conservation of sorts between input and output.
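As a toy illustration of why energy conservation matters for forward prediction, the sketch below rolls out a harmonic oscillator with Hamiltonian `H = (q**2 + p**2) / 2` using two integrators. The hand-written `dH_dq` and `dH_dp` are assumptions standing in for the gradients of whatever energy function a network might learn; `rollout` is a hypothetical helper, and nothing here is taken from a specific paper's code.

```python
import numpy as np

def energy(q, p):
    return 0.5 * (q**2 + p**2)

# Gradients of H; in a learned model these would come from the network.
def dH_dq(q):
    return q

def dH_dp(p):
    return p

def rollout(q, p, h, steps, symplectic):
    """Integrate dq/dt = dH/dp, dp/dt = -dH/dq and record the energy."""
    energies = [energy(q, p)]
    for _ in range(steps):
        if symplectic:
            # Leapfrog: half kick, full drift, half kick.
            p = p - 0.5 * h * dH_dq(q)
            q = q + h * dH_dp(p)
            p = p - 0.5 * h * dH_dq(q)
        else:
            # Forward Euler: for this H, energy grows by (1 + h**2) per step.
            q, p = q + h * dH_dp(p), p - h * dH_dq(q)
        energies.append(energy(q, p))
    return np.array(energies)

euler = rollout(1.0, 0.0, h=0.1, steps=1000, symplectic=False)
leapfrog = rollout(1.0, 0.0, h=0.1, steps=1000, symplectic=True)
drift = np.max(np.abs(leapfrog - leapfrog[0])) / leapfrog[0]
print(euler[-1] / euler[0], drift)
```

The forward Euler rollout steadily gains energy, while the leapfrog rollout keeps the energy within a small bounded band, which is the kind of input-to-output conservation property this section is gesturing at.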

## In reservoir computing

See reservoir computing.

## References

*Mathematical Programming Computation* 11 (1): 1–36.

*Proceedings of the 33rd International Conference on Machine Learning - Volume 48*, 1120–28. ICML’16. New York, NY, USA: JMLR.org.

*Proceedings of the National Academy of Sciences* 111 (52): 18507–12.

*arXiv:1709.03698 [Cs, Stat]*.

*Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc.

*arXiv:1511.05641 [Cs]*, November.

*Advances in Neural Information Processing Systems*. Vol. 33.

*Advances in Neural Information Processing Systems*. Vol. 33.

*Communications in Mathematics and Statistics* 5 (1): 1–11.

*arXiv:1807.01083 [Cs, Math]*, July.

*arXiv:2104.05508 [Cs, Stat]*, April.

*International Conference on Machine Learning*, 2525–34. PMLR.

*arXiv:1805.08034 [Cs, Math]*, May.

*Inverse Problems* 34 (1): 014004.

*arXiv:1703.02009 [Cs]*, March.

*Proceedings of the National Academy of Sciences* 115 (34): 8505–10.

*arXiv:1509.01240 [Cs, Math, Stat]*, September.

*IMA Note*.

*Proceedings of ICLR*.

*Advances in Neural Information Processing Systems*. Vol. 33.

*PMLR*, 1733–41.

*Advances in Neural Information Processing Systems*, 9.

*Advances in Neural Information Processing Systems*. Vol. 33.

*Proceedings of the 35th International Conference on Machine Learning*, 3208–16. PMLR.

*arXiv:2003.08063 [Cs, Math, Stat]*, March.

*arXiv:1609.08397 [Stat]*, 10:441–74.

*PMLR*, 2401–9.

*arXiv:1904.12933 [Quant-Ph, Stat]*, April.

*Advances in Neural Information Processing Systems*. Vol. 33.

*The Winnower*.

*arXiv:1812.01892 [Cs]*, December.

*arXiv:2106.10165 [Hep-Th, Stat]*, August.

*arXiv:1905.12090 [Cs, Stat]*, May.

*arXiv:1804.04272 [Cs, Math, Stat]*, April.

*arXiv:1910.09349 [Cs, Stat]*, March.

*CoRR* abs/2006.09313.

*PMLR*, 3570–78.

*arXiv:1805.08349 [Cond-Mat, Stat]*, October.

*Proceedings of IEEE International Symposium on Information Theory*.

*IEEE Transactions on Information Theory* 64 (7): 1–1.

*arXiv:1905.10994 [Cs, Stat]*, October.

*Spatial Statistics* 37 (June): 100408.

*arXiv:1907.12998 [Cs, Stat]*, February.
