# Gradient flows

infinitesimal optimization

January 30, 2020 — September 28, 2023

Stochastic models of optimisation, especially stochastic gradient descent.

## 1 Ordinary

Gradient flows can be thought of as the continuous-time limit of gradient descent: there is a (deterministic) ODE corresponding to an infinitesimal training rate.
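A minimal sketch of that limit, using a hypothetical quadratic loss $L(\theta)=\tfrac{1}{2}\theta^2$ so the flow $\dot\theta = -\nabla L(\theta)$ has the closed-form solution $\theta(t)=\theta_0 e^{-t}$; gradient descent is just explicit Euler on this ODE, so with a small learning rate the iterates track the flow:

```python
import numpy as np

# Hypothetical quadratic loss L(theta) = 0.5 * theta**2, so grad L(theta) = theta
# and the gradient flow d theta / dt = -theta solves to theta(t) = theta0 * exp(-t).
def grad(theta):
    return theta

theta0, lr, n_steps = 1.0, 1e-3, 1000
theta = theta0
for _ in range(n_steps):
    theta -= lr * grad(theta)   # one gradient-descent step = one explicit Euler step

t = lr * n_steps                # elapsed "flow time"
exact = theta0 * np.exp(-t)
print(theta, exact)             # close for small lr; the gap shrinks as lr -> 0
```

Shrinking `lr` while holding the product `lr * n_steps` fixed drives the discrete iterate onto the flow solution, which is the sense in which the ODE is the infinitesimal-training-rate limit.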

## 2 Stochastic DE for early stage training

SGD as an SDE (Ljung, Pflug, and Walk 1992; Mandt, Hoffman, and Blei 2017). Worth the price of dusting off the old stochastic calculus. This is typically used for choosing scaling rules for model training (Q. Li, Tai, and Weinan 2019; Z. Li, Malladi, and Arora 2021; Malladi et al. 2022).
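A minimal sketch of the modelling idea, under simplifying assumptions (a 1-D quadratic loss and constant gradient-noise standard deviation `sigma`, both hypothetical choices): SGD with learning rate $\eta$ is modelled by the SDE $\mathrm{d}\theta = -\nabla L(\theta)\,\mathrm{d}t + \sqrt{\eta}\,\sigma\,\mathrm{d}W$, which we integrate by Euler–Maruyama with the time step identified with $\eta$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1-D quadratic loss L(theta) = 0.5 * theta**2 with constant
# gradient-noise std sigma. Model SGD at learning rate eta by the SDE
#   d theta = -grad L(theta) dt + sqrt(eta) * sigma * dW,
# integrated by Euler-Maruyama with step size dt = eta, so one SDE step
# corresponds to one SGD step.
eta, sigma, n_steps = 0.01, 1.0, 100_000
theta, samples = 1.0, []
for _ in range(n_steps):
    dW = rng.normal(scale=np.sqrt(eta))              # Brownian increment, var = dt
    theta += -theta * eta + np.sqrt(eta) * sigma * dW
    samples.append(theta)

# For this quadratic loss the SDE is an Ornstein-Uhlenbeck process with
# stationary variance eta * sigma**2 / 2 = 0.005; check it empirically.
var_hat = np.var(samples[10_000:])                   # discard burn-in
print(var_hat)
```

The discretisation makes the correspondence explicit: one Euler–Maruyama step with $\mathrm{d}t=\eta$ reproduces the SGD update $\theta_{k+1} = \theta_k - \eta(\nabla L(\theta_k) + \text{noise})$ with noise variance $\sigma^2$.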

## 3 Stochastic DE around the optimum

The *limiting diffusion* describes diffusion around an optimum, i.e. after we have converged. Interesting for understanding generalisation (Gu et al. 2022; Z. Li, Wang, and Arora 2021; Lyu, Li, and Arora 2023; Wang et al. 2023).

These diffusions have an interpretation in terms of sampling from a Bayesian posterior; see the Bayes-by-backprop literature.
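A sketch of that interpretation, assuming a scalar parameter and constant gradient-noise variance $\sigma^2$ (simplifying assumptions, not the general case). The SDE

```latex
\mathrm{d}\theta_t
  = -\nabla L(\theta_t)\,\mathrm{d}t
  + \sqrt{\eta}\,\sigma\,\mathrm{d}W_t
```

is a Langevin diffusion at temperature $T = \eta\sigma^2/2$, so its stationary density is the Gibbs measure $p(\theta) \propto \exp\!\bigl(-2L(\theta)/(\eta\sigma^2)\bigr)$; near convergence, SGD under these assumptions samples from a tempered posterior whose temperature scales with the learning rate, which is one route to the Bayesian reading above.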

## 4 References

*Modelling and Optimisation of Flows on Networks: Cetraro, Italy 2009, Editors: Benedetto Piccoli, Michel Rascle*. Lecture Notes in Mathematics.

*Gradient Flows: In Metric Spaces and in the Space of Probability Measures*. Lectures in Mathematics. ETH Zürich.

*Acta Numerica*.

*Proceedings of the 32nd International Conference on Neural Information Processing Systems*. NIPS’18.

*Entropy*.

*SIAM Journal on Applied Dynamical Systems*.

*Frontiers in Behavioral Neuroscience*.

*A Field Guide to Dynamical Recurrent Neural Networks*.

*Advances in Neural Information Processing Systems*.

*The Journal of Machine Learning Research*.

*Stochastic Approximation and Optimization of Random Systems*.

*Advances in Neural Information Processing Systems*.

*JMLR*.

*SIAM Journal on Numerical Analysis*.