Ambrosio, Luigi, and Nicola Gigli. 2013. “A User’s Guide to Optimal Transport.” In Modelling and Optimisation of Flows on Networks: Cetraro, Italy 2009, edited by Benedetto Piccoli and Michel Rascle, 1–155. Lecture Notes in Mathematics. Berlin, Heidelberg: Springer.
Ambrosio, Luigi, Nicola Gigli, and Giuseppe Savaré. 2008. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. 2nd ed. Lectures in Mathematics ETH Zürich. Basel: Birkhäuser.
Bartlett, Peter L., Andrea Montanari, and Alexander Rakhlin. 2021. “Deep Learning: A Statistical Viewpoint.” Acta Numerica 30 (May): 87–201.
Chizat, Lénaïc, and Francis Bach. 2018. “On the Global Convergence of Gradient Descent for Over-Parameterized Models Using Optimal Transport.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 3040–50. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.
Di Giovanni, Francesco, James Rowbottom, Benjamin P. Chamberlain, Thomas Markovich, and Michael M. Bronstein. 2022. “Graph Neural Networks as Gradient Flows.” arXiv.
Garbuno-Inigo, Alfredo, Franca Hoffmann, Wuchen Li, and Andrew M. Stuart. 2020. “Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler.” SIAM Journal on Applied Dynamical Systems 19 (1): 412–41.
Gu, Xinran, Kaifeng Lyu, Longbo Huang, and Sanjeev Arora. 2022. “Why (and When) Does Local SGD Generalize Better Than SGD?”
Hinze, Annika, Jörgen Lantz, Sharon R. Hill, and Rickard Ignell. 2021. “Mosquito Host Seeking in 3D Using a Versatile Climate-Controlled Wind Tunnel System.” Frontiers in Behavioral Neuroscience 15 (March): 643693.
Hochreiter, Sepp, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. “Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.” In A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.
Li, Qianxiao, Cheng Tai, and Weinan E. 2019. “Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations.” Journal of Machine Learning Research 20: 1474–1520.
Li, Zhiyuan, Sadhika Malladi, and Sanjeev Arora. 2021. “On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs).” In Advances in Neural Information Processing Systems, 34:12712–25. Curran Associates, Inc.
Ljung, Lennart, Georg Pflug, and Harro Walk. 1992. Stochastic Approximation and Optimization of Random Systems. Basel: Birkhäuser.
Malladi, Sadhika, Kaifeng Lyu, Abhishek Panigrahi, and Sanjeev Arora. 2022. “On the SDEs and Scaling Rules for Adaptive Gradient Algorithms.” In Advances in Neural Information Processing Systems, 35:7697–7711.
Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” Journal of Machine Learning Research 18 (134): 1–35.
Schillings, Claudia, and Andrew M. Stuart. 2017. “Analysis of the Ensemble Kalman Filter for Inverse Problems.” SIAM Journal on Numerical Analysis 55 (3): 1264–90.
Wang, Runzhe, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, and Zhiyuan Li. 2023. “The Marginal Value of Momentum for Small Learning Rate SGD.” arXiv.