Agarwal, Alekh, Olivier Chapelle, Miroslav Dudík, and John Langford. 2014. “A Reliable Effective Terascale Linear Learning System.” Journal of Machine Learning Research 15 (1): 1111–33.
Allen-Zhu, Zeyuan, and Elad Hazan. 2016. “Optimal Black-Box Reductions Between Optimization Objectives.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1606–14. Curran Associates, Inc.
Allen-Zhu, Zeyuan, David Simchi-Levi, and Xinshang Wang. 2019. “The Lingering of Gradients: How to Reuse Gradients over Time.” arXiv:1901.02871 [cs, math, stat].
Andersson, Joel A. E., Joris Gillis, Greg Horn, James B. Rawlings, and Moritz Diehl. 2019. “CasADi: A Software Framework for Nonlinear Optimization and Optimal Control.” Mathematical Programming Computation 11 (1): 1–36.
Aspremont, Alexandre d’, Damien Scieur, and Adrien Taylor. 2021. “Acceleration Methods.” arXiv:2101.09545 [cs, math].
Beck, Amir, and Marc Teboulle. 2003. “Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization.” Operations Research Letters 31 (3): 167–75.
Betancourt, Michael, Michael I. Jordan, and Ashia C. Wilson. 2018. “On Symplectic Optimization.” arXiv:1802.03653 [stat].
Botev, Aleksandar, Guy Lever, and David Barber. 2016. “Nesterov’s Accelerated Gradient and Momentum as Approximations to Regularised Update Descent.” arXiv:1607.01981 [cs, stat].
Bubeck, Sébastien. 2015. Convex Optimization: Algorithms and Complexity. Vol. 8. Foundations and Trends in Machine Learning. Now Publishers.
Chen, Xiaojun. 2012. “Smoothing Methods for Nonsmooth, Nonconvex Minimization.” Mathematical Programming 134 (1): 71–99.
Choromanska, Anna, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. 2015. “The Loss Surfaces of Multilayer Networks.” In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.
Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien. 2014. “SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives.” In Advances in Neural Information Processing Systems 27.
DeVore, Ronald A. 1998. “Nonlinear Approximation.” Acta Numerica 7 (January): 51–150.
Goh, Gabriel. 2017. “Why Momentum Really Works.” Distill 2 (4): e6.
Hinton, Geoffrey, Nitish Srivastava, and Kevin Swersky. n.d. “Neural Networks for Machine Learning.”
Jakovetic, D., J. M. Freitas Xavier, and J. M. F. Moura. 2014. “Convergence Rates of Distributed Nesterov-Like Gradient Methods on Random Networks.” IEEE Transactions on Signal Processing 62 (4): 868–82.
Langford, John, Lihong Li, and Tong Zhang. 2009. “Sparse Online Learning via Truncated Gradient.” In Advances in Neural Information Processing Systems 21, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 905–12. Curran Associates, Inc.
Lee, Jason D., Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2017. “First-Order Methods Almost Always Avoid Saddle Points.” arXiv:1710.07406 [cs, math, stat].
Lee, Jason D., Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2016. “Gradient Descent Converges to Minimizers.” arXiv:1602.04915 [cs, math, stat].
Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” Journal of Machine Learning Research.
Nesterov, Yu. 2012. “Gradient Methods for Minimizing Composite Functions.” Mathematical Programming 140 (1): 125–61.
Nocedal, Jorge, and Stephen J. Wright. 2006. Numerical Optimization. 2nd ed. Springer Series in Operations Research and Financial Engineering. New York: Springer-Verlag.
Richards, Dominic, and Mike Rabbat. 2021. “Learning with Gradient Descent and Weakly Convex Losses.” arXiv:2101.04968 [cs, math, stat].
Ruder, Sebastian. 2016. “An Overview of Gradient Descent Optimization Algorithms.” arXiv:1609.04747 [cs].
Sagun, Levent, V. Ugur Guney, Gérard Ben Arous, and Yann LeCun. 2014. “Explorations on High Dimensional Landscapes.” arXiv:1412.6615 [cs, stat].
Wainwright, Martin J. 2014. “Structured Regularizers for High-Dimensional Problems: Statistical and Computational Issues.” Annual Review of Statistics and Its Application 1 (1): 233–53.
Wibisono, Andre, and Ashia C. Wilson. 2015. “On Accelerated Methods in Optimization.” arXiv:1509.03616 [math].
Wibisono, Andre, Ashia C. Wilson, and Michael I. Jordan. 2016. “A Variational Perspective on Accelerated Methods in Optimization.” Proceedings of the National Academy of Sciences 113 (47): E7351–58.
Wright, Stephen J., and Benjamin Recht. 2021. Optimization for Data Analysis. New York: Cambridge University Press.
Zinkevich, Martin. 2003. “Online Convex Programming and Generalized Infinitesimal Gradient Ascent.” In Proceedings of the Twentieth International Conference on Machine Learning, 928–35. ICML’03. Washington, DC, USA: AAAI Press.