Agarwal, Alekh, Olivier Chapelle, Miroslav Dudík, and John Langford. 2014. “A Reliable Effective Terascale Linear Learning System.” Journal of Machine Learning Research 15 (1): 1111–33.
Allen-Zhu, Zeyuan, and Elad Hazan. 2016. “Optimal Black-Box Reductions Between Optimization Objectives.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1606–14. Curran Associates, Inc.
Allen-Zhu, Zeyuan, David Simchi-Levi, and Xinshang Wang. 2019. “The Lingering of Gradients: How to Reuse Gradients over Time.” arXiv:1901.02871 [cs, math, stat].
Andersson, Joel A. E., Joris Gillis, Greg Horn, James B. Rawlings, and Moritz Diehl. 2019. “CasADi: A Software Framework for Nonlinear Optimization and Optimal Control.” Mathematical Programming Computation 11 (1): 1–36.
Aspremont, Alexandre d’, Damien Scieur, and Adrien Taylor. 2021. “Acceleration Methods.” arXiv:2101.09545 [cs, math].
Beck, Amir, and Marc Teboulle. 2003. “Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization.” Operations Research Letters 31 (3): 167–75.
Betancourt, Michael, Michael I. Jordan, and Ashia C. Wilson. 2018. “On Symplectic Optimization.” arXiv:1802.03653 [stat].
Botev, Aleksandar, Guy Lever, and David Barber. 2016. “Nesterov’s Accelerated Gradient and Momentum as Approximations to Regularised Update Descent.” arXiv:1607.01981 [cs, stat].
Bubeck, Sébastien. 2015. Convex Optimization: Algorithms and Complexity. Vol. 8. Foundations and Trends in Machine Learning. Now Publishers.
Chen, Xiaojun. 2012. “Smoothing Methods for Nonsmooth, Nonconvex Minimization.” Mathematical Programming 134 (1): 71–99.
Choromanska, Anna, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. 2015. “The Loss Surfaces of Multilayer Networks.” In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.
Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien. 2014. “SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives.” In Advances in Neural Information Processing Systems 27.
DeVore, Ronald A. 1998. “Nonlinear Approximation.” Acta Numerica 7 (January): 51–150.
Goh, Gabriel. 2017. “Why Momentum Really Works.” Distill 2 (4): e6.
Hinton, Geoffrey, Nitish Srivastava, and Kevin Swersky. n.d. “Neural Networks for Machine Learning.”
Jakovetic, D., J. M. Freitas Xavier, and J. M. F. Moura. 2014. “Convergence Rates of Distributed Nesterov-Like Gradient Methods on Random Networks.” IEEE Transactions on Signal Processing 62 (4): 868–82.
Langford, John, Lihong Li, and Tong Zhang. 2009. “Sparse Online Learning via Truncated Gradient.” In Advances in Neural Information Processing Systems 21, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 905–12. Curran Associates, Inc.
Lee, Jason D., Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2017. “First-Order Methods Almost Always Avoid Saddle Points.” arXiv:1710.07406 [cs, math, stat].
Lee, Jason D., Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2016. “Gradient Descent Converges to Minimizers.” arXiv:1602.04915 [cs, math, stat].
Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” Journal of Machine Learning Research.
Nesterov, Yu. 2012. “Gradient Methods for Minimizing Composite Functions.” Mathematical Programming 140 (1): 125–61.
Nocedal, Jorge, and Stephen J. Wright. 2006. Numerical Optimization. 2nd ed. Springer Series in Operations Research and Financial Engineering. New York: Springer-Verlag.
Richards, Dominic, and Mike Rabbat. 2021. “Learning with Gradient Descent and Weakly Convex Losses.” arXiv:2101.04968 [cs, math, stat].
Ruder, Sebastian. 2016. “An Overview of Gradient Descent Optimization Algorithms.” arXiv:1609.04747 [cs].
Sagun, Levent, V. Ugur Guney, Gérard Ben Arous, and Yann LeCun. 2014. “Explorations on High Dimensional Landscapes.” arXiv:1412.6615 [cs, stat].
Wainwright, Martin J. 2014. “Structured Regularizers for High-Dimensional Problems: Statistical and Computational Issues.” Annual Review of Statistics and Its Application 1 (1): 233–53.
Wibisono, Andre, and Ashia C. Wilson. 2015. “On Accelerated Methods in Optimization.” arXiv:1509.03616 [math].
Wibisono, Andre, Ashia C. Wilson, and Michael I. Jordan. 2016. “A Variational Perspective on Accelerated Methods in Optimization.” Proceedings of the National Academy of Sciences 113 (47): E7351–58.
Wright, Stephen J., and Benjamin Recht. 2021. Optimization for Data Analysis. New York: Cambridge University Press.
Zinkevich, Martin. 2003. “Online Convex Programming and Generalized Infinitesimal Gradient Ascent.” In Proceedings of the Twentieth International Conference on Machine Learning, 928–35. ICML’03. Washington, DC, USA: AAAI Press.