Agarwal, Alekh, Olivier Chapelle, Miroslav Dudík, and John Langford. 2014.
“A Reliable Effective Terascale Linear Learning System.” Journal of Machine Learning Research 15 (1): 1111–33.
http://www.jmlr.org/papers/volume15/agarwal14a/agarwal14a.pdf.
Allen-Zhu, Zeyuan, and Elad Hazan. 2016.
“Optimal Black-Box Reductions Between Optimization Objectives.” In
Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1606–14.
Curran Associates, Inc. http://papers.nips.cc/paper/6364-optimal-black-box-reductions-between-optimization-objectives.pdf.
Allen-Zhu, Zeyuan, David Simchi-Levi, and Xinshang Wang. 2019.
“The Lingering of Gradients: How to Reuse Gradients over Time.” January 9, 2019.
http://arxiv.org/abs/1901.02871.
Andersson, Joel A. E., Joris Gillis, Greg Horn, James B. Rawlings, and Moritz Diehl. 2019.
“CasADi: A Software Framework for Nonlinear Optimization and Optimal Control.” Mathematical Programming Computation 11 (1): 1–36.
https://doi.org/10.1007/s12532-018-0139-4.
Beck, Amir, and Marc Teboulle. 2003.
“Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization.” Operations Research Letters 31 (3): 167–75.
https://doi.org/10.1016/S0167-6377(02)00231-6.
———. 2009.
“A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems.” SIAM Journal on Imaging Sciences 2 (1): 183–202.
https://doi.org/10.1137/080716542.
Betancourt, Michael, Michael I. Jordan, and Ashia C. Wilson. 2018.
“On Symplectic Optimization.” February 10, 2018.
http://arxiv.org/abs/1802.03653.
Botev, Aleksandar, Guy Lever, and David Barber. 2016.
“Nesterov’s Accelerated Gradient and Momentum as Approximations to Regularised Update Descent.” July 7, 2016.
http://arxiv.org/abs/1607.01981.
Bubeck, Sébastien. 2015.
Convex Optimization: Algorithms and Complexity. Vol. 8. Foundations and Trends in Machine Learning. Now Publishers.
https://doi.org/10.1561/2200000050.
Chen, Xiaojun. 2012.
“Smoothing Methods for Nonsmooth, Nonconvex Minimization.” Mathematical Programming 134 (1): 71–99.
https://doi.org/10.1007/s10107-012-0569-0.
Choromanska, Anna, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. 2015.
“The Loss Surfaces of Multilayer Networks.” In
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 192–204.
http://proceedings.mlr.press/v38/choromanska15.html.
Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien. 2014.
“SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives.” In
Advances in Neural Information Processing Systems 27.
http://arxiv.org/abs/1407.0202.
DeVore, Ronald A. 1998.
“Nonlinear Approximation.” Acta Numerica 7 (January): 51–150.
https://doi.org/10.1017/S0962492900002816.
Domingos, Pedro. 2020.
“Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” November 30, 2020.
http://arxiv.org/abs/2012.00152.
Hinton, Geoffrey, Nitish Srivastava, and Kevin Swersky. n.d. “Neural Networks for Machine Learning.”
Jakovetic, D., J. M. Freitas Xavier, and J. M. F. Moura. 2014.
“Convergence Rates of Distributed Nesterov-Like Gradient Methods on Random Networks.” IEEE Transactions on Signal Processing 62 (4): 868–82.
https://doi.org/10.1109/TSP.2013.2291221.
Langford, John, Lihong Li, and Tong Zhang. 2009.
“Sparse Online Learning via Truncated Gradient.” In
Advances in Neural Information Processing Systems 21, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 905–12.
Curran Associates, Inc. http://papers.nips.cc/paper/3585-sparse-online-learning-via-truncated-gradient.pdf.
Lee, Jason D., Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2017.
“First-Order Methods Almost Always Avoid Saddle Points.” October 19, 2017.
http://arxiv.org/abs/1710.07406.
Lee, Jason D., Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2016.
“Gradient Descent Converges to Minimizers.” March 4, 2016.
http://arxiv.org/abs/1602.04915.
Ma, Siyuan, and Mikhail Belkin. 2017.
“Diving into the Shallows: A Computational Perspective on Large-Scale Shallow Learning.” March 30, 2017.
http://arxiv.org/abs/1703.10622.
Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017.
“Stochastic Gradient Descent as Approximate Bayesian Inference.” Journal of Machine Learning Research, April.
http://arxiv.org/abs/1704.04289.
Nesterov, Y. 2012.
“Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems.” SIAM Journal on Optimization 22 (2): 341–62.
https://doi.org/10.1137/100802001.
Nesterov, Yu. 2012.
“Gradient Methods for Minimizing Composite Functions.” Mathematical Programming 140 (1): 125–61.
https://doi.org/10.1007/s10107-012-0629-5.
Nocedal, Jorge, and Stephen J. Wright. 2006.
Numerical Optimization. 2nd ed. Springer Series in Operations Research and Financial Engineering. New York: Springer-Verlag.
https://www.springer.com/gp/book/9780387303031.
Ruder, Sebastian. 2016.
“An Overview of Gradient Descent Optimization Algorithms.” September 15, 2016.
http://arxiv.org/abs/1609.04747.
Sagun, Levent, V. Ugur Guney, Gérard Ben Arous, and Yann LeCun. 2014.
“Explorations on High Dimensional Landscapes.” December 20, 2014.
http://arxiv.org/abs/1412.6615.
Wainwright, Martin J. 2014.
“Structured Regularizers for High-Dimensional Problems: Statistical and Computational Issues.” Annual Review of Statistics and Its Application 1 (1): 233–53.
https://doi.org/10.1146/annurev-statistics-022513-115643.
Wibisono, Andre, and Ashia C. Wilson. 2015.
“On Accelerated Methods in Optimization.” September 11, 2015.
http://arxiv.org/abs/1509.03616.
Wibisono, Andre, Ashia C. Wilson, and Michael I. Jordan. 2016.
“A Variational Perspective on Accelerated Methods in Optimization.” Proceedings of the National Academy of Sciences 113 (47): E7351–58.
https://doi.org/10.1073/pnas.1614734113.
Zinkevich, Martin. 2003.
“Online Convex Programming and Generalized Infinitesimal Gradient Ascent.” In
Proceedings of the Twentieth International Conference on Machine Learning, 928–35. ICML’03. Washington, DC, USA: AAAI Press.
http://dl.acm.org/citation.cfm?id=3041838.3041955.