Ahn, Sungjin, Anoop Korattikara, and Max Welling. 2012.
“Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring.” In
Proceedings of the 29th International Conference on Machine Learning, 1771–78. ICML’12. Madison, WI, USA: Omnipress.
Alexos, Antonios, Alex J. Boyd, and Stephan Mandt. 2022.
“Structured Stochastic Gradient MCMC.” In
Proceedings of the 39th International Conference on Machine Learning, 414–34. PMLR.
Arya, Gaurav, Moritz Schauer, Frank Schäfer, and Christopher Vincent Rackauckas. 2022.
“Automatic Differentiation of Programs with Discrete Randomness.” In
Advances in Neural Information Processing Systems.
Bach, Francis R., and Eric Moulines. 2013.
“Non-Strongly-Convex Smooth Stochastic Approximation with Convergence Rate O(1/n).” In
Advances in Neural Information Processing Systems, 773–81. arXiv:1306.2119 [Cs, Math, Stat].
Bach, Francis, and Eric Moulines. 2011.
“Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning.” In
Advances in Neural Information Processing Systems (NIPS). Spain.
Benaïm, Michel. 1999.
“Dynamics of Stochastic Approximation Algorithms.” In
Séminaire de Probabilités de Strasbourg, 33:1–68. Lecture Notes in Math. Berlin: Springer.
Bensoussan, Alain, Yiqun Li, Dinh Phan Cao Nguyen, Minh-Binh Tran, Sheung Chi Phillip Yam, and Xiang Zhou. 2020.
“Machine Learning and Control Theory.” arXiv:2006.05604 [Cs, Math, Stat], June.
Botev, Zdravko I., and Chris J. Lloyd. 2015.
“Importance Accelerated Robbins-Monro Recursion with Applications to Parametric Confidence Limits.” Electronic Journal of Statistics 9 (2): 2058–75.
Bottou, Léon. 1991.
“Stochastic Gradient Learning in Neural Networks.” In
Proceedings of Neuro-Nîmes 91. Nîmes, France: EC2.
———. 1998.
“Online Algorithms and Stochastic Approximations.” In
Online Learning and Neural Networks, edited by David Saad, 17:142. Cambridge, UK: Cambridge University Press.
———. 2010.
“Large-Scale Machine Learning with Stochastic Gradient Descent.” In
Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT’2010), 177–86. Paris, France: Springer.
Bottou, Léon, and Olivier Bousquet. 2008.
“The Tradeoffs of Large Scale Learning.” In
Advances in Neural Information Processing Systems, edited by J.C. Platt, D. Koller, Y. Singer, and S. Roweis, 20:161–68. NIPS Foundation (http://books.nips.cc).
Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. 2016.
“Optimization Methods for Large-Scale Machine Learning.” arXiv:1606.04838 [Cs, Math, Stat], June.
Bottou, Léon, and Yann LeCun. 2004.
“Large Scale Online Learning.” In
Advances in Neural Information Processing Systems 16, edited by Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf. Cambridge, MA: MIT Press.
Bubeck, Sébastien. 2015.
Convex Optimization: Algorithms and Complexity. Vol. 8. Foundations and Trends in Machine Learning. Now Publishers.
Cevher, Volkan, Stephen Becker, and Mark Schmidt. 2014.
“Convex Optimization for Big Data.” IEEE Signal Processing Magazine 31 (5): 32–43.
Chaudhari, Pratik, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, and Riccardo Zecchina. 2017.
“Entropy-SGD: Biasing Gradient Descent Into Wide Valleys.” arXiv.
Chen, Tianqi, Emily Fox, and Carlos Guestrin. 2014.
“Stochastic Gradient Hamiltonian Monte Carlo.” In
Proceedings of the 31st International Conference on Machine Learning, 1683–91. Beijing, China: PMLR.
Chen, Xiaojun. 2012.
“Smoothing Methods for Nonsmooth, Nonconvex Minimization.” Mathematical Programming 134 (1): 71–99.
Di Giovanni, Francesco, James Rowbottom, Benjamin P. Chamberlain, Thomas Markovich, and Michael M. Bronstein. 2022.
“Graph Neural Networks as Gradient Flows.” arXiv.
Duchi, John, Elad Hazan, and Yoram Singer. 2011.
“Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” Journal of Machine Learning Research 12 (July): 2121–59.
Friedlander, Michael P., and Mark Schmidt. 2012.
“Hybrid Deterministic-Stochastic Methods for Data Fitting.” SIAM Journal on Scientific Computing 34 (3): A1380–1405.
Ghadimi, Saeed, and Guanghui Lan. 2013.
“Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming.” SIAM Journal on Optimization 23 (4): 2341–68.
Goh, Gabriel. 2017.
“Why Momentum Really Works.” Distill 2 (4): e6.
Hazan, Elad, Kfir Levy, and Shai Shalev-Shwartz. 2015.
“Beyond Convexity: Stochastic Quasi-Convex Optimization.” In
Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 1594–1602. Curran Associates, Inc.
Hu, Chonghai, Weike Pan, and James T. Kwok. 2009.
“Accelerated Gradient Methods for Stochastic Optimization and Online Learning.” In
Advances in Neural Information Processing Systems, 781–89. Curran Associates, Inc.
Jakovetic, D., J.M. Freitas Xavier, and J.M.F. Moura. 2014.
“Convergence Rates of Distributed Nesterov-Like Gradient Methods on Random Networks.” IEEE Transactions on Signal Processing 62 (4): 868–82.
Kidambi, Rahul, Praneeth Netrapalli, Prateek Jain, and Sham M. Kakade. 2023.
“On the Insufficiency of Existing Momentum Schemes for Stochastic Optimization.”
Kingma, Diederik, and Jimmy Ba. 2015.
“Adam: A Method for Stochastic Optimization.” In Proceedings of ICLR.
Lai, Tze Leung. 2003.
“Stochastic Approximation.” The Annals of Statistics 31 (2): 391–406.
Lee, Jason D., Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2017.
“First-Order Methods Almost Always Avoid Saddle Points.” arXiv:1710.07406 [Cs, Math, Stat], October.
Lee, Jason D., Max Simchowitz, Michael I. Jordan, and Benjamin Recht. 2016.
“Gradient Descent Converges to Minimizers.” arXiv:1602.04915 [Cs, Math, Stat], March.
Liu, Qiang, and Dilin Wang. 2019.
“Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” In
Advances in Neural Information Processing Systems.
Ljung, Lennart, Georg Pflug, and Harro Walk. 1992.
Stochastic Approximation and Optimization of Random Systems. Basel: Birkhäuser.
Maclaurin, Dougal, David Duvenaud, and Ryan P. Adams. 2015.
“Early Stopping as Nonparametric Variational Inference.” In
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 1070–77. arXiv.
Mairal, Julien. 2013.
“Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization.” In
Advances in Neural Information Processing Systems, 2283–91.
Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017.
“Stochastic Gradient Descent as Approximate Bayesian Inference.” Journal of Machine Learning Research, April.
McMahan, H. Brendan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, et al. 2013.
“Ad Click Prediction: A View from the Trenches.” In
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1222–30. KDD ’13. New York, NY, USA: ACM.
Mitliagkas, Ioannis, Ce Zhang, Stefan Hadjis, and Christopher Ré. 2016.
“Asynchrony Begets Momentum, with an Application to Deep Learning.” arXiv:1605.09774 [Cs, Math, Stat], May.
Nguyen, Lam M., Jie Liu, Katya Scheinberg, and Martin Takáč. 2017.
“Stochastic Recursive Gradient Algorithm for Nonconvex Optimization.” arXiv:1705.07261 [Cs, Math, Stat], May.
Patel, Vivak. 2017.
“On SGD’s Failure in Practice: Characterizing and Overcoming Stalling.” arXiv:1702.00317 [Cs, Math, Stat], February.
Polyak, B. T., and A. B. Juditsky. 1992.
“Acceleration of Stochastic Approximation by Averaging.” SIAM Journal on Control and Optimization 30 (4): 838–55.
Reddi, Sashank J., Ahmed Hefny, Suvrit Sra, Barnabas Poczos, and Alex Smola. 2016.
“Stochastic Variance Reduction for Nonconvex Optimization.” In
Proceedings of the 33rd International Conference on Machine Learning, 314–23. PMLR.
Robbins, Herbert, and Sutton Monro. 1951.
“A Stochastic Approximation Method.” The Annals of Mathematical Statistics 22 (3): 400–407.
Robbins, H., and D. Siegmund. 1971.
“A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications.” In
Optimizing Methods in Statistics, edited by Jagdish S. Rustagi, 233–57. Academic Press.
Ruder, Sebastian. 2016.
“An Overview of Gradient Descent Optimization Algorithms.” arXiv:1609.04747 [Cs], September.
Sagun, Levent, V. Ugur Guney, Gerard Ben Arous, and Yann LeCun. 2014.
“Explorations on High Dimensional Landscapes.” arXiv:1412.6615 [Cs, Stat], December.
Salimans, Tim, and Diederik P Kingma. 2016.
“Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.” In
Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 901–9. Curran Associates, Inc.
Shalev-Shwartz, Shai, and Ambuj Tewari. 2011.
“Stochastic Methods for L1-Regularized Loss Minimization.” Journal of Machine Learning Research 12 (July): 1865–92.
Şimşekli, Umut, Ozan Sener, George Deligiannidis, and Murat A. Erdogdu. 2020.
“Hausdorff Dimension, Stochastic Differential Equations, and Generalization in Neural Networks.” CoRR abs/2006.09313.
Smith, Samuel L., Benoit Dherin, David Barrett, and Soham De. 2020.
“On the Origin of Implicit Regularization in Stochastic Gradient Descent.”
Spall, J. C. 2000.
“Adaptive Stochastic Approximation by the Simultaneous Perturbation Method.” IEEE Transactions on Automatic Control 45 (10): 1839–53.
Sun, Jianhui, Ying Yang, Guangxu Xun, and Aidong Zhang. 2023.
“Scheduling Hyperparameters to Improve Generalization: From Centralized SGD to Asynchronous SGD.” ACM Transactions on Knowledge Discovery from Data 17 (2): 29:1–37.
Vishwanathan, S.V. N., Nicol N. Schraudolph, Mark W. Schmidt, and Kevin P. Murphy. 2006. “Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods.” In Proceedings of the 23rd International Conference on Machine Learning.
Welling, Max, and Yee Whye Teh. 2011.
“Bayesian Learning via Stochastic Gradient Langevin Dynamics.” In
Proceedings of the 28th International Conference on International Conference on Machine Learning, 681–88. ICML’11. Madison, WI, USA: Omnipress.
Wright, Stephen J., and Benjamin Recht. 2021.
Optimization for Data Analysis. New York: Cambridge University Press.
Zinkevich, Martin, Markus Weimer, Lihong Li, and Alex J. Smola. 2010.
“Parallelized Stochastic Gradient Descent.” In
Advances in Neural Information Processing Systems 23, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 2595–2603. Curran Associates, Inc.