Ahn, Sungjin, Anoop Korattikara, and Max Welling. 2012. "Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring." In Proceedings of the 29th International Conference on Machine Learning, 1771–78. ICML'12. Madison, WI, USA: Omnipress.
Alexos, Antonios, Alex J. Boyd, and Stephan Mandt. 2022. "Structured Stochastic Gradient MCMC." In Proceedings of the 39th International Conference on Machine Learning, 414–34. PMLR.
Bissiri, P. G., C. C. Holmes, and S. G. Walker. 2016. "A General Framework for Updating Belief Distributions." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78 (5): 1103–30.
Blundell, Charles, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. 2015. "Weight Uncertainty in Neural Networks." In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, 1613–22. ICML'15. Lille, France: JMLR.org.
Bradley, Arwen V., Carlos A. Gomez-Uribe, and Manish Reddy Vuyyuru. 2022. "Shift-Curvature, SGD, and Generalization." Machine Learning: Science and Technology 3 (4): 045002.
Brosse, Nicolas, Éric Moulines, and Alain Durmus. 2018. "The Promises and Pitfalls of Stochastic Gradient Langevin Dynamics." In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 8278–88. NIPS'18. Red Hook, NY, USA: Curran Associates Inc.
Chada, Neil, and Xin Tong. 2022. "Convergence Acceleration of Ensemble Kalman Inversion in Nonlinear Settings." Mathematics of Computation 91 (335): 1247–80.
Chandramoorthy, Nisha, Andreas Loukas, Khashayar Gatmiry, and Stefanie Jegelka. 2022. "On the Generalization of Learning Algorithms That Do Not Converge." arXiv.
Chaudhari, Pratik, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, and Riccardo Zecchina. 2017. "Entropy-SGD: Biasing Gradient Descent Into Wide Valleys." arXiv.
Chaudhari, Pratik, and Stefano Soatto. 2018. "Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks." In 2018 Information Theory and Applications Workshop (ITA), 1–10.
Chen, Tianqi, Emily Fox, and Carlos Guestrin. 2014. "Stochastic Gradient Hamiltonian Monte Carlo." In Proceedings of the 31st International Conference on Machine Learning, 1683–91. Beijing, China: PMLR.
Choi, Hyunsun, Eric Jang, and Alexander A. Alemi. 2019. "WAIC, but Why? Generative Ensembles for Robust Anomaly Detection." arXiv.
Detommaso, Gianluca, Tiangang Cui, Alessio Spantini, Youssef Marzouk, and Robert Scheichl. 2018. "A Stein Variational Newton Method." In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 9187–97. NIPS'18. Red Hook, NY, USA: Curran Associates Inc.
Ding, Nan, Youhan Fang, Ryan Babbush, Changyou Chen, Robert D. Skeel, and Hartmut Neven. 2014. "Bayesian Sampling Using Stochastic Gradient Thermostats." In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, 3203–11. NIPS'14. Cambridge, MA, USA: MIT Press.
Donsker, M. D., and S. R. S. Varadhan. 1975. "Asymptotic Evaluation of Certain Markov Process Expectations for Large Time, I." Communications on Pure and Applied Mathematics 28 (1): 1–47.
Durmus, Alain, and Eric Moulines. 2016. "High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm." arXiv:1605.01559 [Math, Stat], May.
Dutordoir, Vincent, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, and Nicolas Durrande. 2021. "Deep Neural Networks as Point Estimates for Deep Gaussian Processes." arXiv:2105.04504 [Cs, Stat].
Feng, Yu, and Yuhai Tu. 2021. "The Inverse Variance–Flatness Relation in Stochastic Gradient Descent Is Critical for Finding Flat Minima." Proceedings of the National Academy of Sciences 118 (9): e2015617118.
Futami, Futoshi, Issei Sato, and Masashi Sugiyama. 2017. "Variational Inference Based on Robust Divergences." arXiv:1710.06595 [Stat], October.
Ge, Rong, Holden Lee, and Andrej Risteski. 2020. "Simulated Tempering Langevin Monte Carlo II: An Improved Proof Using Soft Markov Chain Decomposition." arXiv:1812.00793 [Cs, Math, Stat], September.
Girolami, Mark, and Ben Calderhead. 2011. "Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (2): 123–214.
Goldt, Sebastian, and Udo Seifert. 2017. "Stochastic Thermodynamics of Learning." Physical Review Letters 118 (1): 010601.
Grenander, Ulf, and Michael I. Miller. 1994. "Representations of Knowledge in Complex Systems." Journal of the Royal Statistical Society: Series B (Methodological) 56 (4): 549–81.
Hodgkinson, Liam, Robert Salomone, and Fred Roosta. 2019. "Implicit Langevin Algorithms for Sampling From Log-Concave Densities." arXiv:1903.12322 [Cs, Stat], March.
Immer, Alexander, Maciej Korzepa, and Matthias Bauer. 2021. "Improving Predictions of Bayesian Neural Nets via Local Linearization." In International Conference on Artificial Intelligence and Statistics, 703–11. PMLR.
Izmailov, Pavel, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2020. "Subspace Inference for Bayesian Deep Learning." In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, 1169–79. PMLR.
Izmailov, Pavel, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2018. "Averaging Weights Leads to Wider Optima and Better Generalization," March.
Khan, Mohammad Emtiyaz, Alexander Immer, Ehsan Abedi, and Maciej Korzepa. 2020. "Approximate Inference Turns Deep Networks into Gaussian Processes." arXiv:1906.01930 [Cs, Stat], July.
Khan, Mohammad Emtiyaz, and Håvard Rue. 2022. "The Bayesian Learning Rule." arXiv.
Khan, Mohammad, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, and Akash Srivastava. 2018. "Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam." In Proceedings of the 35th International Conference on Machine Learning, 2611–20. PMLR.
Knoblauch, Jeremias, Jack Jewson, and Theodoros Damoulas. 2022. "An Optimization-Centric View on Bayes' Rule: Reviewing and Generalizing Variational Inference." Journal of Machine Learning Research 23 (132): 1–109.
Kristiadi, Agustinus, Matthias Hein, and Philipp Hennig. 2021. "Learnable Uncertainty Under Laplace Approximations." In Uncertainty in Artificial Intelligence.
Liu, Qiang, and Dilin Wang. 2019. "Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm." In Advances in Neural Information Processing Systems.
Ma, Yi-An, Tianqi Chen, and Emily B. Fox. 2015. "A Complete Recipe for Stochastic Gradient MCMC." In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 2917–25. NIPS'15. Cambridge, MA, USA: MIT Press.
Maclaurin, Dougal, David Duvenaud, and Ryan P. Adams. 2015. "Early Stopping as Nonparametric Variational Inference." In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 1070–77. arXiv.
Maddox, Wesley, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, and Andrew Gordon Wilson. 2019. "A Simple Baseline for Bayesian Uncertainty in Deep Learning," February.
Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017. "Stochastic Gradient Descent as Approximate Bayesian Inference." JMLR, April.
Martens, James. 2020. "New Insights and Perspectives on the Natural Gradient Method." Journal of Machine Learning Research 21 (146): 1–76.
Neal, Radford M. 1996. Bayesian Learning for Neural Networks. Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Norton, Richard A., and Colin Fox. 2016. "Tuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals." arXiv:1610.00781 [Math, Stat], October.
Osawa, Kazuki, Siddharth Swaroop, Mohammad Emtiyaz E Khan, Anirudh Jain, Runa Eschenhagen, Richard E Turner, and Rio Yokota. 2019. "Practical Deep Learning with Bayesian Principles." In Advances in Neural Information Processing Systems. Vol. 32. Red Hook, NY, USA: Curran Associates, Inc.
Parisi, G. 1981. "Correlation Functions and Computer Simulations." Nuclear Physics B 180 (3): 378–84.
Rásonyi, Miklós, and Kinga Tikosi. 2022. "On the Stability of the Stochastic Gradient Langevin Algorithm with Dependent Data Stream." Statistics & Probability Letters 182 (March): 109321.
Ritter, Hippolyt, Martin Kukla, Cheng Zhang, and Yingzhen Li. 2021. "Sparse Uncertainty Representation in Deep Learning with Inducing Weights." arXiv:2105.14594 [Cs, Stat], May.
Shang, Xiaocheng, Zhanxing Zhu, Benedict Leimkuhler, and Amos J Storkey. 2015. "Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling." In Advances in Neural Information Processing Systems. Vol. 28. NIPS'15. Curran Associates, Inc.
Smith, Samuel L., Benoit Dherin, David Barrett, and Soham De. 2020. "On the Origin of Implicit Regularization in Stochastic Gradient Descent."
Sun, Jianhui, Ying Yang, Guangxu Xun, and Aidong Zhang. 2023. "Scheduling Hyperparameters to Improve Generalization: From Centralized SGD to Asynchronous SGD." ACM Transactions on Knowledge Discovery from Data 17 (2): 29:1–37.
Wainwright, Martin J., and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Vol. 1. Foundations and Trends® in Machine Learning. Now Publishers.
Welling, Max, and Yee Whye Teh. 2011. "Bayesian Learning via Stochastic Gradient Langevin Dynamics." In Proceedings of the 28th International Conference on Machine Learning, 681–88. ICML'11. Madison, WI, USA: Omnipress.
Wenzel, Florian, Kevin Roth, Bastiaan Veeling, Jakub Swiatkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, and Sebastian Nowozin. 2020. "How Good Is the Bayes Posterior in Deep Neural Networks Really?" In Proceedings of the 37th International Conference on Machine Learning, 119:10248–59. PMLR.
Xifara, T., C. Sherlock, S. Livingstone, S. Byrne, and M. Girolami. 2014. "Langevin Diffusions and the Metropolis-Adjusted Langevin Algorithm." Statistics & Probability Letters 91 (Supplement C): 14–19.
Zellner, Arnold. 1988. "Optimal Information Processing and Bayes's Theorem." The American Statistician 42 (4): 278–80.
Zhang, Guodong, Shengyang Sun, David Duvenaud, and Roger Grosse. 2018. "Noisy Natural Gradient as Variational Inference." In Proceedings of the 35th International Conference on Machine Learning, 5852–61. PMLR.
Zhang, Tong. 1999. "Theoretical Analysis of a Class of Randomized Regularization Methods." In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 156–63. COLT '99. New York, NY, USA: Association for Computing Machinery.
Zhang, Yao, Andrew M. Saxe, Madhu S. Advani, and Alpha A. Lee. 2018. "Energy-Entropy Competition and the Effectiveness of Stochastic Gradient Descent in Machine Learning." Molecular Physics 116 (21-22): 3214–23.