--- references: - id: AbtFisher1998 accessed: - year: 2019 month: 10 day: 2 author: - family: Abt given: Markus - family: Welch given: William J. citation-key: AbtFisher1998 container-title: Canadian Journal of Statistics DOI: 10.2307/3315678 ISSN: 1708-945X issue: '1' issued: - year: 1998 language: en page: 127-137 title: >- Fisher information and maximum-likelihood estimation of covariance parameters in Gaussian stochastic processes type: article-journal volume: '26' - id: AmariAdaptive2000 accessed: - year: 2019 month: 7 day: 19 author: - family: Amari given: Shun-ichi - family: Park given: Hyeyoung - family: Fukumizu given: Kenji citation-key: AmariAdaptive2000 container-title: Neural Computation container-title-short: Neural Computation DOI: 10.1162/089976600300015420 ISSN: 0899-7667 issue: '6' issued: - year: 2000 month: 6 day: 1 page: 1399-1409 PMID: '10935719' title: >- Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons type: article-journal URL: https://www.mitpressjournals.org/doi/10.1162/089976600300015420 volume: '12' - id: AmariFisher2018 accessed: - year: 2019 month: 7 day: 19 author: - family: Amari given: Shun-ichi - family: Karakida given: Ryo - family: Oizumi given: Masafumi citation-key: AmariFisher2018 container-title: arXiv:1808.07172 [cond-mat, stat] issued: - year: 2018 month: 8 day: 21 title: Fisher Information and Natural Gradient Learning of Random Deep Networks type: article-journal URL: http://arxiv.org/abs/1808.07172 - id: AmariNatural1998 accessed: - year: 2014 month: 8 day: 30 author: - family: Amari given: Shun-ichi citation-key: AmariNatural1998 container-title: Neural Computation container-title-short: Neural Computation DOI: 10.1162/089976698300017746 ISSN: 0899-7667 issue: '2' issued: - year: 1998 month: 2 day: 1 page: 251-276 title: Natural Gradient Works Efficiently in Learning type: article-journal volume: '10' - id: ArbelKernelized2020 accessed: - year: 2023 month: 6 day: 2 author: - family: 
Arbel given: Michael - family: Gretton given: Arthur - family: Li given: Wuchen - family: Montufar given: Guido citation-key: ArbelKernelized2020 DOI: 10.48550/arXiv.1910.09652 issued: - year: 2020 month: 2 day: 13 number: arXiv:1910.09652 publisher: arXiv title: Kernelized Wasserstein Natural Gradient type: article URL: http://arxiv.org/abs/1910.09652 - id: BotevPractical2017 accessed: - year: 2023 month: 1 day: 17 author: - family: Botev given: Aleksandar - family: Ritter given: Hippolyt - family: Barber given: David citation-key: BotevPractical2017 container-title: Proceedings of the 34th International Conference on Machine Learning event-title: International Conference on Machine Learning ISSN: 2640-3498 issued: - year: 2017 month: 7 day: 17 language: en page: 557-565 publisher: PMLR title: Practical Gauss-Newton Optimisation for Deep Learning type: paper-conference URL: http://arxiv.org/abs/1706.03662 - id: ChenStochastic2014 accessed: - year: 2022 month: 8 day: 16 author: - family: Chen given: Tianqi - family: Fox given: Emily - family: Guestrin given: Carlos citation-key: ChenStochastic2014 container-title: Proceedings of the 31st International Conference on Machine Learning DOI: 10.48550/arXiv.1402.4102 event-place: Beijing, China event-title: International Conference on Machine Learning ISSN: 1938-7228 issued: - year: 2014 month: 6 day: 18 language: en page: 1683-1691 publisher: PMLR publisher-place: Beijing, China title: Stochastic Gradient Hamiltonian Monte Carlo type: paper-conference URL: https://proceedings.mlr.press/v32/cheni14.html - id: CsiszarIDivergence1975 accessed: - year: 2020 month: 5 day: 9 author: - family: Csiszar given: I. citation-key: CsiszarIDivergence1975 container-title: Annals of Probability container-title-short: Ann. Probab. 
DOI: 10.1214/aop/1176996454 ISSN: 0091-1798, 2168-894X issue: '1' issued: - year: 1975 month: 2 language: EN page: 146-158 publisher: Institute of Mathematical Statistics title: I-Divergence Geometry of Probability Distributions and Minimization Problems type: article-journal volume: '3' - id: DurmusHighdimensional2016 accessed: - year: 2018 month: 7 day: 10 author: - family: Durmus given: Alain - family: Moulines given: Eric citation-key: DurmusHighdimensional2016 container-title: arXiv:1605.01559 [math, stat] issued: - year: 2016 month: 5 day: 5 title: High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm type: article-journal URL: http://arxiv.org/abs/1605.01559 - id: EfronAssessing1978 accessed: - year: 2015 month: 6 day: 24 author: - family: Efron given: Bradley - family: Hinkley given: David V. citation-key: EfronAssessing1978 container-title: Biometrika container-title-short: Biometrika DOI: 10.1093/biomet/65.3.457 ISSN: 0006-3444, 1464-3510 issue: '3' issued: - year: 1978 month: 1 day: 12 language: en page: 457-483 title: >- Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information type: article-journal URL: https://statistics.stanford.edu/sites/default/files/EFS%20NSF%20108.pdf volume: '65' - id: GeSimulated2020 accessed: - year: 2021 month: 3 day: 10 author: - family: Ge given: Rong - family: Lee given: Holden - family: Risteski given: Andrej citation-key: GeSimulated2020 container-title: arXiv:1812.00793 [cs, math, stat] issued: - year: 2020 month: 9 day: 9 title: >- Simulated Tempering Langevin Monte Carlo II: An Improved Proof using Soft Markov Chain Decomposition type: article-journal URL: http://arxiv.org/abs/1812.00793 - id: GrosseKroneckerfactored2016 accessed: - year: 2023 month: 1 day: 17 author: - family: Grosse given: Roger - family: Martens given: James citation-key: GrosseKroneckerfactored2016 container-title: Proceedings of The 33rd International Conference on Machine Learning 
event-title: International Conference on Machine Learning ISSN: 1938-7228 issued: - year: 2016 month: 6 day: 11 language: en page: 573-582 publisher: PMLR title: A Kronecker-factored approximate Fisher matrix for convolution layers type: paper-conference URL: https://proceedings.mlr.press/v48/grosse16.html - id: GrosseMetrics2021 accessed: - year: 2022 month: 7 day: 25 author: - family: Grosse given: Roger citation-key: GrosseMetrics2021 container-title: CSC2541 Winter 2021 issued: - year: 2021 page: Chapter 3 title: Metrics type: chapter URL: >- https://www.cs.toronto.edu/~rgrosse/courses/csc2541_2021/readings/L03_metrics.pdf - id: HensmanFast2012 accessed: - year: 2023 month: 6 day: 2 author: - family: Hensman given: James - family: Rattray given: Magnus - family: Lawrence given: Neil citation-key: HensmanFast2012 container-title: Advances in Neural Information Processing Systems issued: - year: 2012 publisher: Curran Associates, Inc. title: Fast Variational Inference in the Conjugate Exponential Family type: paper-conference URL: >- https://proceedings.neurips.cc/paper_files/paper/2012/hash/50905d7b2216bfeccb5b41016357176b-Abstract.html volume: '25' - id: HonkelaNatural2008 author: - family: Honkela given: Antti - family: Tornio given: Matti - family: Raiko given: Tapani - family: Karhunen given: Juha citation-key: HonkelaNatural2008 collection-title: Lecture Notes in Computer Science container-title: Neural Information Processing DOI: 10.1007/978-3-540-69162-4_32 editor: - family: Ishikawa given: Masumi - family: Doya given: Kenji - family: Miyamoto given: Hiroyuki - family: Yamakawa given: Takeshi event-place: Berlin, Heidelberg ISBN: 978-3-540-69162-4 ISSN: 0302-9743, 1611-3349 issued: - year: 2008 language: en page: 305-314 publisher: Springer publisher-place: Berlin, Heidelberg title: Natural Conjugate Gradient in Variational Inference type: paper-conference - id: KakadeNatural2002 author: - family: Kakade given: Sham M citation-key: KakadeNatural2002 
container-title: Advances In Neural Information Processing Systems event-title: Neural Information Processing Systems issued: - year: 2002 language: en page: '8' title: A Natural Policy Gradient type: paper-conference - id: KarakidaUnderstanding2020 accessed: - year: 2020 month: 12 day: 11 author: - family: Karakida given: Ryo - family: Osawa given: Kazuki citation-key: KarakidaUnderstanding2020 container-title: Advances in Neural Information Processing Systems issued: - year: 2020 language: en title: >- Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks type: article-journal URL: >- https://proceedings.neurips.cc//paper_files/paper/2020/hash/7b41bfa5085806dfa24b8c9de0ce567f-Abstract.html volume: '33' - id: KhanBayesian2023 accessed: - year: 2024 month: 2 day: 8 author: - family: Khan given: Mohammad Emtiyaz - family: Rue given: Håvard citation-key: KhanBayesian2023 DOI: 10.48550/arXiv.2107.04562 issued: - year: 2023 month: 6 day: 30 number: arXiv:2107.04562 publisher: arXiv title: The Bayesian Learning Rule type: article URL: http://arxiv.org/abs/2107.04562 - id: KhanConjugateComputation2017 accessed: - year: 2020 month: 11 day: 24 author: - family: Khan given: Mohammad Emtiyaz - family: Lin given: Wu citation-key: KhanConjugateComputation2017 container-title: Artificial Intelligence and Statistics event-title: Artificial Intelligence and Statistics ISSN: 2640-3498 issued: - year: 2017 month: 4 day: 10 language: en page: 878-887 publisher: PMLR title: >- Conjugate-Computation Variational Inference : Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models type: paper-conference URL: http://arxiv.org/abs/1703.04265 - id: KhanFast2018 accessed: - year: 2023 month: 8 day: 28 author: - family: Khan given: Mohammad - family: Nielsen given: Didrik - family: Tangkaratt given: Voot - family: Lin given: Wu - family: Gal given: Yarin - family: Srivastava given: Akash 
citation-key: KhanFast2018 container-title: Proceedings of the 35th International Conference on Machine Learning event-title: International Conference on Machine Learning ISSN: 2640-3498 issued: - year: 2018 month: 7 day: 3 language: en page: 2611-2620 publisher: PMLR title: Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam type: paper-conference URL: https://proceedings.mlr.press/v80/khan18a.html - id: LyTutorial2017 accessed: - year: 2020 month: 5 day: 24 author: - family: Ly given: Alexander - family: Marsman given: Maarten - family: Verhagen given: Josine - family: Grasman given: Raoul P. P. P. - family: Wagenmakers given: Eric-Jan citation-key: LyTutorial2017 container-title: Journal of Mathematical Psychology container-title-short: Journal of Mathematical Psychology DOI: 10.1016/j.jmp.2017.05.006 ISSN: 0022-2496 issued: - year: 2017 month: 10 day: 1 language: en page: 40-55 title: A Tutorial on Fisher information type: article-journal URL: http://arxiv.org/abs/1705.01064 volume: '80' - id: MartensNew2020 accessed: - year: 2020 month: 9 day: 17 author: - family: Martens given: James citation-key: MartensNew2020 container-title: Journal of Machine Learning Research ISSN: 1533-7928 issue: '146' issued: - year: 2020 page: 1-76 title: New Insights and Perspectives on the Natural Gradient Method type: article-journal URL: http://jmlr.org/papers/v21/17-678.html volume: '21' - id: MartensOptimizing2015 accessed: - year: 2022 month: 1 day: 6 author: - family: Martens given: James - family: Grosse given: Roger citation-key: MartensOptimizing2015 container-title: Proceedings of the 32nd International Conference on Machine Learning event-title: International Conference on Machine Learning ISSN: 1938-7228 issued: - year: 2015 month: 6 day: 1 language: en page: 2408-2417 publisher: PMLR title: Optimizing Neural Networks with Kronecker-factored Approximate Curvature type: paper-conference URL: http://arxiv.org/abs/1503.05671 - id: 
MartensSECONDORDER2016 accessed: - year: 2020 month: 5 day: 26 author: - family: Martens given: James citation-key: MartensSECONDORDER2016 issued: - year: 2016 publisher: University of Toronto title: Second-Order Optimization for Neural Networks type: thesis URL: http://www.cs.toronto.edu/~jmartens/docs/thesis_phd_martens.pdf - id: MosegaardProbabilistic2002 accessed: - year: 2022 month: 1 day: 13 author: - family: Mosegaard given: Klaus - family: Tarantola given: Albert citation-key: MosegaardProbabilistic2002 container-title: International Geophysics DOI: 10.1016/S0074-6142(02)80219-4 ISBN: 978-0-12-440652-0 issued: - year: 2002 language: en page: 237-265 publisher: Elsevier title: Probabilistic approach to inverse problems type: chapter URL: https://linkinghub.elsevier.com/retrieve/pii/S0074614202802194 volume: '81' - id: NielsenElementary2018 accessed: - year: 2019 month: 12 day: 27 author: - family: Nielsen given: Frank citation-key: NielsenElementary2018 container-title: arXiv:1808.08271 [cs, math, stat] issued: - year: 2018 month: 8 day: 16 title: An elementary introduction to information geometry type: article-journal URL: http://arxiv.org/abs/1808.08271 - id: NortonTuning2016 accessed: - year: 2016 month: 10 day: 5 author: - family: Norton given: Richard A. 
- family: Fox given: Colin citation-key: NortonTuning2016 container-title: arXiv:1610.00781 [math, stat] issued: - year: 2016 month: 10 day: 3 title: >- Tuning of MCMC with Langevin, Hamiltonian, and other stochastic autoregressive proposals type: article-journal URL: http://arxiv.org/abs/1610.00781 - id: NurbekyanEfficient2022 accessed: - year: 2022 month: 7 day: 28 author: - family: Nurbekyan given: Levon - family: Lei given: Wanzhou - family: Yang given: Yunan citation-key: NurbekyanEfficient2022 DOI: 10.48550/arXiv.2202.06236 issued: - year: 2022 month: 4 day: 3 number: arXiv:2202.06236 publisher: arXiv title: >- Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems type: article URL: http://arxiv.org/abs/2202.06236 - id: OllivierOnline2017 accessed: - year: 2017 month: 7 day: 17 author: - family: Ollivier given: Yann citation-key: OllivierOnline2017 container-title: arXiv:1703.00209 [math, stat] issued: - year: 2017 month: 3 day: 1 title: Online Natural Gradient as a Kalman Filter type: article-journal URL: http://arxiv.org/abs/1703.00209 - id: OsawaASDL2023 accessed: - year: 2023 month: 6 day: 2 author: - family: Osawa given: Kazuki - family: Ishikawa given: Satoki - family: Yokota given: Rio - family: Li given: Shigang - family: Hoefler given: Torsten citation-key: OsawaASDL2023 issued: - year: 2023 month: 5 day: 8 language: en number: arXiv:2305.04684 publisher: arXiv title: 'ASDL: A Unified Interface for Gradient Preconditioning in PyTorch' type: article URL: http://arxiv.org/abs/2305.04684 - id: OsawaPractical2019 accessed: - year: 2022 month: 8 day: 16 author: - family: Osawa given: Kazuki - family: Swaroop given: Siddharth - family: Khan given: Mohammad Emtiyaz E - family: Jain given: Anirudh - family: Eschenhagen given: Runa - family: Turner given: Richard E - family: Yokota given: Rio citation-key: OsawaPractical2019 container-title: Advances in Neural Information Processing Systems event-place: Red Hook, NY, USA issued: - 
year: 2019 publisher: Curran Associates, Inc. publisher-place: Red Hook, NY, USA title: Practical Deep Learning with Bayesian Principles type: paper-conference URL: http://arxiv.org/abs/1906.02506 volume: '32' - id: SalimbeniNatural2018 accessed: - year: 2020 month: 5 day: 26 author: - family: Salimbeni given: Hugh - family: Eleftheriadis given: Stefanos - family: Hensman given: James citation-key: SalimbeniNatural2018 container-title: International Conference on Artificial Intelligence and Statistics event-title: International Conference on Artificial Intelligence and Statistics ISSN: 1938-7228 issued: - year: 2018 month: 3 day: 31 language: en page: 689-697 section: Machine Learning title: >- Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models type: paper-conference URL: http://arxiv.org/abs/1803.09151 - id: SatoOnline2001 accessed: - year: 2024 month: 2 day: 26 author: - family: Sato given: Masa-aki citation-key: SatoOnline2001 container-title: Neural Computation container-title-short: Neural Computation DOI: 10.1162/089976601750265045 ISSN: 0899-7667 issue: '7' issued: - year: 2001 month: 7 day: 1 page: 1649-1681 title: Online Model Selection Based on the Variational Bayes type: article-journal URL: >- https://web.archive.org/web/20130319090035id_/http://www.robots.ox.ac.uk:80/~parg/mlrg/papers/online_variational.pdf volume: '13' - id: SchraudolphFast2002 accessed: - year: 2018 month: 4 day: 2 author: - family: Schraudolph given: Nicol N. 
citation-key: SchraudolphFast2002 container-title: Neural Computation container-title-short: Neural Computation DOI: 10.1162/08997660260028683 ISSN: 0899-7667 issue: '7' issued: - year: 2002 month: 7 day: 1 page: 1723-1738 title: Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent type: article-journal URL: https://nic.schraudolph.org/pubs/Schraudolph02.pdf volume: '14' - id: WellingBayesian2011 accessed: - year: 2022 month: 8 day: 16 author: - family: Welling given: Max - family: Teh given: Yee Whye citation-key: WellingBayesian2011 collection-title: ICML'11 container-title: >- Proceedings of the 28th International Conference on Machine Learning event-place: Madison, WI, USA ISBN: 978-1-4503-0619-5 issued: - year: 2011 month: 6 day: 28 page: 681-688 publisher: Omnipress publisher-place: Madison, WI, USA title: Bayesian learning via stochastic gradient Langevin dynamics type: paper-conference URL: https://icml.cc/2011/papers/398_icmlpaper.pdf - id: WilkinsonBayesNewton2021 accessed: - year: 2022 month: 7 day: 28 author: - family: Wilkinson given: William J. 
- family: Särkkä given: Simo - family: Solin given: Arno citation-key: WilkinsonBayesNewton2021 DOI: 10.48550/arXiv.2111.01721 issued: - year: 2021 month: 11 day: 3 number: arXiv:2111.01721 publisher: arXiv title: Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees type: article URL: http://arxiv.org/abs/2111.01721 - id: ZellnerOptimal1988 accessed: - year: 2020 month: 9 day: 7 author: - family: Zellner given: Arnold citation-key: ZellnerOptimal1988 container-title: The American Statistician DOI: 10.1080/00031305.1988.10475585 ISSN: 0003-1305 issue: '4' issued: - year: 1988 month: 11 day: 1 page: 278-280 publisher: Taylor & Francis title: Optimal Information Processing and Bayes's Theorem type: article-journal URL: https://ageconsearch.umn.edu/record/296078/files/usc043.pdf volume: '42' - id: ZhangNoisy2018 accessed: - year: 2023 month: 8 day: 28 author: - family: Zhang given: Guodong - family: Sun given: Shengyang - family: Duvenaud given: David - family: Grosse given: Roger citation-key: ZhangNoisy2018 container-title: Proceedings of the 35th International Conference on Machine Learning event-title: International Conference on Machine Learning ISSN: 2640-3498 issued: - year: 2018 month: 7 day: 3 language: en page: 5852-5861 publisher: PMLR title: Noisy Natural Gradient as Variational Inference type: paper-conference URL: http://arxiv.org/abs/1712.02390 ...