Ensemble Kalman methods for training neural networks

Data assimilation for network weights



\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\one}{\unicode{x1D7D9}}\]

Training neural networks by ensemble Kalman updates instead of SGD. This arises naturally from the dynamical perspective on neural networks. TBD.
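
To fix ideas, here is the generic setup (a minimal sketch in the spirit of Iglesias, Law, and Stuart (2013); the notation is my shorthand, not lifted from any single paper). Stack the network weights into an unknown vector \(u\), let \(G(u)\) denote the network’s predictions on the training inputs, and treat the targets as a noisy observation \(y = G(u) + \eta\) with \(\eta \sim \mathcal{N}(0, \Gamma)\). Ensemble Kalman inversion carries an ensemble of candidate weight vectors \(\{u_j\}_{j=1}^J\) and updates each member by a Kalman-type step,

\[
u_j^{(n+1)} = u_j^{(n)} + C^{uG}_n \left( C^{GG}_n + \Gamma \right)^{-1} \left( y - G\bigl(u_j^{(n)}\bigr) \right),
\]

where \(C^{uG}_n\) is the empirical cross-covariance between weights and predictions over the current ensemble and \(C^{GG}_n\) is the empirical covariance of the predictions. Note what is absent: no gradients of \(G\), hence no backpropagation; the ensemble statistics stand in for the Jacobian.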

Claudia Schillings’ filter (Schillings and Stuart 2017) is an elegant variant of the ensemble Kalman filter which looks somehow more general than the original, yet simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in their section 3.2.
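
Schematically (my paraphrase of the continuous-time limit analysed in Schillings and Stuart (2017)): as the step size shrinks, the ensemble follows the flow

\[
\frac{\mathrm{d} u_j}{\mathrm{d} t} = - C^{uG}(u) \, \Gamma^{-1} \bigl( G(u_j) - y \bigr),
\]

which for linear \(G\) is a gradient flow for the least-squares misfit \(\tfrac{1}{2}\lVert \Gamma^{-1/2}(y - G u_j)\rVert^2\), preconditioned by the empirical covariance of the ensemble; subsampling the data at each step is what makes the comparison with stochastic gradient descent bite.

As a concrete toy, here is ensemble Kalman inversion training a one-hidden-layer network (a minimal sketch assuming plain-vanilla EKI with unperturbed observations and no inflation; the data, the architecture, `forward`, and all hyperparameters are illustrative choices of mine, not taken from the papers above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d regression problem (purely illustrative).
X = rng.uniform(-1.0, 1.0, size=(64, 1))
y = np.sin(3.0 * X).ravel() + 0.05 * rng.standard_normal(64)
N = y.size

H = 16           # hidden width
D = 3 * H + 1    # total number of weights in the flat vector

def forward(w, X):
    """One-hidden-layer tanh network, weights packed into a flat vector w."""
    W1, b1 = w[:H].reshape(1, H), w[H:2 * H]
    W2, b2 = w[2 * H:3 * H].reshape(H, 1), w[3 * H]
    return (np.tanh(X @ W1 + b1) @ W2).ravel() + b2

J = 200                                  # ensemble size
gamma = 0.05 ** 2                        # observation-noise variance (Gamma = gamma * I)
ensemble = rng.standard_normal((J, D))   # initial ensemble of weight vectors

for step in range(500):
    G = np.stack([forward(w, X) for w in ensemble])  # (J, N) predictions per member
    dw = ensemble - ensemble.mean(axis=0)            # weight anomalies
    dG = G - G.mean(axis=0)                          # prediction anomalies
    C_wG = dw.T @ dG / (J - 1)                       # (D, N) cross-covariance
    C_GG = dG.T @ dG / (J - 1)                       # (N, N) prediction covariance
    # Kalman-type update of every member: no gradients of `forward` anywhere.
    innov = np.linalg.solve(C_GG + gamma * np.eye(N), (y - G).T)  # (N, J)
    ensemble = ensemble + (C_wG @ innov).T

w_hat = ensemble.mean(axis=0)  # point estimate: the ensemble mean
print(float(np.mean((forward(w_hat, X) - y) ** 2)))  # training MSE
```

Each iteration costs \(J\) forward passes and one \(N \times N\) solve, and touches no gradients at all. The known failure mode is collapse of the ensemble onto its own span; the literature counters this with covariance inflation, localisation, or perturbed observations.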

References

Chada, Neil K., Yuming Chen, and Daniel Sanz-Alonso. 2021. “Iterative Ensemble Kalman Methods: A Unified Perspective with Some New Variants.” Foundations of Data Science 3 (3): 331.
Chada, Neil K., Marco A. Iglesias, Lassi Roininen, and Andrew M. Stuart. 2018. “Parameterizations for Ensemble Kalman Inversion.” Inverse Problems 34 (5): 055009.
Chen, Chong, Yixuan Dou, Jie Chen, and Yaru Xue. 2022. “A Novel Neural Network Training Framework with Data Assimilation.” The Journal of Supercomputing, June.
Dunbar, Oliver R. A., Andrew B. Duncan, Andrew M. Stuart, and Marie-Therese Wolfram. 2022. “Ensemble Inference Methods for Models With Noisy and Expensive Likelihoods.” SIAM Journal on Applied Dynamical Systems 21 (2): 1539–72.
Galy-Fajou, Théo, Valerio Perrone, and Manfred Opper. 2021. “Flexible and Efficient Inference with Particles for the Variational Gaussian Approximation.” Entropy 23 (8): 990.
Guth, Philipp A., Claudia Schillings, and Simon Weissmann. 2020. “Ensemble Kalman Filter for Neural Network Based One-Shot Inversion.” arXiv.
Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018. “Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.” arXiv:1805.08034 [cs, math], May.
Huang, Daniel Zhengyu, Tapio Schneider, and Andrew M. Stuart. 2022. “Iterated Kalman Methodology for Inverse Problems.” Journal of Computational Physics 463 (August): 111262.
Iglesias, Marco A., Kody J. H. Law, and Andrew M. Stuart. 2013. “Ensemble Kalman Methods for Inverse Problems.” Inverse Problems 29 (4): 045001.
Kovachki, Nikola B., and Andrew M. Stuart. 2019. “Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.” Inverse Problems 35 (9): 095005.
Ritter, Hippolyt, Martin Kukla, Cheng Zhang, and Yingzhen Li. 2021. “Sparse Uncertainty Representation in Deep Learning with Inducing Weights.” arXiv:2105.14594 [cs, stat], May.
Schillings, Claudia, and Andrew M. Stuart. 2017. “Analysis of the Ensemble Kalman Filter for Inverse Problems.” SIAM Journal on Numerical Analysis 55 (3): 1264–90.
Taghvaei, Amirhossein, and Prashant G. Mehta. 2021. “An Optimal Transport Formulation of the Ensemble Kalman Filter.” IEEE Transactions on Automatic Control 66 (7): 3052–67.
Venturi, Daniele, and Xiantao Li. 2022. “The Mori-Zwanzig Formulation of Deep Learning.” arXiv.
Wen, Linjie, and Jinglai Li. 2022. “Affine-Mapping Based Variational Ensemble Kalman Filter.” Statistics and Computing 32 (6): 97.
Yegenoglu, Alper, Kai Krajsek, Sandra Diaz Pier, and Michael Herty. 2020. “Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-Performing Gradient Descent.” In Machine Learning, Optimization, and Data Science, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, and Renato Umeton, 12566:78–92. Cham: Springer International Publishing.
