Ensemble Kalman methods for training neural networks

Data assimilation for network weights




Training neural networks by ensemble Kalman updates instead of SGD. This arises naturally from the dynamical perspective on neural networks. TBD.
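Until this is fleshed out, here is a rough sketch of the basic update. The notation is generic and assumed here, not copied from any one of the papers below. Write \(\theta^{(j)}_n\) for the flattened weights of ensemble member \(j\) at iteration \(n\), \(G\) for the map from weights to network predictions on the training inputs, \(y\) for the training targets, and \(\Gamma\) for an assumed observation-noise covariance. One common perturbed-observation form is

\[
\theta^{(j)}_{n+1} = \theta^{(j)}_n + C^{\theta g}_n \left(C^{gg}_n + \Gamma\right)^{-1} \left(y^{(j)}_{n+1} - G(\theta^{(j)}_n)\right), \qquad j = 1, \dots, J,
\]

with empirical covariances

\[
C^{\theta g}_n = \frac{1}{J} \sum_{j=1}^{J} \left(\theta^{(j)}_n - \bar{\theta}_n\right) \left(G(\theta^{(j)}_n) - \bar{G}_n\right)^{\top},
\qquad
C^{gg}_n = \frac{1}{J} \sum_{j=1}^{J} \left(G(\theta^{(j)}_n) - \bar{G}_n\right) \left(G(\theta^{(j)}_n) - \bar{G}_n\right)^{\top},
\]

where \(\bar{\theta}_n\) and \(\bar{G}_n\) are ensemble means and \(y^{(j)}_{n+1} = y + \eta^{(j)}_{n+1}\) with \(\eta^{(j)}_{n+1} \sim \mathcal{N}(0, \Gamma)\). Setting \(\eta \equiv 0\) gives the unperturbed variant. Only evaluations of \(G\) appear and never its derivatives, which is why no backpropagation is needed.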

Claudia Schillings’ filter (Schillings and Stuart 2017) is an elegant variant of the ensemble Kalman filter that somehow looks both more general than the original and simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.
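To make the idea concrete, here is a minimal NumPy sketch of a generic ensemble Kalman inversion loop fitting a tiny network to a toy regression problem. It is not the specific algorithm of Haber, Lucka, and Ruthotto (2018), nor exactly the scheme analysed by Schillings and Stuart (2017). The network size, the `sin` target, the noise variance, the ensemble size and every function name (`mlp_forward`, `eki_step`, `run_eki_demo`) are assumptions made for illustration.

```python
import numpy as np

HIDDEN = 8  # hidden units in the toy network (arbitrary choice)


def mlp_forward(theta, x):
    """One-hidden-layer tanh network on scalar inputs, weights packed in theta."""
    w1 = theta[:HIDDEN]                  # input-to-hidden weights
    b1 = theta[HIDDEN:2 * HIDDEN]        # hidden biases
    w2 = theta[2 * HIDDEN:3 * HIDDEN]    # hidden-to-output weights
    b2 = theta[3 * HIDDEN]               # output bias
    return np.tanh(np.outer(x, w1) + b1) @ w2 + b2  # shape (len(x),)


def eki_step(ensemble, forward, y, noise_var, rng):
    """One ensemble Kalman inversion update with perturbed observations.

    ensemble has shape (J, P): J members, each a flat vector of P weights.
    Only forward evaluations are used; no gradients of the network appear.
    """
    n_members = ensemble.shape[0]
    preds = np.stack([forward(theta) for theta in ensemble])  # (J, N)
    d_theta = ensemble - ensemble.mean(axis=0)                # (J, P)
    d_pred = preds - preds.mean(axis=0)                       # (J, N)
    c_tg = d_theta.T @ d_pred / n_members   # weight/prediction cross-covariance (P, N)
    c_gg = d_pred.T @ d_pred / n_members    # prediction covariance (N, N)
    gamma = noise_var * np.eye(y.size)      # assumed isotropic observation noise
    # Each member sees its own noisy copy of the targets.
    y_pert = y + np.sqrt(noise_var) * rng.standard_normal(preds.shape)
    # Kalman-type update: theta_j += C_tg (C_gg + Gamma)^{-1} (y_j - G(theta_j))
    increments = c_tg @ np.linalg.solve(c_gg + gamma, (y_pert - preds).T)  # (P, J)
    return ensemble + increments.T


def mean_misfit(ensemble, forward, y):
    """Mean squared error, averaged over ensemble members and data points."""
    preds = np.stack([forward(theta) for theta in ensemble])
    return float(np.mean((preds - y) ** 2))


def run_eki_demo(n_steps=50, n_members=100, noise_var=1e-2, seed=0):
    """Fit y = sin(x) on a small grid; returns final ensemble and misfit history."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-2.0, 2.0, 20)
    y = np.sin(x)
    forward = lambda theta: mlp_forward(theta, x)
    ensemble = rng.standard_normal((n_members, 3 * HIDDEN + 1))
    history = [mean_misfit(ensemble, forward, y)]
    for _ in range(n_steps):
        ensemble = eki_step(ensemble, forward, y, noise_var, rng)
        history.append(mean_misfit(ensemble, forward, y))
    return ensemble, history


if __name__ == "__main__":
    _, history = run_eki_demo()
    print(f"mean misfit: start {history[0]:.3f}, end {history[-1]:.3f}")
```

Each increment is a linear combination of the ensemble deviations, so the members never leave the affine span of the initial ensemble. The ensemble size and the initial spread therefore act as a kind of prior over which weight directions can be explored at all. In this toy setup I would expect the misfit to fall over the iterations, but the settings are untuned and nothing here should be read as a benchmark.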

References

Chen, Chong, Yixuan Dou, Jie Chen, and Yaru Xue. 2022. “A Novel Neural Network Training Framework with Data Assimilation.” The Journal of Supercomputing, June.
Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018. “Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.” arXiv:1805.08034 [cs, math], May.
Kovachki, Nikola B., and Andrew M. Stuart. 2019. “Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.” Inverse Problems 35 (9): 095005.
Schillings, Claudia, and Andrew M. Stuart. 2017. “Analysis of the Ensemble Kalman Filter for Inverse Problems.” SIAM Journal on Numerical Analysis 55 (3): 1264–90.
Venturi, Daniele, and Xiantao Li. 2022. “The Mori-Zwanzig Formulation of Deep Learning.” arXiv.
Yegenoglu, Alper, Kai Krajsek, Sandra Diaz Pier, and Michael Herty. 2020. “Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-Performing Gradient Descent.” In Machine Learning, Optimization, and Data Science, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, and Renato Umeton, 12566:78–92. Cham: Springer International Publishing.
