# Ensemble Kalman methods for training neural networks

Data assimilation for network weights

September 20, 2022

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\one}{\unicode{x1D7D9}}\]

Training neural networks by ensemble Kalman updates instead of SGD. This arises naturally from the dynamical-systems perspective on neural networks. TBD.

Claudia Schillings’ filter (Schillings and Stuart 2017) is an elegant variant of the ensemble Kalman filter which looks somehow more general than the original but also simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in their section 3.2.
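To make the idea concrete, here is a minimal NumPy sketch of the basic ensemble Kalman inversion update in the spirit of Schillings and Stuart (2017). A toy linear forward map stands in for a neural network, and all variable names and sizes are my own illustrative choices; the point is that the weight update uses only forward evaluations and ensemble covariances, never a gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inverse problem: recover a weight vector u from noisy data y = G(u) + noise.
# In the network-training setting, u would be the flattened weights and G the
# network's predictions on a training batch; the update below is derivative-free.
d_u, d_y, J = 5, 20, 50            # parameter dim, data dim, ensemble size
A = rng.normal(size=(d_y, d_u))    # stands in for the (possibly nonlinear) network
u_true = rng.normal(size=d_u)
gamma = 0.1                        # observation noise scale
y = A @ u_true + gamma * rng.normal(size=d_y)

def G(U):
    """Forward map applied to each ensemble member (column of U)."""
    return A @ U

# Ensemble Kalman inversion: repeatedly shift the ensemble toward the data
# using the empirical cross- and prediction-covariances as a Kalman gain.
U = rng.normal(size=(d_u, J))      # initial ensemble of candidate weight vectors
for _ in range(50):
    W = G(U)                                           # ensemble predictions
    u_bar = U.mean(axis=1, keepdims=True)
    w_bar = W.mean(axis=1, keepdims=True)
    Cuw = (U - u_bar) @ (W - w_bar).T / J              # cross-covariance
    Cww = (W - w_bar) @ (W - w_bar).T / J              # prediction covariance
    K = Cuw @ np.linalg.inv(Cww + gamma**2 * np.eye(d_y))  # Kalman gain
    U = U + K @ (y[:, None] - W)                       # derivative-free update

u_hat = U.mean(axis=1)             # ensemble mean as the weight estimate
```

Note the resemblance to a preconditioned gradient step: for a linear forward map, `K @ (y[:, None] - W)` moves each member along the empirical least-squares descent direction, which is the connection Haber, Lucka, and Ruthotto (2018) develop.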

## 1 References

Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018. “Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.” *arXiv:1805.08034 [Cs, Math]*.

Schillings, Claudia, and Andrew M. Stuart. 2017. “Analysis of the Ensemble Kalman Filter for Inverse Problems.” *SIAM Journal on Numerical Analysis*.