Ensemble Kalman methods for training neural networks
Data assimilation for network weights
September 20, 2022
\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\one}{\unicode{x1D7D9}}\]
Training neural networks by ensemble Kalman updates instead of SGD. This arises naturally from the dynamical perspective on neural networks. TBD.
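To fix notation (mine, not necessarily matching the papers below): treat the flattened network weights \(\vv{u}\) as the unknown, the network's predictions at the training inputs as the forward map \(\mathcal{G}(\vv{u})\), and the training targets \(\vv{y}\) as the observations. A generic ensemble Kalman inversion step then updates each member of an ensemble \(\{\vv{u}^{(j)}_n\}_{j=1}^J\) by

\[
\vv{u}^{(j)}_{n+1} = \vv{u}^{(j)}_{n} + C^{u\mathcal{G}}_{n}\left(C^{\mathcal{G}\mathcal{G}}_{n} + \Gamma\right)^{-1}\left(\vv{y} - \mathcal{G}\bigl(\vv{u}^{(j)}_{n}\bigr)\right),
\]

where \(C^{u\mathcal{G}}_{n}\) and \(C^{\mathcal{G}\mathcal{G}}_{n}\) are empirical (cross-)covariances over the ensemble and \(\Gamma\) is the observation-noise covariance. Note that no gradients of \(\mathcal{G}\) appear anywhere.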
Claudia Schillings’ filter (Schillings and Stuart 2017) is an elegant variant of the ensemble Kalman filter which manages to be both more general than the original and simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.
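A minimal numerical sketch of the idea in plain NumPy, to make the update above concrete. The toy problem, ensemble size, iteration count and noise level are all arbitrary illustrative choices of mine, not taken from either paper; this is the generic perturbed-observation ensemble Kalman update applied to the flattened weights of a tiny network, not a faithful implementation of the cited methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = sin(3x) + noise, observed at fixed inputs.
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(3.0 * x) + 0.05 * rng.standard_normal(x.shape)

# One-hidden-layer network; the parameter vector u packs all weights and biases.
H = 16                      # hidden units (arbitrary)
dim_u = 3 * H + 1           # W1 (H), b1 (H), W2 (H), b2 (1)

def forward(u, x):
    """Map a flat parameter vector u to network outputs G(u) at the inputs x."""
    W1, b1, W2, b2 = u[:H], u[H:2 * H], u[2 * H:3 * H], u[3 * H]
    h = np.tanh(np.outer(x, W1) + b1)   # (n, H)
    return h @ W2 + b2                  # (n,)

J = 200                                  # ensemble size (arbitrary)
gamma = 0.05 ** 2                        # assumed observation-noise variance
U = rng.standard_normal((J, dim_u))      # initial ensemble, a crude N(0, I) prior

for step in range(200):
    G = np.stack([forward(u, x) for u in U])   # (J, n) forward-map evaluations
    dU = U - U.mean(axis=0)
    dG = G - G.mean(axis=0)
    C_ug = dU.T @ dG / J                       # weight/output cross-covariance
    C_gg = dG.T @ dG / J                       # output covariance
    S = C_gg + gamma * np.eye(len(x))
    # Perturbed observations keep the ensemble from collapsing too quickly.
    Y = y + np.sqrt(gamma) * rng.standard_normal((J, len(x)))
    # Kalman-style, derivative-free update of every ensemble member.
    U = U + (Y - G) @ np.linalg.solve(S, C_ug.T)

u_hat = U.mean(axis=0)
print("final RMSE:", np.sqrt(np.mean((forward(u_hat, x) - y) ** 2)))
```

The loop never touches a gradient of the network; the cost is instead an ensemble of forward passes per step plus a linear solve in observation space.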