Ensemble Kalman methods for training neural networks

Data assimilation for network weights

September 20, 2022 — September 20, 2022

dynamical systems
likelihood free
linear algebra
machine learning
Monte Carlo
neural nets
signal processing
sparser than thou
stochastic processes
time series

Training neural networks with ensemble Kalman updates instead of SGD. This arises naturally from the dynamical perspective on neural networks. TBD.

Claudia Schillings’ filter (Schillings and Stuart 2017) is an elegant variant of the ensemble Kalman filter that somehow looks more general than the original yet also simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.
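To make the idea concrete, here is a minimal sketch of a generic perturbed-observation ensemble Kalman inversion step, applied to the flattened weights of a tiny tanh network. This is the textbook update, not necessarily the exact scheme in either paper above. The names `eki_step`, `mlp_forward` and `fit_toy_mlp`, the toy sine-regression data, the noise covariance and the ensemble size are all assumptions of mine for illustration. Note that no gradient of the network is ever computed; only forward passes over the ensemble are needed.

```python
import numpy as np


def eki_step(thetas, forward, y, gamma, rng):
    """One generic ensemble Kalman inversion update (perturbed observations).

    thetas:  (J, d) ensemble of parameter vectors
    forward: callable mapping a (d,) parameter vector to (m,) predictions
    y:       (m,) observed targets
    gamma:   (m, m) assumed observation-noise covariance
    """
    n_ens = thetas.shape[0]
    preds = np.stack([forward(t) for t in thetas])       # (J, m)
    d_theta = thetas - thetas.mean(axis=0)                # parameter anomalies
    d_pred = preds - preds.mean(axis=0)                   # prediction anomalies
    c_tg = d_theta.T @ d_pred / n_ens                     # (d, m) cross-covariance
    c_gg = d_pred.T @ d_pred / n_ens                      # (m, m) prediction covariance
    # Each member sees its own noisy copy of the data.
    noise = rng.multivariate_normal(np.zeros(len(y)), gamma, size=n_ens)
    resid = y + noise - preds                             # (J, m)
    # Apply the Kalman gain c_tg (c_gg + gamma)^{-1} to every residual.
    solved = np.linalg.solve(c_gg + gamma, resid.T)       # (m, J)
    return thetas + (c_tg @ solved).T


def mlp_forward(theta, x, hidden):
    """One-hidden-layer tanh network with all weights packed into theta."""
    w1 = theta[:hidden]
    b1 = theta[hidden:2 * hidden]
    w2 = theta[2 * hidden:3 * hidden]
    b2 = theta[3 * hidden]
    return np.tanh(np.outer(x, w1) + b1) @ w2 + b2


def fit_toy_mlp(n_steps=20, n_ensemble=100, hidden=8, seed=0):
    """Fit sin(x) on a grid (hypothetical toy problem); return mean MSE before and after."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-3.0, 3.0, 30)
    y = np.sin(x)
    gamma = 0.01 * np.eye(len(x))                         # assumed noise level

    def forward(t):
        return mlp_forward(t, x, hidden)

    def mean_loss(ts):
        return float(np.mean([np.mean((forward(t) - y) ** 2) for t in ts]))

    thetas = rng.normal(size=(n_ensemble, 3 * hidden + 1))  # initial ensemble
    loss_before = mean_loss(thetas)
    for _ in range(n_steps):
        thetas = eki_step(thetas, forward, y, gamma, rng)
    return loss_before, mean_loss(thetas)
```

Calling `fit_toy_mlp()` returns the ensemble-averaged mean squared error before and after the updates. The update only ever moves members within the span of the initial ensemble, so in this sketch the ensemble size is chosen to exceed the number of weights; that is a convenience for the toy problem, not a claim about how the cited methods scale.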

