Feedback system identification, not necessarily linear

Learning dynamics from data

August 1, 2016 — November 15, 2023

dynamical systems
how do science
Lévy processes
machine learning
signal processing
stochastic processes
time series
Figure 1

The order in which this is presented right now makes no sense.

If I have a system whose future evolution is important to predict, why not try to infer a plausible model instead of a convenient linear one?

To reconstruct the unobserved state, as opposed to the parameters of the process acting upon the state, we use state filtering. These steps can interact if we are doing simulation-based online parameter inference, as in recursive estimation (what is the division between this and that?). Alternatively, we might decide the state is unimportant and attempt to model only the evolution of the observations. That is the Koopman operator trick.
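Here is a minimal sketch of the Koopman idea using extended dynamic mode decomposition (EDMD). You choose a dictionary of observables and then fit, by least squares, a matrix that advances them linearly by one time step. Everything here is a hypothetical illustration. The map, its parameters `a, b, c`, and the hand-picked observables were chosen so that the lifted dynamics are exactly linear. That is rarely true for real systems, where a finite dictionary only approximates the (infinite-dimensional) Koopman operator.

```python
import numpy as np

a, b, c = 0.9, 0.5, 0.3  # hypothetical system parameters

def step(x):
    """Nonlinear map: x1' = a x1, x2' = b x2 + c x1^2."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])

def observables(x):
    """Hand-picked dictionary; including x1^2 makes the lifted dynamics exactly linear here."""
    return np.array([x[0], x[1], x[0] ** 2])

rng = np.random.default_rng(0)
starts = rng.uniform(-1.0, 1.0, size=(50, 2))
G = np.column_stack([observables(x) for x in starts])             # observables now
G_next = np.column_stack([observables(step(x)) for x in starts])  # observables one step later

# Least-squares Koopman matrix K with G_next ≈ K @ G (extended DMD).
K = np.linalg.lstsq(G.T, G_next.T, rcond=None)[0].T
eigs = np.sort(np.linalg.eigvals(K).real)
print(np.round(K, 3))
print("eigenvalues:", np.round(eigs, 3))  # expected: b, a**2, a
```

The fitted eigenvalues are \(b\), \(a^2\) and \(a\). The \(a^2\) mode appears only because \(x_1^2\) is in the dictionary. That is the sense in which a nonlinear system can look linear in a suitable set of observables.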

A compact overview appears, incidentally, in Cosma’s review of Fan and Yao (2003), where he also recommends Bosq and Blanke (2007), Bosq (1998), and Taniguchi and Kakizawa (2000).

There are many methods. From an engineering/control perspective, Brunton, Proctor, and Kutz (2016) generalise the process for linear time series to a sparse regression version (sparse identification of nonlinear dynamics, SINDy). Other routes include indirect inference, or recursive hierarchical generalised linear models, which are an obvious way to generalise linear systems in the same way that GLMs generalise linear models. Kitagawa and Gersch (1996) is popular in a Bayesian context.
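Here is a minimal sketch of the sparse-regression idea. It assumes a noise-free simulated trajectory, finite-difference derivative estimates and a degree-2 polynomial candidate library. The solver is the sequentially thresholded least squares scheme that Brunton, Proctor, and Kutz (2016) describe. The test system (a damped linear oscillator), the threshold `0.05` and all helper names are my own hypothetical choices, not code from the paper.

```python
import numpy as np

def rk4_simulate(f, x0, dt, n_steps):
    """Integrate dx/dt = f(x) with fixed-step 4th-order Runge-Kutta."""
    xs = np.empty((n_steps, len(x0)))
    xs[0] = x0
    for k in range(n_steps - 1):
        x = xs[k]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        xs[k + 1] = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return xs

def poly_library(X):
    """Candidate terms up to degree 2 for a 2-D state: 1, x, y, x^2, xy, y^2."""
    x, y = X[:, 0], X[:, 1]
    lib = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    return lib, ["1", "x", "y", "x^2", "xy", "y^2"]

def stlsq(lib, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit the rest."""
    xi = np.linalg.lstsq(lib, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(dxdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(lib[:, keep], dxdt[:, j], rcond=None)[0]
    return xi

# Hypothetical ground truth: a damped linear oscillator dx/dt = A x.
A = np.array([[-0.1, 2.0], [-2.0, -0.1]])
vector_field = lambda x: A @ x

dt = 0.01
X = rk4_simulate(vector_field, np.array([2.0, 0.0]), dt, 2500)
dXdt = np.gradient(X, dt, axis=0)  # finite-difference derivative estimates
lib, names = poly_library(X)
xi = stlsq(lib, dXdt)

for j, lhs in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{xi[i, j]:+.3f} {names[i]}" for i in range(len(names)) if xi[i, j] != 0]
    print(lhs, "=", " ".join(terms))
```

With noisy data the derivative estimates become the weak point, and the threshold trades sparsity against goodness of fit.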

Hefny, Downey, and Gordon (2015):

We address […] these problems with a new view of predictive state methods for dynamical system learning. In this view, a dynamical system learning problem is reduced to a sequence of supervised learning problems. So, we can directly apply the rich literature on supervised learning methods to incorporate many types of prior knowledge about problem structure. We give a general convergence rate analysis that allows a high degree of flexibility in designing estimators. And finally, implementing a new estimator becomes as simple as rearranging our data and calling the appropriate supervised learning subroutines.

[…] More specifically, our contribution is to show that we can use much-more-general supervised learning algorithms in place of linear regression, and still get a meaningful theoretical analysis. In more detail:

  • we point out that we can equally well use any well-behaved supervised learning algorithm in place of linear regression in the first stage of instrumental-variable regression;

  • for the second stage of instrumental-variable regression, we generalize ordinary linear regression to its RKHS counterpart;

  • we analyze the resulting combination, and show that we get convergence to the correct answer, with a rate that depends on how quickly the individual supervised learners converge
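Here is a toy sketch of the two-stage idea, under heavy simplifying assumptions:

- a hypothetical scalar latent AR(1) state observed in white noise;
- a history of `L = 5` past observations;
- a one-step future \(\psi_t = y_t\) and an extended future \(\xi_t = y_{t+1}\).

Stage 1 uses ridge regression purely for brevity. The point of the quote is that any well-behaved learner could go there. Stage 2 is ordinary linear regression. This is not the authors' implementation.

In this toy model the population stage-2 coefficient equals the AR coefficient \(a\). By contrast, regressing \(\xi_t\) directly on the noisy \(\psi_t\) is attenuated towards zero. That is why the instrumental-variable step matters.

```python
import numpy as np

rng = np.random.default_rng(1)
a_true, T, L = 0.9, 20_000, 5

# Hypothetical system: latent AR(1) state x_t observed as y_t = x_t + noise.
w = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = a_true * x[t - 1] + w[t]
y = x + rng.standard_normal(T)

# For each usable t: history h_t = (y_{t-1}, ..., y_{t-L}),
# future psi_t = y_t, extended future xi_t = y_{t+1}.
ts = np.arange(L, T - 1)
H = np.column_stack([y[ts - k] for k in range(1, L + 1)])
psi = y[ts]
xi = y[ts + 1]

def ridge_fit_predict(X, target, lam=1e-3):
    """Stage-1 regressor (with intercept); any supervised learner could be swapped in here."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef = np.linalg.solve(X1.T @ X1 + lam * np.eye(X1.shape[1]), X1.T @ target)
    return X1 @ coef

# Stage 1: denoise the future and extended future by regressing each on the history.
q_hat = ridge_fit_predict(H, psi)  # estimate of the predictive state E[psi_t | h_t]
p_hat = ridge_fit_predict(H, xi)   # estimate of the extended state E[xi_t | h_t]

# Stage 2: linear map from predictive state to extended state.
W_iv = (q_hat @ p_hat) / (q_hat @ q_hat)
# Naive alternative: regress xi_t on the noisy psi_t directly (errors-in-variables bias).
W_naive = (psi @ xi) / (psi @ psi)
print(f"true a = {a_true}, two-stage estimate = {W_iv:.3f}, naive estimate = {W_naive:.3f}")
```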

State filters are cool for estimating time-varying hidden states given known, fixed system parameters. How about learning the parameters of the model generating your states? Classic ways to do this for dynamical systems include basic linear system identification and general system identification. But can you identify the fixed parameters (not just the hidden states) with a state filter?

Yes. This is called recursive estimation.
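Here is the simplest instance, sketched under assumptions. If there is no hidden state, treat the parameter vector as a state with no process noise and run a Kalman filter on it. The result is recursive least squares. The ARX model, its coefficients and the vague prior below are hypothetical.

With genuinely hidden states one instead augments the state with \(\theta\). At that point the filter becomes nonlinear even for linear dynamics, because the model contains products of states and parameters. Extended, ensemble or particle filters are the usual tools in that case.

```python
import numpy as np

def kalman_parameter_step(theta, P, phi, y, r):
    """One Kalman update for a static parameter vector.

    State model: theta_t = theta_{t-1} (no process noise).
    Observation model: y_t = phi_t @ theta_t + noise with variance r.
    With zero process noise this is exactly recursive least squares (RLS)."""
    s = phi @ P @ phi + r                  # innovation variance
    k = P @ phi / s                        # Kalman gain
    theta = theta + k * (y - phi @ theta)  # correct by the one-step prediction error
    P = P - np.outer(k, phi @ P)           # shrink parameter uncertainty
    return theta, P

# Hypothetical ARX system: y_t = a y_{t-1} + b u_{t-1} + e_t.
rng = np.random.default_rng(3)
a_true, b_true, T = 0.7, 1.5, 1000
u = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = a_true * y[t - 1] + b_true * u[t - 1] + rng.standard_normal()

theta, P = np.zeros(2), 1e6 * np.eye(2)  # vague prior on (a, b)
for t in range(1, T):
    phi = np.array([y[t - 1], u[t - 1]])  # regressors known at time t
    theta, P = kalman_parameter_step(theta, P, phi, y[t], r=1.0)

# Batch least squares on the same data, for comparison.
Phi = np.column_stack([y[:-1], u[:-1]])
theta_batch = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
print("recursive:", theta.round(4), "batch:", theta_batch.round(4))
```

With a vague prior the recursive estimate matches batch least squares on the same data to numerical precision. The benefit is that it is available online, one observation at a time.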

0.1 Basic Construction

There are a few variations. We start with the basic continuous-time state space model.

Here we have an unobserved Markov state process \(x(t)\) on \(\mathcal{X}\) and an observation process \(y(t)\) on \(\mathcal{Y}\). For now they will be assumed to be finite-dimensional vectors over \(\mathbb{R}\). They will additionally depend upon a vector of parameters \(\theta\). We observe the process at discrete times \(t(1:T)=(t_1, t_2,\dots, t_T),\) and we write the observations \(y(1:T)=(y(t_1), y(t_2),\dots, y(t_T)).\)

We presume our processes are completely specified by the following conditional densities (which might not have closed-form expressions):

The transition density

\[f(x(t_i)|x(t_{i-1}), \theta)\]

The observation density…
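Here is a sketch that makes the two densities concrete, under simplifying assumptions:

- observations are evenly spaced and indexed by \(i\);
- the transition density is a hypothetical linear-Gaussian \(f(x_i|x_{i-1},\theta)=\mathcal{N}(a x_{i-1}, q)\);
- the observation density, called \(g\) here purely for illustration, is \(g(y_i|x_i,\theta)=\mathcal{N}(x_i, r)\);
- so \(\theta=(a,q,r)\).

A bootstrap particle filter estimates the likelihood \(p(y(1:T)|\theta)\) using only samples from \(f\) and evaluations of \(g\). That is why it still works when \(f\) has no closed form. In this linear-Gaussian toy, the exact Kalman filter gives a reference value.

```python
import numpy as np

def kalman_loglik(y, a, q, r, m0=0.0, p0=1.0):
    """Exact log p(y_{1:T} | theta) for x_i = a x_{i-1} + N(0, q), y_i = x_i + N(0, r)."""
    m, p, ll = m0, p0, 0.0
    for yi in y:
        m, p = a * m, a * a * p + q            # push through the transition density
        s = p + r                              # predictive variance of y_i
        ll += -0.5 * (np.log(2 * np.pi * s) + (yi - m) ** 2 / s)
        k = p / s
        m, p = m + k * (yi - m), (1 - k) * p   # condition on y_i
    return ll

def bootstrap_pf_loglik(y, a, q, r, n=2000, m0=0.0, p0=1.0, seed=0):
    """Particle estimate of the same quantity: sample f, evaluate g, resample."""
    rng = np.random.default_rng(seed)
    x = m0 + np.sqrt(p0) * rng.standard_normal(n)
    ll = 0.0
    for yi in y:
        x = a * x + np.sqrt(q) * rng.standard_normal(n)            # sample x_i ~ f(. | x_{i-1})
        logw = -0.5 * (np.log(2 * np.pi * r) + (yi - x) ** 2 / r)  # log g(y_i | x_i)
        c = logw.max()
        w = np.exp(logw - c)
        ll += c + np.log(w.mean())                                  # estimate of log p(y_i | y_{1:i-1})
        x = x[rng.choice(n, size=n, p=w / w.sum())]                 # multinomial resampling
    return ll

# Hypothetical parameters theta = (a, q, r) and simulated observations.
rng = np.random.default_rng(1)
a, q, r, T = 0.8, 0.5, 1.0, 50
x = rng.standard_normal()
y = np.empty(T)
for i in range(T):
    x = a * x + np.sqrt(q) * rng.standard_normal()
    y[i] = x + np.sqrt(r) * rng.standard_normal()

print("Kalman:", kalman_loglik(y, a, q, r), "particle:", bootstrap_pf_loglik(y, a, q, r))
```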


1 Method of adjoints

A differentiation trick that happens to be useful for differentiating the likelihood (or other functions) of time-evolving systems using automatic differentiation; see e.g. Errico (1997).
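Here is a minimal discrete-time sketch. It assumes a hypothetical one-parameter model (an Euler step of logistic growth with rate \(\theta\)) and a squared-error loss against observations.

The forward pass stores the trajectory. The backward pass propagates the adjoint \(\lambda_t = \partial L/\partial x_t\) from the end of the series to the start, via \(\lambda_t = (x_t - y_t) + \lambda_{t+1}\,\partial x_{t+1}/\partial x_t\) with \(\lambda_T = x_T - y_T\). Along the way it accumulates \(\partial L/\partial\theta = \sum_t \lambda_{t+1}\,\partial x_{t+1}/\partial\theta\).

This is what reverse-mode automatic differentiation does mechanically. A central finite difference serves as a check.

```python
import numpy as np

DT = 0.1  # hypothetical Euler step size

def step(x, theta):
    """One Euler step of logistic growth dx/dt = theta * x * (1 - x)."""
    return x + DT * theta * x * (1.0 - x)

def loss_and_adjoint_grad(theta, x0, y):
    """Return L(theta) = 0.5 * sum_t (x_t - y_t)^2 and dL/dtheta by a backward adjoint sweep.

    y[t] is the observation of x_{t+1}; x0 is a known initial condition."""
    T = len(y)
    xs = np.empty(T + 1)
    xs[0] = x0
    for t in range(T):                      # forward pass, storing the trajectory
        xs[t + 1] = step(xs[t], theta)
    resid = xs[1:] - y
    loss = 0.5 * np.sum(resid ** 2)

    lam, grad = 0.0, 0.0                    # lam carries dL/dx contributions from later steps
    for t in range(T - 1, -1, -1):          # backward pass
        lam += resid[t]                     # now lam = dL/dx_{t+1}
        grad += lam * DT * xs[t] * (1.0 - xs[t])        # times d x_{t+1} / d theta
        lam *= 1.0 + DT * theta * (1.0 - 2.0 * xs[t])   # times d x_{t+1} / d x_t
    return loss, grad

# Hypothetical data: trajectory with theta = 0.8 plus small observation noise.
rng = np.random.default_rng(0)
x0, T = 0.1, 60
x_true = [x0]
for _ in range(T):
    x_true.append(step(x_true[-1], 0.8))
y_obs = np.array(x_true[1:]) + 0.01 * rng.standard_normal(T)

loss, grad = loss_and_adjoint_grad(0.5, x0, y_obs)
eps = 1e-6
fd = (loss_and_adjoint_grad(0.5 + eps, x0, y_obs)[0]
      - loss_and_adjoint_grad(0.5 - eps, x0, y_obs)[0]) / (2 * eps)
print(f"adjoint gradient {grad:.6f}, finite difference {fd:.6f}")
```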

See the method of adjoints.

2 In particle filters

See particle filters for system identification.

3 Indirect inference

Here the simulator is a black box: we have access only to its inputs and outputs. The approach is popular; see simulation-based inference.
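Here is a minimal sketch of indirect inference, assuming the textbook toy. A hypothetical MA(1) simulator \(y_t = e_t + \theta e_{t-1}\) is treated as a black box, and a misspecified AR(1) fit serves as the auxiliary model.

We choose \(\theta\) so that the AR(1) coefficient fitted to simulated data matches the one fitted to the observed data. The same simulation noise is reused for every candidate \(\theta\) (common random numbers), so the objective is smooth in \(\theta\). The grid search, sample sizes and seed are arbitrary choices.

```python
import numpy as np

def simulate_ma1(theta, noise):
    """Black-box simulator: y_t = e_t + theta * e_{t-1}, applied along the last axis."""
    return noise[..., 1:] + theta * noise[..., :-1]

def auxiliary_stat(y):
    """Auxiliary model: least-squares AR(1) coefficient of each demeaned series."""
    y = y - y.mean(axis=-1, keepdims=True)
    return (y[..., 1:] * y[..., :-1]).sum(axis=-1) / (y[..., :-1] ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
T, S = 20_000, 10
theta_true = 0.5
y_obs = simulate_ma1(theta_true, rng.standard_normal(T + 1))
beta_obs = auxiliary_stat(y_obs)

# Common random numbers: the same simulation noise for every candidate theta.
sim_noise = rng.standard_normal((S, T + 1))
grid = np.linspace(-0.95, 0.95, 381)
losses = [(auxiliary_stat(simulate_ma1(th, sim_noise)).mean() - beta_obs) ** 2 for th in grid]
theta_hat = grid[int(np.argmin(losses))]
print(f"auxiliary AR(1) coefficient: {beta_obs:.3f}, indirect-inference estimate: {theta_hat:.3f}")
```

The auxiliary AR(1) coefficient converges to \(\theta/(1+\theta^2) = 0.4\), not \(0.5\). The simulator maps that biased statistic back to the parameter. Restricting the grid to \(|\theta|<1\) resolves the ambiguity between \(\theta\) and \(1/\theta\), which give the same autocorrelation.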

4 Learning SDEs

5 Tooling

6 Incoming

  • Corenflos et al. (2021) describe a differentiable particle filter based on entropy-regularized optimal transport.
  • Campbell et al. (2021) describe variational inference that factors out the unknown parameters.
  • Gu et al. (2021) unify neural ODEs with RNNs (and convolutions) via linear state space layers.
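Here is a sketch of the recurrent/convolutional duality behind linear state space layers, as I understand it. Discretise \(x'(t) = A x(t) + B u(t)\), \(y = C x + D u\), here with the bilinear (Tustin) transform. Then compute the output either as a recurrence or as a causal convolution with kernel \(K_j = C\bar{A}^j\bar{B}\).

The random stable \(A\) (skew-symmetric minus the identity) is a hypothetical stand-in for the structured matrices used in practice. The step size is arbitrary.

```python
import numpy as np

def bilinear_discretize(A, B, step):
    """Bilinear (Tustin) discretisation of x'(t) = A x(t) + B u(t)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - 0.5 * step * A)
    return inv @ (I + 0.5 * step * A), inv @ (step * B)

rng = np.random.default_rng(0)
N, L, step = 4, 200, 0.1
S = rng.standard_normal((N, N))
A = (S - S.T) - np.eye(N)  # hypothetical: eigenvalues have real part -1, so stable
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
D = 0.5
Ab, Bb = bilinear_discretize(A, B, step)
u = rng.standard_normal(L)

# Recurrent view: x_k = Ab x_{k-1} + Bb u_k, y_k = C x_k + D u_k.
x = np.zeros((N, 1))
y_rec = np.empty(L)
for k in range(L):
    x = Ab @ x + Bb * u[k]
    y_rec[k] = (C @ x).item() + D * u[k]

# Convolutional view: kernel K_j = C Ab^j Bb, then y = K * u + D u (causal convolution).
kernel = np.empty(L)
M = Bb.copy()
for j in range(L):
    kernel[j] = (C @ M).item()
    M = Ab @ M
y_conv = np.convolve(u, kernel)[:L] + D * u
print("max discrepancy:", np.abs(y_rec - y_conv).max())
```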

7 References

Agarwal, Amjad, Shah, et al. 2018. “Time Series Analysis via Matrix Estimation.” arXiv:1802.09064 [Cs, Stat].
Andersson, Gillis, Horn, et al. 2019. “CasADi: A Software Framework for Nonlinear Optimization and Optimal Control.” Mathematical Programming Computation.
Andrews. 1994. “Empirical Process Methods in Econometrics.” In Handbook of Econometrics.
Antoniano-Villalobos, and Walker. 2016. “A Nonparametric Model for Stationary Time Series.” Journal of Time Series Analysis.
Arridge, Maass, Öktem, et al. 2019. “Solving Inverse Problems Using Data-Driven Models.” Acta Numerica.
Ben Taieb, and Atiya. 2016. “A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting.” IEEE Transactions on Neural Networks and Learning Systems.
Bengio, Vinyals, Jaitly, et al. 2015. “Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 28. NIPS’15.
Berry, Giannakis, and Harlim. 2020. “Bridging Data Science and Dynamical Systems Theory.” arXiv:2002.07928 [Physics, Stat].
Bosq. 1998. Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. Lecture Notes in Statistics 110.
Bosq, and Blanke. 2007. Inference and Prediction in Large Dimensions. Wiley Series in Probability and Statistics.
Bretó, He, Ionides, et al. 2009. “Time Series Analysis via Mechanistic Models.” The Annals of Applied Statistics.
Brunton, Proctor, and Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences.
Bühlmann, and Künsch. 1999. “Block Length Selection in the Bootstrap for Time Series.” Computational Statistics & Data Analysis.
Campbell, Shi, Rainforth, et al. 2021. “Online Variational Filtering and Parameter Learning.” In.
Carmi. 2014. “Compressive System Identification.” In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.
Cassidy, Rae, and Solo. 2015. “Brain Activity: Connectivity, Sparsity, and Mutual Information.” IEEE Transactions on Medical Imaging.
Chan, Lu, and Yau. 2016. “Factor Modelling for High-Dimensional Time Series: Inference and Model Selection.” Journal of Time Series Analysis.
Chen, Chong, Dou, Chen, et al. 2022. “A Novel Neural Network Training Framework with Data Assimilation.” The Journal of Supercomputing.
Chen, Ricky T. Q., and Duvenaud. 2019. “Neural Networks with Cheap Differential Operators.” In Advances in Neural Information Processing Systems.
Chen, Tian Qi, Rubanova, Bettencourt, et al. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31.
Chevillon. 2007. “Direct Multi-Step Estimation and Forecasting.” Journal of Economic Surveys.
Choromanski, Davis, Likhosherstov, et al. 2020. “An Ode to an ODE.” In Advances in Neural Information Processing Systems.
Clark, and Bjørnstad. 2004. “Population Time Series: Process Variability, Observation Errors, Missing Values, Lags, and Hidden States.” Ecology.
Cook, Otten, Marion, et al. 2007. “Estimation of Multiple Transmission Rates for Epidemics in Heterogeneous Populations.” Proceedings of the National Academy of Sciences.
Corenflos, Thornton, Deligiannidis, et al. 2021. “Differentiable Particle Filtering via Entropy-Regularized Optimal Transport.” arXiv:2102.07850 [Cs, Stat].
Course, Evans, and Nair. 2020. “Weak Form Generalized Hamiltonian Learning.” In Advances in Neural Information Processing Systems.
de Brouwer, Simm, Arany, et al. 2019. “GRU-ODE-Bayes: Continuous Modeling of Sporadically-Observed Time Series.” In Advances in Neural Information Processing Systems.
Doucet, Jacob, and Rubenthaler. 2013. “Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models.” arXiv:1304.5768 [Stat].
Durbin, and Koopman. 1997. “Monte Carlo Maximum Likelihood Estimation for Non-Gaussian State Space Models.” Biometrika.
———. 2012. Time Series Analysis by State Space Methods. Oxford Statistical Science Series 38.
E, Han, and Li. 2018. “A Mean-Field Optimal Control Formulation of Deep Learning.” arXiv:1807.01083 [Cs, Math].
Errico. 1997. “What Is an Adjoint Model?” Bulletin of the American Meteorological Society.
Evensen. 2003. “The Ensemble Kalman Filter: Theoretical Formulation and Practical Implementation.” Ocean Dynamics.
———. 2009a. Data Assimilation - The Ensemble Kalman Filter.
———. 2009b. “The Ensemble Kalman Filter for Combined State and Parameter Estimation.” IEEE Control Systems.
Evensen, and van Leeuwen. 2000. “An Ensemble Kalman Smoother for Nonlinear Dynamics.” Monthly Weather Review.
Fan, and Yao. 2003. Nonlinear Time Series: Nonparametric and Parametric Methods. Springer Series in Statistics.
Fearnhead, and Künsch. 2018. “Particle Filters and Data Assimilation.” Annual Review of Statistics and Its Application.
Finke, and Singh. 2016. “Approximate Smoothing and Parameter Estimation in High-Dimensional State-Space Models.” arXiv:1606.08650 [Stat].
Finlay, Jacobsen, Nurbekyan, et al. n.d. “How to Train Your Neural ODE: The World of Jacobian and Kinetic Regularization.” In ICML.
Finzi, Wang, and Wilson. 2020. “Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints.” In Advances in Neural Information Processing Systems.
Flunkert, Salinas, and Gasthaus. 2017. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” arXiv:1704.04110 [Cs, Stat].
Fraser. 2008. Hidden Markov Models and Dynamical Systems.
Gholami, Keutzer, and Biros. 2019. “ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs.” arXiv:1902.10298 [Cs].
Ghosh, Behl, Dupont, et al. 2020. “STEER: Simple Temporal Regularization For Neural ODE.” In Advances in Neural Information Processing Systems.
Gorad, Zhao, and Särkkä. 2020. “Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.” In.
Grathwohl, Chen, Bettencourt, et al. 2018. “FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models.” arXiv:1810.01367 [Cs, Stat].
Gu, Johnson, Goel, et al. 2021. “Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers.” In Advances in Neural Information Processing Systems.
Haber, Lucka, and Ruthotto. 2018. “Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.” arXiv:1805.08034 [Cs, Math].
Harvey, and Koopman. 2005. “Structural Time Series Models.” In Encyclopedia of Biostatistics.
Hazan, Singh, and Zhang. 2017. “Learning Linear Dynamical Systems via Spectral Filtering.” In NIPS.
Hefny, Downey, and Gordon. 2015. “A New View of Predictive State Methods for Dynamical System Learning.” arXiv:1505.05310 [Cs, Stat].
He, Ionides, and King. 2010. “Plug-and-Play Inference for Disease Dynamics: Measles in Large and Small Populations as a Case Study.” Journal of The Royal Society Interface.
Hirsh, Barajas-Solano, and Kutz. 2022. “Sparsifying Priors for Bayesian Uncertainty Quantification in Model Discovery.” Royal Society Open Science.
Holzschuh, Vegetti, and Thuerey. 2022. “Score Matching via Differentiable Physics.”
Hong, Yongmiao, and Li. 2005. “Nonparametric Specification Testing for Continuous-Time Models with Applications to Term Structure of Interest Rates.” Review of Financial Studies.
Hong, X., Mitchell, Chen, et al. 2008. “Model Selection Approaches for Non-Linear System Identification: A Review.” International Journal of Systems Science.
Houtekamer, and Zhang. 2016. “Review of the Ensemble Kalman Filter for Atmospheric Data Assimilation.” Monthly Weather Review.
Ionides, Edward L., Bhadra, Atchadé, et al. 2011. “Iterated Filtering.” The Annals of Statistics.
Ionides, E. L., Bretó, and King. 2006. “Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences.
Jia, and Benson. 2019. “Neural Jump Stochastic Differential Equations.” In Advances in Neural Information Processing Systems 32.
Jonschkowski, Rastogi, and Brock. 2018. “Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors.” arXiv:1805.11122 [Cs, Stat].
Kalli, and Griffin. 2018. “Bayesian Nonparametric Vector Autoregressive Models.” Journal of Econometrics.
Kantas, Doucet, Singh, et al. 2015. “On Particle Methods for Parameter Estimation in State-Space Models.” Statistical Science.
Kantz, and Schreiber. 2004. Nonlinear Time Series Analysis.
Kass, Amari, Arai, et al. 2018. “Computational Neuroscience: Mathematical and Statistical Perspectives.” Annual Review of Statistics and Its Application.
Kelly, Bettencourt, Johnson, et al. 2020. “Learning Differential Equations That Are Easy to Solve.” In.
Kemerait, and Childers. 1972. “Signal Detection and Extraction by Cepstrum Techniques.” IEEE Transactions on Information Theory.
Kendall, Ellner, McCauley, et al. 2005. “Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses.” Ecological Monographs.
Kidger, Chen, and Lyons. 2021. “‘Hey, That’s Not an ODE’: Faster ODE Adjoints via Seminorms.” In Proceedings of the 38th International Conference on Machine Learning.
Kidger, Morrill, Foster, et al. 2020. “Neural Controlled Differential Equations for Irregular Time Series.” arXiv:2005.08926 [Cs, Stat].
Kitagawa. 1987. “Non-Gaussian State—Space Modeling of Nonstationary Time Series.” Journal of the American Statistical Association.
———. 1996. “Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models.” Journal of Computational and Graphical Statistics.
Kitagawa, and Gersch. 1996. Smoothness Priors Analysis of Time Series. Lecture Notes in Statistics 116.
Kovachki, and Stuart. 2019. “Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.” Inverse Problems.
Krishnamurthy, Can, and Schwab. 2022. “Theory of Gating in Recurrent Neural Networks.” Physical Review X.
Lamb, Goyal, Zhang, et al. 2016. “Professor Forcing: A New Algorithm for Training Recurrent Networks.” In Advances in Neural Information Processing Systems.
Levin. 2017. “The Inner Structure of Time-Dependent Signals.” arXiv:1703.08596 [Cs, Math, Stat].
Li, Yang, and Duan. 2021a. “A Data-Driven Approach for Discovering Stochastic Dynamical Systems with Non-Gaussian Levy Noise.” Physica D: Nonlinear Phenomena.
———. 2021b. “Extracting Governing Laws from Sample Path Data of Non-Gaussian Stochastic Dynamical Systems.” arXiv:2107.10127 [Math, Stat].
Li, Xuechen, Wong, Chen, et al. 2020. “Scalable Gradients for Stochastic Differential Equations.” In International Conference on Artificial Intelligence and Statistics.
Ljung. 2010. “Perspectives on System Identification.” Annual Reviews in Control.
Lou, Lim, Katsman, et al. 2020. “Neural Manifold Ordinary Differential Equations.” In Advances in Neural Information Processing Systems.
Lu, Ariño, and Soljačić. 2021. “Discovering Sparse Interpretable Dynamics from Partial Observations.” arXiv:2107.10879 [Physics].
Luo, Stordal, Lorentzen, et al. 2015. “Iterative Ensemble Smoother as an Approximate Solution to a Regularized Minimum-Average-Cost Problem: Theory and Applications.” SPE Journal.
Malartic, Farchi, and Bocquet. 2021. “State, Global and Local Parameter Estimation Using Local Ensemble Kalman Filters: Applications to Online Machine Learning of Chaotic Dynamics.” arXiv:2107.11253 [Nlin, Physics:physics, Stat].
Massaroli, Poli, Park, et al. 2020. “Dissecting Neural ODEs.” In arXiv:2002.08071 [Cs, Stat].
Mitchell, and Houtekamer. 2000. “An Adaptive Ensemble Kalman Filter.” Monthly Weather Review.
Morrill, Kidger, Salvi, et al. 2020. “Neural CDEs for Long Time Series via the Log-ODE Method.” In.
Nerrand, Roussel-Ragot, Personnaz, et al. 1993. “Neural Networks and Nonlinear Adaptive Filtering: Unifying Concepts and New Algorithms.” Neural Computation.
Nguyen, and Malinsky. 2020. “Exploration and Implementation of Neural Ordinary Differential Equations.”
Pereyra, Schniter, Chouzenoux, et al. 2016. “A Survey of Stochastic Simulation and Optimization Methods in Signal Processing.” IEEE Journal of Selected Topics in Signal Processing.
Pham, and Panaretos. 2016. “Methodology and Convergence Rates for Functional Time Series Regression.” arXiv:1612.07197 [Math, Stat].
Pillonetto. 2016. “The Interplay Between System Identification and Machine Learning.” arXiv:1612.09158 [Cs, Stat].
Plis, Danks, and Yang. 2015. “Mesochronal Structure Learning.” Uncertainty in Artificial Intelligence: Proceedings of the … Conference. Conference on Uncertainty in Artificial Intelligence.
Poli, Massaroli, Yamashita, et al. 2020. “TorchDyn: A Neural Differential Equations Library.” arXiv:2009.09346 [Cs].
Pugachev, and Sinit︠s︡yn. 2001. Stochastic systems: theory and applications.
Rackauckas. 2019. “The Essential Tools of Scientific Machine Learning (Scientific ML).”
Rackauckas, Ma, Dixit, et al. 2018. “A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions.” arXiv:1812.01892 [Cs].
Rackauckas, Ma, Martensen, et al. 2020. “Universal Differential Equations for Scientific Machine Learning.”
Robinson. 1983. “Nonparametric Estimators for Time Series.” Journal of Time Series Analysis.
Roeder, Grant, Phillips, et al. 2019. “Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems.” arXiv:1905.12090 [Cs, Stat].
Routtenberg, and Tabrikian. 2010. “Blind MIMO-AR System Identification and Source Separation with Finite-Alphabet.” IEEE Transactions on Signal Processing.
Runge, Donner, and Kurths. 2015. “Optimal Model-Free Prediction from Multivariate Time Series.” Physical Review E.
Ruthotto, and Haber. 2020. “Deep Neural Networks Motivated by Partial Differential Equations.” Journal of Mathematical Imaging and Vision.
Sattar, and Oymak. 2022. “Non-Asymptotic and Accurate Learning of Nonlinear Dynamical Systems.” Journal of Machine Learning Research.
Schillings, and Stuart. 2017. “Analysis of the Ensemble Kalman Filter for Inverse Problems.” SIAM Journal on Numerical Analysis.
Schirmer, Eltayeb, Lessmann, et al. 2022. “Modeling Irregular Time Series with Continuous Recurrent Units.”
Schmidt, Krämer, and Hennig. 2021. “A Probabilistic State Space Model for Joint Inference from Differential Equations and Data.” arXiv:2103.10153 [Cs, Stat].
Schneider, Stuart, and Wu. 2022. “Ensemble Kalman Inversion for Sparse Learning of Dynamical Systems from Time-Averaged Data.” Journal of Computational Physics.
Sjöberg, Zhang, Ljung, et al. 1995. “Nonlinear Black-Box Modeling in System Identification: A Unified Overview.” Automatica, Trends in System Identification.
Städler, and Mukherjee. 2013. “Penalized Estimation in High-Dimensional Hidden Markov Models with State-Specific Graphical Models.” The Annals of Applied Statistics.
Stapor, Fröhlich, and Hasenauer. 2018. “Optimization and Uncertainty Analysis of ODE Models Using 2nd Order Adjoint Sensitivity Analysis.” bioRxiv.
Stroud, Katzfuss, and Wikle. 2018. “A Bayesian Adaptive Ensemble Kalman Filter for Sequential State and Parameter Estimation.” Monthly Weather Review.
Stroud, Stein, Lesht, et al. 2010. “An Ensemble Kalman Filter and Smoother for Satellite Data Assimilation.” Journal of the American Statistical Association.
Takamoto, Praditia, Leiteritz, et al. 2022. “PDEBench: An Extensive Benchmark for Scientific Machine Learning.” In.
Tallec, and Ollivier. 2017. “Unbiasing Truncated Backpropagation Through Time.”
Taniguchi, and Kakizawa. 2000. Asymptotic Theory of Statistical Inference for Time Series. Springer Series in Statistics.
Tanizaki. 2001. “Estimation of Unknown Parameters in Nonlinear and Non-Gaussian State-Space Models.” Journal of Statistical Planning and Inference.
Tzen, and Raginsky. 2019. “Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit.” arXiv:1905.09883 [Cs, Stat].
Unser, and Tafti. 2014. An Introduction to Sparse Stochastic Processes.
Uziel. 2020. “Nonparametric Sequential Prediction While Deep Learning the Kernel.” In International Conference on Artificial Intelligence and Statistics.
Vardasbi, Pires, Schmidt, et al. 2023. “State Spaces Aren’t Enough: Machine Translation Needs Attention.”
Wedig. 1984. “A Critical Review of Methods in Stochastic Structural Dynamics.” Nuclear Engineering and Design.
Wen, Torkkola, and Narayanaswamy. 2017. “A Multi-Horizon Quantile Recurrent Forecaster.” arXiv:1711.11053 [Stat].
Werbos. 1988. “Generalization of Backpropagation with Application to a Recurrent Gas Market Model.” Neural Networks.
Williams, and Zipser. 1989. “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks.” Neural Computation.
Yang, Stroud, and Huerta. 2018. “Sequential Monte Carlo Smoothing with Parameter Estimation.” Bayesian Analysis.
Zammit-Mangion, and Wikle. 2020. “Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting.” Spatial Statistics.
Zhang, Han, Gao, Unterman, et al. 2020. “Approximation Capabilities of Neural ODEs and Invertible Residual Networks.” arXiv:1907.12998 [Cs, Stat].
Zhang, Jiangjiang, Lin, Li, et al. 2018. “An Iterative Local Updating Ensemble Smoother for Estimation and Uncertainty Assessment of Hydrologic Model Parameters With Multimodal Distributions.” Water Resources Research.
Zhao, and Cui. 2023. “Tensor-Based Methods for Sequential State and Parameter Estimation in State Space Models.”