Feedback system identification, not necessarily linear



If I have a system whose future evolution is important to predict, why not try to infer a plausible model instead of a convenient linear one?

To reconstruct the state, as opposed to the parameters of the process acting upon the state, we do state filtering. There can be interplay between these steps if we are doing simulation-based online parameter inference, as in recursive estimation. Or we might decide the state is unimportant and attempt to estimate the evolution of the observations alone; that is the Koopman operator trick.
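To make the state-filtering step concrete, here is a minimal bootstrap particle filter for a toy nonlinear state-space model. The model (an arbitrary sinusoidal transition with Gaussian observation noise) and all the parameter names are my own illustrative choices, not from any particular paper:

```python
import numpy as np

def particle_filter(ys, n_particles=500, a=0.9, q=0.5, r=0.5, seed=0):
    """Bootstrap particle filter for the toy nonlinear model
       x_t = a*x_{t-1} + sin(x_{t-1}) + N(0, q^2),  y_t = x_t + N(0, r^2).
    Returns the filtered posterior means E[x_t | y_1..t]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)  # initial particle cloud
    means = []
    for y in ys:
        # propagate each particle through the nonlinear transition
        x = a * x + np.sin(x) + rng.normal(0.0, q, n_particles)
        # reweight by the Gaussian observation likelihood
        logw = -0.5 * ((y - x) / r) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))
        # multinomial resampling to fight weight degeneracy
        x = rng.choice(x, size=n_particles, p=w)
    return np.array(means)
```

For parameter inference on top of this, one would wrap the filter in a likelihood estimate (particle MCMC) or differentiate through it, as in the Corenflos et al. and Jonschkowski et al. papers mentioned below.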

I am in the process of taxonomising here. Material which fits the particular (classical parametric likelihood) model of recursive estimation and so on will be kept there; miscellaneous other approaches go here. A compact overview appears incidentally in Cosma’s review of Fan and Yao (2003), where he also recommends (Bosq and Blanke 2007; Bosq 1998; Taniguchi and Kakizawa 2000).

Anyway, for what kinds of system can we infer parameters? Many, each one a new paper. Here is a fun one: mutually exciting point processes? Yes, Eden et al. (2004) do that.

There are many methods. From an engineering/control perspective, Brunton, Proctor, and Kutz (2016) give a sparse regression version which generalises system identification for linear time series. Indirect inference, or recursive hierarchical generalised linear models, is an obvious way to generalise linear systems in the same way the GLM generalises linear models. There are many highly general formulations; Kitagawa and Gersch (1996) give a Bayesian “smoothness priors” one. Jonschkowski, Rastogi, and Brock (2018) learn the dynamics and the observation system simultaneously using neural nets. Corenflos et al. (2021) do that even without the ability to evaluate likelihoods, by doing something nifty with optimal transport.
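The Brunton, Proctor, and Kutz (2016) approach (SINDy) is simple enough to sketch in a few lines: regress observed time derivatives onto a library of candidate nonlinear functions, then sequentially threshold small coefficients to get a sparse model. This is a minimal re-implementation of their sequentially-thresholded least squares, not their released code; the library of candidate functions here is whatever the user supplies:

```python
import numpy as np

def sindy(X, Xdot, funcs, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares in the style of Brunton et al. (2016).
    X: (T, d) state snapshots; Xdot: (T, d) time derivatives;
    funcs: list of candidate library functions, each mapping X -> (T,) column."""
    Theta = np.column_stack([f(X) for f in funcs])     # candidate feature library
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]   # dense initial fit
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0                                # kill small coefficients
        for k in range(Xdot.shape[1]):                 # refit survivors, per output dim
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(
                    Theta[:, big], Xdot[:, k], rcond=None)[0]
    return Xi
```

Given, say, a library of constants, monomials, and cross terms, the nonzero rows of `Xi` name which governing terms the data supports.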

Hefny, Downey, and Gordon (2015):

We address […] these problems with a new view of predictive state methods for dynamical system learning. In this view, a dynamical system learning problem is reduced to a sequence of supervised learning problems. So, we can directly apply the rich literature on supervised learning methods to incorporate many types of prior knowledge about problem structure. We give a general convergence rate analysis that allows a high degree of flexibility in designing estimators. And finally, implementing a new estimator becomes as simple as rearranging our data and calling the appropriate supervised learning subroutines.

[…] More specifically, our contribution is to show that we can use much-more- general supervised learning algorithms in place of linear regression, and still get a meaningful theoretical analysis. In more detail:

  • we point out that we can equally well use any well-behaved supervised learning algorithm in place of linear regression in the first stage of instrumental-variable regression;

  • for the second stage of instrumental-variable regression, we generalize ordinary linear regression to its RKHS counterpart;

  • we analyze the resulting combination, and show that we get convergence to the correct answer, with a rate that depends on how quickly the individual supervised learners converge
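The two-stage instrumental-regression idea in that quote can be seen on a toy model. Below, the latent state is an AR(1) observed with noise; naively regressing the next observation on the current one is attenuated by errors-in-variables, whereas using the past observation as an instrument (stage 1 denoises the regressor, stage 2 regresses the future on the denoised value) recovers the true transition coefficient. This is my own illustrative sketch with plain OLS in both stages, not Hefny et al.’s estimator, but swapping in any supervised learner at stage 1 is exactly their point:

```python
import numpy as np

def iv_dynamics(ys):
    """Two-stage (instrumental) regression for the transition coefficient of a
    noisily observed AR(1): x_t = a x_{t-1} + eps_t, y_t = x_t + noise_t.
    Returns (instrumented estimate, naive one-stage estimate)."""
    y_prev, y_now, y_next = ys[:-2], ys[1:-1], ys[2:]
    # stage 1: predict the noisy regressor from the instrument (here OLS,
    # but any well-behaved supervised learner would do)
    b1 = np.dot(y_prev, y_now) / np.dot(y_prev, y_prev)
    y_now_hat = b1 * y_prev
    # stage 2: regress the future on the denoised regressor
    a_hat = np.dot(y_now_hat, y_next) / np.dot(y_now_hat, y_now_hat)
    # naive one-stage regression for comparison (biased towards zero)
    a_naive = np.dot(y_now, y_next) / np.dot(y_now, y_now)
    return a_hat, a_naive
```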

Continuous time

IMO an essential research area. Also, sparsely or unevenly observed series are tricky. I’m looking at those at the moment.
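One reason continuous-time formulations help with coarse or uneven sampling: the exact discretisation of a simple SDE is available in closed form, so the estimate does not degrade as the sampling interval grows, unlike a naive Euler fit. A minimal sketch for the Ornstein-Uhlenbeck process (my own toy example, regular spacing for brevity; irregular spacing works the same way, with a per-gap `dt` in the transition density):

```python
import numpy as np

def fit_ou(xs, dt):
    """Fit dX = -theta X dt + sigma dW from observations at spacing dt,
    using the exact discretisation X_{t+dt} = phi X_t + eta with
    phi = exp(-theta * dt). Returns (theta_hat, sigma_hat)."""
    x0, x1 = xs[:-1], xs[1:]
    phi = np.dot(x0, x1) / np.dot(x0, x0)      # AR(1) coefficient by OLS
    theta = -np.log(phi) / dt                  # invert phi = exp(-theta dt)
    resid = x1 - phi * x0
    # Var(eta) = sigma^2 (1 - phi^2) / (2 theta), so invert for sigma^2
    sigma2 = resid.var() * 2 * theta / (1 - phi**2)
    return theta, np.sqrt(sigma2)
```

The Euler-scheme estimate `theta ≈ (1 - phi) / dt` agrees only as `dt → 0`; the log-based inversion above stays consistent at any fixed spacing.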

Awaiting filing

  • (Pereyra et al. 2016)

This paper presents a tutorial on stochastic simulation and optimization methods in signal and image processing and points to some interesting research problems. The paper addresses a variety of high-dimensional Markov chain Monte Carlo methods. It also discusses a range of optimization methods that have been adopted to solve stochastic problems, as well as stochastic methods for deterministic optimization.

References

Agarwal, Anish, Muhammad Jehangir Amjad, Devavrat Shah, and Dennis Shen. 2018. “Time Series Analysis via Matrix Estimation.” arXiv:1802.09064 [cs, Stat], February. http://arxiv.org/abs/1802.09064.
Andrews, Donald W. K. 1994. “Empirical Process Methods in Econometrics.” In Handbook of Econometrics, edited by Robert F. Engle and Daniel L. McFadden, 4:2247–94. Elsevier. http://dido.econ.yale.edu/P/cp/p08b/p0887.pdf.
Antoniano-Villalobos, Isadora, and Stephen G. Walker. 2016. “A Nonparametric Model for Stationary Time Series.” Journal of Time Series Analysis 37 (1): 126–42. https://doi.org/10.1111/jtsa.12146.
Arulampalam, M. S., S. Maskell, N. Gordon, and T. Clapp. 2002. “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking.” IEEE Transactions on Signal Processing 50 (2): 174–88. https://doi.org/10.1109/78.978374.
Ben Taieb, Souhaib, and Amir F. Atiya. 2016. “A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting.” IEEE transactions on neural networks and learning systems 27 (1): 62–76. https://doi.org/10.1109/TNNLS.2015.2411629.
Bengio, Samy, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. “Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 28, 1171–79. NIPS’15. Cambridge, MA, USA: Curran Associates, Inc. http://papers.nips.cc/paper/5956-scheduled-sampling-for-sequence-prediction-with-recurrent-neural-networks.
Bosq, Denis. 1998. Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. 2nd ed. Lecture Notes in Statistics 110. New York: Springer.
Bosq, Denis, and Delphine Blanke. 2007. Inference and Prediction in Large Dimensions. Wiley Series in Probability and Statistics. Chichester, England; Hoboken, NJ: John Wiley/Dunod.
Bretó, Carles, Daihai He, Edward L. Ionides, and Aaron A. King. 2009. “Time Series Analysis via Mechanistic Models.” The Annals of Applied Statistics 3 (1): 319–48. https://doi.org/10.1214/08-AOAS201.
Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences 113 (15): 3932–37. https://doi.org/10.1073/pnas.1517384113.
Bühlmann, Peter, and Hans R Künsch. 1999. “Block Length Selection in the Bootstrap for Time Series.” Computational Statistics & Data Analysis 31 (3): 295–310. https://doi.org/10.1016/S0167-9473(99)00014-6.
Carmi, Avishy Y. 2014. “Compressive System Identification.” In Compressed Sensing & Sparse Filtering, edited by Avishy Y. Carmi, Lyudmila Mihaylova, and Simon J. Godsill, 281–324. Signals and Communication Technology. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-38398-4_9.
Cassidy, Ben, Caroline Rae, and Victor Solo. 2015. “Brain Activity: Connectivity, Sparsity, and Mutual Information.” IEEE Transactions on Medical Imaging 34 (4): 846–60. https://doi.org/10.1109/TMI.2014.2358681.
Chan, Ngai Hang, Ye Lu, and Chun Yip Yau. 2016. “Factor Modelling for High-Dimensional Time Series: Inference and Model Selection.” Journal of Time Series Analysis, January. https://doi.org/10.1111/jtsa.12207.
Chevillon, Guillaume. 2007. “Direct Multi-Step Estimation and Forecasting.” Journal of Economic Surveys 21 (4): 746–85. https://doi.org/10.1111/j.1467-6419.2007.00518.x.
Clark, James S., and Ottar N. Bjørnstad. 2004. “Population Time Series: Process Variability, Observation Errors, Missing Values, Lags, and Hidden States.” Ecology 85 (11): 3140–50. https://doi.org/10.1890/03-0520.
Cook, Alex R., Wilfred Otten, Glenn Marion, Gavin J. Gibson, and Christopher A. Gilligan. 2007. “Estimation of Multiple Transmission Rates for Epidemics in Heterogeneous Populations.” Proceedings of the National Academy of Sciences 104 (51): 20392–97. https://doi.org/10.1073/pnas.0706461104.
Corenflos, Adrien, James Thornton, George Deligiannidis, and Arnaud Doucet. 2021. “Differentiable Particle Filtering via Entropy-Regularized Optimal Transport.” arXiv:2102.07850 [cs, Stat], June. http://arxiv.org/abs/2102.07850.
Doucet, Arnaud, Pierre E. Jacob, and Sylvain Rubenthaler. 2013. “Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models.” arXiv:1304.5768 [stat], April. http://arxiv.org/abs/1304.5768.
Durbin, J., and S. J. Koopman. 1997. “Monte Carlo Maximum Likelihood Estimation for Non-Gaussian State Space Models.” Biometrika 84 (3): 669–84. https://doi.org/10.1093/biomet/84.3.669.
———. 2012. Time Series Analysis by State Space Methods. 2nd ed. Oxford Statistical Science Series 38. Oxford: Oxford University Press.
Eden, U, L Frank, R Barbieri, V Solo, and E Brown. 2004. “Dynamic Analysis of Neural Encoding by Point Process Adaptive Filtering.” Neural Computation 16 (5): 971–98. https://doi.org/10.1162/089976604773135069.
Fan, Jianqing, and Qiwei Yao. 2003. Nonlinear Time Series: Nonparametric and Parametric Methods. Springer Series in Statistics. New York: Springer.
Fearnhead, Paul, and Hans R. Künsch. 2018. “Particle Filters and Data Assimilation.” Annual Review of Statistics and Its Application 5 (1): 421–49. https://doi.org/10.1146/annurev-statistics-031017-100232.
Finke, Axel, and Sumeetpal S. Singh. 2016. “Approximate Smoothing and Parameter Estimation in High-Dimensional State-Space Models.” arXiv:1606.08650 [stat], June. http://arxiv.org/abs/1606.08650.
Flunkert, Valentin, David Salinas, and Jan Gasthaus. 2017. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” arXiv:1704.04110 [cs, Stat], April. http://arxiv.org/abs/1704.04110.
Fraser, Andrew M. 2008. Hidden Markov Models and Dynamical Systems. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Gorad, Ajinkya, Zheng Zhao, and Simo Särkkä. 2020. “Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.” In, 6.
Harvey, A., and S. J. Koopman. 2005. “Structural Time Series Models.” In Encyclopedia of Biostatistics. John Wiley & Sons, Ltd. http://onlinelibrary.wiley.com/doi/10.1002/0470011815.b2a12069/abstract.
Hazan, Elad, Karan Singh, and Cyril Zhang. 2017. “Learning Linear Dynamical Systems via Spectral Filtering.” In NIPS. http://arxiv.org/abs/1711.00946.
He, Daihai, Edward L. Ionides, and Aaron A. King. 2010. “Plug-and-Play Inference for Disease Dynamics: Measles in Large and Small Populations as a Case Study.” Journal of The Royal Society Interface 7 (43): 271–83. https://doi.org/10.1098/rsif.2009.0151.
Hefny, Ahmed, Carlton Downey, and Geoffrey Gordon. 2015. “A New View of Predictive State Methods for Dynamical System Learning.” arXiv:1505.05310 [cs, Stat], May. http://arxiv.org/abs/1505.05310.
Hong, X., R. J. Mitchell, S. Chen, C. J. Harris, K. Li, and G. W. Irwin. 2008. “Model Selection Approaches for Non-Linear System Identification: A Review.” International Journal of Systems Science 39 (10): 925–46. https://doi.org/10.1080/00207720802083018.
Hong, Yongmiao, and Haitao Li. 2005. “Nonparametric Specification Testing for Continuous-Time Models with Applications to Term Structure of Interest Rates.” Review of Financial Studies 18 (1): 37–84. https://doi.org/10.1093/rfs/hhh006.
Ionides, E. L., C. Bretó, and A. A. King. 2006. “Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences 103 (49): 18438–43. https://doi.org/10.1073/pnas.0603181103.
Ionides, Edward L., Anindya Bhadra, Yves Atchadé, and Aaron King. 2011. “Iterated Filtering.” The Annals of Statistics 39 (3): 1776–1802. https://doi.org/10.1214/11-AOS886.
Jonschkowski, Rico, Divyam Rastogi, and Oliver Brock. 2018. “Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors.” arXiv:1805.11122 [cs, Stat], May. http://arxiv.org/abs/1805.11122.
Kantz, Holger, and Thomas Schreiber. 2004. Nonlinear Time Series Analysis. 2nd ed. Cambridge, UK ; New York: Cambridge University Press.
Kass, Robert E., Shun-Ichi Amari, Kensuke Arai, Emery N. Brown, Casey O. Diekman, Markus Diesmann, Brent Doiron, et al. 2018. “Computational Neuroscience: Mathematical and Statistical Perspectives.” Annual Review of Statistics and Its Application 5 (1): 183–214. https://doi.org/10.1146/annurev-statistics-041715-033733.
Kemerait, R., and D. Childers. 1972. “Signal Detection and Extraction by Cepstrum Techniques.” IEEE Transactions on Information Theory 18 (6): 745–59. https://doi.org/10.1109/TIT.1972.1054926.
Kendall, Bruce E., Stephen P. Ellner, Edward McCauley, Simon N. Wood, Cheryl J. Briggs, William W. Murdoch, and Peter Turchin. 2005. “Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses.” Ecological Monographs 75 (2): 259–76. http://www.sysecol2.ethz.ch/Refs/EntClim/K/Ke169.pdf.
Kitagawa, Genshiro. 1987. “Non-Gaussian State—Space Modeling of Nonstationary Time Series.” Journal of the American Statistical Association 82 (400): 1032–41. https://doi.org/10.1080/01621459.1987.10478534.
———. 1996. “Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models.” Journal of Computational and Graphical Statistics 5 (1): 1–25. https://doi.org/10.1080/10618600.1996.10474692.
Kitagawa, Genshiro, and Will Gersch. 1996. Smoothness Priors Analysis of Time Series. Lecture notes in statistics 116. New York, NY: Springer New York : Imprint : Springer. http://dx.doi.org/10.1007/978-1-4612-0761-0.
Lamb, Alex, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, and Yoshua Bengio. 2016. “Professor Forcing: A New Algorithm for Training Recurrent Networks.” In Advances In Neural Information Processing Systems. http://arxiv.org/abs/1610.09038.
Levin, David N. 2017. “The Inner Structure of Time-Dependent Signals.” arXiv:1703.08596 [cs, Math, Stat], March. http://arxiv.org/abs/1703.08596.
Li, Yang, and Jinqiao Duan. 2021a. “A Data-Driven Approach for Discovering Stochastic Dynamical Systems with Non-Gaussian Levy Noise.” Physica D: Nonlinear Phenomena 417 (March): 132830. https://doi.org/10.1016/j.physd.2020.132830.
———. 2021b. “Extracting Governing Laws from Sample Path Data of Non-Gaussian Stochastic Dynamical Systems.” arXiv:2107.10127 [math, Stat], July. http://arxiv.org/abs/2107.10127.
Ljung, Lennart. 2010. “Perspectives on System Identification.” Annual Reviews in Control 34 (1): 1–12. https://doi.org/10.1016/j.arcontrol.2009.12.001.
Lu, Peter Y., Joan Ariño, and Marin Soljačić. 2021. “Discovering Sparse Interpretable Dynamics from Partial Observations.” arXiv:2107.10879 [physics], July. http://arxiv.org/abs/2107.10879.
Malartic, Quentin, Alban Farchi, and Marc Bocquet. 2021. “State, Global and Local Parameter Estimation Using Local Ensemble Kalman Filters: Applications to Online Machine Learning of Chaotic Dynamics.” arXiv:2107.11253 [nlin, Physics:physics, Stat], July. http://arxiv.org/abs/2107.11253.
Morrill, James, Patrick Kidger, Cristopher Salvi, James Foster, and Terry Lyons. 2020. “Neural CDEs for Long Time Series via the Log-ODE Method.” In, 5.
Nerrand, O., P. Roussel-Ragot, L. Personnaz, G. Dreyfus, and S. Marcos. 1993. “Neural Networks and Nonlinear Adaptive Filtering: Unifying Concepts and New Algorithms.” Neural Computation 5 (2): 165–99. https://doi.org/10.1162/neco.1993.5.2.165.
Pereyra, M., P. Schniter, É Chouzenoux, J. C. Pesquet, J. Y. Tourneret, A. O. Hero, and S. McLaughlin. 2016. “A Survey of Stochastic Simulation and Optimization Methods in Signal Processing.” IEEE Journal of Selected Topics in Signal Processing 10 (2): 224–41. https://doi.org/10.1109/JSTSP.2015.2496908.
Pham, Tung, and Victor Panaretos. 2016. “Methodology and Convergence Rates for Functional Time Series Regression.” arXiv:1612.07197 [math, Stat], December. http://arxiv.org/abs/1612.07197.
Pillonetto, Gianluigi. 2016. “The Interplay Between System Identification and Machine Learning.” arXiv:1612.09158 [cs, Stat], December. http://arxiv.org/abs/1612.09158.
Plis, Sergey, David Danks, and Jianyu Yang. 2015. “Mesochronal Structure Learning.” Uncertainty in Artificial Intelligence : Proceedings of the … Conference. Conference on Uncertainty in Artificial Intelligence 31 (July). http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4827356/.
Pugachev, V. S., and I. N. Sinitsyn. 2001. Stochastic Systems: Theory and Applications. River Edge, NJ: World Scientific.
Robinson, P. M. 1983. “Nonparametric Estimators for Time Series.” Journal of Time Series Analysis 4 (3): 185–207. https://doi.org/10.1111/j.1467-9892.1983.tb00368.x.
Routtenberg, Tirza, and Joseph Tabrikian. 2010. “Blind MIMO-AR System Identification and Source Separation with Finite-Alphabet.” IEEE Transactions on Signal Processing 58 (3): 990–1000. https://doi.org/10.1109/TSP.2009.2036043.
Runge, Jakob, Reik V. Donner, and Jürgen Kurths. 2015. “Optimal Model-Free Prediction from Multivariate Time Series.” Physical Review E 91 (5). https://doi.org/10.1103/PhysRevE.91.052909.
Särkkä, Simo. 2007. “On Unscented Kalman Filtering for State Estimation of Continuous-Time Nonlinear Systems.” IEEE Transactions on Automatic Control 52 (9): 1631–41. https://doi.org/10.1109/TAC.2007.904453.
Sjöberg, Jonas, Qinghua Zhang, Lennart Ljung, Albert Benveniste, Bernard Delyon, Pierre-Yves Glorennec, Håkan Hjalmarsson, and Anatoli Juditsky. 1995. “Nonlinear Black-Box Modeling in System Identification: A Unified Overview.” Automatica, Trends in System Identification, 31 (12): 1691–1724. https://doi.org/10.1016/0005-1098(95)00120-8.
Städler, Nicolas, and Sach Mukherjee. 2013. “Penalized Estimation in High-Dimensional Hidden Markov Models with State-Specific Graphical Models.” The Annals of Applied Statistics 7 (4): 2157–79. https://doi.org/10.1214/13-AOAS662.
Tallec, Corentin, and Yann Ollivier. 2017. “Unbiasing Truncated Backpropagation Through Time.” arXiv:1705.08209 [cs], May. http://arxiv.org/abs/1705.08209.
Taniguchi, Masanobu, and Yoshihide Kakizawa. 2000. Asymptotic Theory of Statistical Inference for Time Series. Springer Series in Statistics. New York: Springer.
Tanizaki, Hisashi. 2001. “Estimation of Unknown Parameters in Nonlinear and Non-Gaussian State-Space Models.” Journal of Statistical Planning and Inference 96 (2): 301–23. https://doi.org/10.1016/S0378-3758(00)00218-4.
Unser, Michael A., and Pouya Tafti. 2014. An Introduction to Sparse Stochastic Processes. New York: Cambridge University Press. http://www.sparseprocesses.org/sparseprocesses-123456.pdf.
Wedig, W. 1984. “A Critical Review of Methods in Stochastic Structural Dynamics.” Nuclear Engineering and Design 79 (3): 281–87. https://doi.org/10.1016/0029-5493(84)90043-8.
Wen, Ruofeng, Kari Torkkola, and Balakrishnan Narayanaswamy. 2017. “A Multi-Horizon Quantile Recurrent Forecaster.” arXiv:1711.11053 [stat], November. http://arxiv.org/abs/1711.11053.
Werbos, Paul J. 1988. “Generalization of Backpropagation with Application to a Recurrent Gas Market Model.” Neural Networks 1 (4): 339–56. https://doi.org/10.1016/0893-6080(88)90007-X.
Williams, Ronald J., and David Zipser. 1989. “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks.” Neural Computation 1 (2): 270–80. https://doi.org/10.1162/neco.1989.1.2.270.
