A meeting point for some related ideas from different fields.
Perspectives on analysing systems in terms of a latent, noisy state, and/or their history of noisy observations.
This notebook is dedicated to the possibly-surprising fact that we can move between *hidden-state*-type representations and *observed-state-only* representations, and indeed mix them together conveniently.
I have many thoughts about this, but nothing concrete to write down at the moment.

## State space models

## Linear systems

See linear feedback systems and linear filter design for material on FIR versus IIR filters.
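
To make the FIR/IIR distinction concrete, here is a sketch in plain numpy (signal and parameters are my own toy choices): a one-pole IIR smoother, which carries a single scalar of state, matches a sufficiently long FIR truncation of its impulse response.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
a = 0.9

# IIR (recursive) smoother: y[n] = a*y[n-1] + (1-a)*x[n], one scalar of state
y_iir = np.zeros_like(x)
y_prev = 0.0
for n, xn in enumerate(x):
    y_prev = a * y_prev + (1 - a) * xn
    y_iir[n] = y_prev

# FIR (convolutional) version: truncate the impulse response h[k] = (1-a)*a**k
K = 200
h = (1 - a) * a ** np.arange(K)
y_fir = np.convolve(x, h)[: len(x)]

print(np.max(np.abs(y_iir - y_fir)))  # tiny, since a**K is ~7e-10
```

The truncation error is bounded by the discarded tail of the impulse response, which decays geometrically; an unstable IIR filter would admit no such finite approximation.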

### Linear Time-Invariant systems

This is where Fourier transforms and spectral properties earn their keep: complex exponentials are eigenfunctions of every LTI system, so the Fourier basis diagonalises them.
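
Concretely, convolution by an LTI impulse response becomes pointwise multiplication of spectra. A quick numerical check, with a toy signal and filter of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 128
x = rng.standard_normal(N)   # input signal
h = rng.standard_normal(16)  # impulse response of some LTI system

# Time domain: circular convolution, computed directly from the definition
y_time = np.array([
    sum(h[k] * x[(n - k) % N] for k in range(len(h))) for n in range(N)
])

# Frequency domain: the DFT diagonalizes every circular LTI operator,
# so filtering is pointwise multiplication of spectra
y_freq = np.fft.ifft(np.fft.fft(h, N) * np.fft.fft(x)).real

print(np.max(np.abs(y_time - y_freq)))  # agreement up to roundoff
```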

## Koopman operators

Learning state is pointless! Infer directly from observations! The Koopman operator makes this respectable: it trades nonlinear dynamics on the state for linear (but infinite-dimensional) dynamics on observables. See Koopmania.
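
A minimal numerical cartoon of the idea, DMD-flavoured and entirely my own toy construction: pick observables in which a nonlinear map happens to be exactly linear, then recover the finite Koopman matrix from a single trajectory by least squares.

```python
import numpy as np

# A nonlinear map with a known 3-dimensional Koopman-invariant subspace
a, b, c = 0.9, 0.5, 0.3
def step(x1, x2):
    return a * x1, b * x2 + c * x1 ** 2

# One trajectory
traj = [(1.5, -0.7)]
for _ in range(30):
    traj.append(step(*traj[-1]))

# Lift to observables g(x) = (x1, x2, x1**2): the dynamics of g are linear
Z = np.array([[x1, x2, x1 ** 2] for x1, x2 in traj]).T  # 3 x 31

# DMD-style least-squares fit of the finite Koopman matrix K: z[t+1] = K z[t]
K = np.linalg.lstsq(Z[:, :-1].T, Z[:, 1:].T, rcond=None)[0].T

K_true = np.array([[a, 0, 0], [0, b, c], [0, 0, a ** 2]])
print(np.abs(K - K_true).max())  # tiny: the lifted dynamics are exactly linear
```

In general no finite set of observables is exactly invariant, and the interesting questions are about how and where the finite truncation fails.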

## Transformers

## Stability of learning

Hochreiter et al. (2001); Hochreiter (1998); Lamb et al. (2016); Hardt, Ma, and Recht (2018), etc.
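
The classic pathology those papers study fits in a few lines: backpropagating through T steps of a recurrence multiplies the gradient by the recurrent Jacobian T times, so gradients vanish or explode according to its spectral radius. An illustrative toy of my own (a scaled orthogonal matrix, so the growth rate is exact), not a trained model:

```python
import numpy as np

rng = np.random.default_rng(6)
T, d = 100, 4
g = np.ones(d)  # some upstream gradient

# Backprop through T linear recurrent steps applies the Jacobian transpose
# T times; with A = rho * Q (Q orthogonal), ||grad|| = rho**T * ||g|| exactly
norms = {}
for rho in [0.9, 1.1]:
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    A = rho * Q  # spectral norm exactly rho
    grad = g.copy()
    for _ in range(T):
        grad = A.T @ grad
    norms[rho] = float(np.linalg.norm(grad))

print(norms)  # rho=0.9 -> vanished (~5e-5), rho=1.1 -> exploded (~3e4)
```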

## Stability of dynamics

## Conversion between representations
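
One direction of the conversion, sketched in numpy with a toy AR(2) process of my own choosing: an observed-only difference equation and a companion-form state-space realization of it produce identical outputs.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.standard_normal(100)
a1, a2 = 0.5, -0.3  # coefficients of a stable AR(2) filter

# Observed-only representation: y[t] = a1*y[t-1] + a2*y[t-2] + u[t]
y_ar = np.zeros(len(u))
for t in range(len(u)):
    y1 = y_ar[t - 1] if t >= 1 else 0.0
    y2 = y_ar[t - 2] if t >= 2 else 0.0
    y_ar[t] = a1 * y1 + a2 * y2 + u[t]

# Hidden-state representation with state s[t] = (y[t-1], y[t-2])
A = np.array([[a1, a2], [1.0, 0.0]])
B = np.array([1.0, 0.0])
C = np.array([a1, a2])
D = 1.0
s = np.zeros(2)
y_ss = np.zeros(len(u))
for t in range(len(u)):
    y_ss[t] = C @ s + D * u[t]
    s = A @ s + B * u[t]

print(np.max(np.abs(y_ar - y_ss)))  # the two representations agree
```

Going the other way, from a state-space model to an observed-only one, is the impulse-response/transfer-function calculation; realization theory tells us when and how minimally it can be undone.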

## S4

Interesting package of tools from Christopher Ré’s lab, at the intersection of recurrent networks and classical state-space models. See HazyResearch/state-spaces: Sequence Modeling with Structured State Spaces. I find these aesthetically satisfying, because I spent 2 years of my PhD trying to solve the same problem, and failed. These folks did a better job, so I find it slightly validating that the idea was not stupid. Gu et al. (2021):

Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence u↦y by simply simulating a linear continuous-time state-space representation x˙=Ax+Bu,y=Cx+Du. Theoretically, we show that LSSL models are closely related to the three aforementioned families of models and inherit their strengths. For example, they generalize convolutions to continuous-time, explain common RNN heuristics, and share features of NDEs such as time-scale adaptation. We then incorporate and generalize recent theory on continuous-time memorization to introduce a trainable subset of structured matrices A that endow LSSLs with long-range memory. Empirically, stacking LSSL layers into a simple deep neural network obtains state-of-the-art results across time series benchmarks for long dependencies in sequential image classification, real-world healthcare regression tasks, and speech. On a difficult speech classification task with length-16000 sequences, LSSL outperforms prior approaches by 24 accuracy points, and even outperforms baselines that use hand-crafted features on 100x shorter sequences.
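
To unpack the recurrence hiding in that abstract: discretize the continuous system with the bilinear (Tustin) transform and step it. Everything below is a toy of my own, not the actual LSSL parameterization (which structures A carefully for long-range memory):

```python
import numpy as np

# A toy continuous-time SSM x'(t) = A x(t) + B u(t), y = C x + D u
A = np.array([[-1.0, 2.0], [-2.0, -1.0]])  # Hurwitz: eigenvalues -1 +/- 2i
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.0]])

# Bilinear (Tustin) discretization with step dt
dt = 0.1
I = np.eye(2)
Ab = np.linalg.solve(I - dt / 2 * A, I + dt / 2 * A)
Bb = np.linalg.solve(I - dt / 2 * A, dt * B)

# Run it as a recurrence: O(1) memory per step, like an RNN
rng = np.random.default_rng(3)
u = rng.standard_normal(64)
x = np.zeros((2, 1))
y = np.zeros(len(u))
for k, uk in enumerate(u):
    x = Ab @ x + Bb * uk
    y[k] = (C @ x + D * uk).item()

# The bilinear map sends the stable half-plane into the unit disc
print(np.abs(np.linalg.eigvals(Ab)).max())  # < 1
```

The step size dt is a bona fide model parameter here, which is one way of seeing the "time-scale adaptation" the abstract mentions.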

Gu, Goel, and Ré (2021):

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of 10000 or more steps. A promising recent approach proposed modeling sequences by simulating the fundamental state space model (SSM) \(x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t)\), and showed that for appropriate choices of the state matrix \(A\), this system could handle long-range dependencies mathematically and empirically. However, this method has prohibitive computation and memory requirements, rendering it infeasible as a general sequence modeling solution. We propose the Structured State Space sequence model (S4) based on a new parameterization for the SSM, and show that it can be computed much more efficiently than prior approaches while preserving their theoretical strengths. Our technique involves conditioning \(A\) with a low-rank correction, allowing it to be diagonalized stably and reducing the SSM to the well-studied computation of a Cauchy kernel. S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet, (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation 60× faster (iii) SoTA on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k that all prior work fails on, while being as efficient as all competitors.
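
The recurrent/convolutional duality at the heart of this line of work also fits in a few lines: run the discrete SSM as a stateful recurrence, or precompute the kernel K[k] = C Ā^k B̄ and convolve. Toy parameters of my own below; S4's actual contribution is computing this kernel efficiently for structured A.

```python
import numpy as np

# Toy discrete SSM (spectral radius < 1, so the kernel decays)
Ab = np.array([[0.8, 0.1], [-0.1, 0.8]])
Bb = np.array([1.0, 0.0])
C = np.array([0.5, -0.5])

rng = np.random.default_rng(4)
u = rng.standard_normal(32)
L = len(u)

# Recurrent view (good for stepwise autoregressive generation)
x = np.zeros(2)
y_rec = np.zeros(L)
for k, uk in enumerate(u):
    x = Ab @ x + Bb * uk
    y_rec[k] = C @ x

# Convolutional view (good for parallel training): kernel K[k] = C @ Ab**k @ Bb
K = np.zeros(L)
m = Bb.copy()
for k in range(L):
    K[k] = C @ m
    m = Ab @ m
y_conv = np.convolve(u, K)[:L]

print(np.max(np.abs(y_rec - y_conv)))  # the two views coincide
```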

Related? Li et al. (2022).

Interesting parallel to the recursive/non-recursive duality in the RWKV language models. Question: can these do the job of transformers? Nearly (Vardasbi et al. 2023).
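
A cartoon of that duality, with hypothetical names of my own choosing: an RWKV-style decayed, key-weighted average over the past can be computed either attention-style over all pairs of time steps or as a constant-size recurrence.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 16
w = 0.5                     # per-step decay (RWKV learns one per channel)
k = rng.random(T) + 0.1     # positive "key" weights, a toy stand-in
v = rng.standard_normal(T)  # "values"

# Parallel, attention-like view: decayed weighted average over the past
t_idx = np.arange(T)
W = np.exp(-w * (t_idx[:, None] - t_idx[None, :])) * np.tril(np.ones((T, T))) * k
y_par = (W @ v) / W.sum(axis=1)

# Recurrent view: two exponentially decaying accumulators, O(1) state
num = den = 0.0
y_rec = np.zeros(T)
decay = np.exp(-w)
for t in range(T):
    num = decay * num + k[t] * v[t]
    den = decay * den + k[t]
    y_rec[t] = num / den

print(np.max(np.abs(y_par - y_rec)))  # identical up to roundoff
```

The parallel form costs O(T²) but exposes all the parallelism for training; the recurrent form costs O(T) with constant state for inference, as with the SSM kernels above.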

## References

*Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48*, 1120–28. ICML’16. New York, NY, USA: JMLR.org.

*IEEE Signal Processing Magazine* 23 (2): 154–61.

*IEEE Transactions on Neural Networks and Learning Systems* 27 (1): 62–76.

*IEEE Transactions on Neural Networks* 5 (2): 157–66.

*Journal of Machine Learning Research* 10 (December): 1737–54.

*Neural Networks (IJCNN), 2016 International Joint Conference on*, 3399–3406. IEEE.

*Proceedings of ICLR*.

*arXiv:1709.03698 [Cs, Stat]*.

*Proceedings of ICLR*.

*arXiv:1609.01704 [Cs]*, September.

*Advances in Neural Information Processing Systems 28*, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2980–88. Curran Associates, Inc.

*arXiv:1611.09913 [Cs, Stat]*.

*arXiv Preprint arXiv:1603.09025*.

*Sequential Monte Carlo Methods in Practice*. New York, NY: Springer New York.

*Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2199–2207. Curran Associates, Inc.

*IEEE Transactions on Signal Processing* 47 (7): 1890–1902.

*The Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)*, 9:8.

*Advances in Neural Information Processing Systems*, 34:572–85. Curran Associates, Inc.

*Inverse Problems* 34 (1): 014004.

*The Journal of Machine Learning Research* 19 (1): 1025–68.

*Kalman Filtering and Neural Networks*. Adaptive and Learning Systems for Signal Processing, Communications, and Control. New York: Wiley.

*NIPS*.

*arXiv:2004.09455 [Stat]*, April.

*International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems* 6: 107–15.

*A Field Guide to Dynamical Recurrent Neural Networks*. IEEE Press.

*Neural Computation* 9 (8): 1735–80.

*Sequential Monte Carlo Methods in Practice*, 159–75. Statistics for Engineering and Information Science. Springer, New York, NY.

*Proceedings of the National Academy of Sciences* 103 (49): 18438–43.

*The Annals of Statistics* 39 (3): 1776–1802.

*Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the “Echo State Network” Approach*. Vol. 5. GMD-Forschungszentrum Informationstechnik.

*PMLR*, 1733–41.

*Linear Systems*. Prentice-Hall Information and System Science Series. Englewood Cliffs, N.J: Prentice-Hall.

*Linear Estimation*. Prentice Hall Information and System Sciences Series. Upper Saddle River, N.J: Prentice Hall.

*Advances in Neural Information Processing Systems*. Vol. 33.

*Advances in Neural Information Processing Systems 29*. Curran Associates, Inc.

*arXiv Preprint arXiv:1511.05121*.

*Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence*, 2101–9.

*arXiv Preprint arXiv:1508.06818*.

*BMC Neuroscience* 16 (Suppl 1): P196.

*Advances In Neural Information Processing Systems*.

*arXiv:1612.06212 [Cs]*, December.

*arXiv:1602.07320 [Cs]*, February.

*System Identification: Theory for the User*. 2nd ed. Prentice Hall Information and System Sciences Series. Upper Saddle River, NJ: Prentice Hall PTR.

*Theory and Practice of Recursive Identification*. The MIT Press Series in Signal Processing, Optimization, and Control 4. Cambridge, Mass: MIT Press.

*Advances In Neural Information Processing Systems*.

*IEEE Transactions on Signal Processing* 58 (3): 1025–34.

*Proceedings of the 28th International Conference on International Conference on Machine Learning*, 1033–40. ICML’11. USA: Omnipress.

*IEEE Signal Processing Magazine* 27 (3): 50–61.

*42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475)*, 4:3814–17.

*Proceedings of International Conference on Learning Representations (ICLR) 2017*.

*PMLR*, 2401–9.

*arXiv:1805.10369 [Cs, Stat]*, May.

*Advances in Water Resources* 28 (2): 135–47.

*Neural Computation* 5 (2): 165–99.

*Perspectives in Robust Control*, 241–57. Lecture Notes in Control and Information Sciences. Springer, London.

*arXiv:1803.05428 [Cs, Eess, Stat]*, March.

*IEEE Transactions on Signal Processing* 58 (3): 990–1000.

*Automatica* 49 (9): 2860–66.

*Automatica*, Trends in System Identification, 31 (12): 1691–1724.

*Nonlinear Dynamics and Statistics*.

*System Identification*. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.

*arXiv:1805.04955 [Cs, Stat]*, May.

*Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*, 1–9.

*PMLR*, 3387–93.

*Proceedings of International Conference on Learning Representations (ICLR) 2017*.

*IEEE Transactions on Audio and Electroacoustics* 15 (2): 70–73.

*Neural Networks* 1 (4): 339–56.

*Proceedings of the IEEE* 78 (10): 1550–60.

*IEEE Transactions on Information Theory* 64 (7): 1–1.

*Neural Computation* 2 (4): 490–501.

*Advances in Neural Information Processing Systems 29*.

*IEEE Signal Processing Magazine* 28 (1): 145–54.

*Proceedings of the Twentieth International Conference on International Conference on Machine Learning*, 928–35. ICML’03. Washington, DC, USA: AAAI Press.
