Symbolic regression

March 14, 2023 — December 23, 2023

compsci
dynamical systems
machine learning
neural nets
optimization
physics
probabilistic algorithms
sciml
statistics
statmech
stochastic processes
stringology

Fajardo-Fontiveros et al. (2023):

[C]onsider a dataset \(D=\left\{\left(y_i, \mathbf{x}_i\right)\right\}\), with \(i=1, \ldots, N\), generated using the closed form model \(m^*\), so that \(y_i=m^*\left(\mathbf{x}_i, \theta^*\right)+\epsilon_i\) with \(\theta^*\) being the parameters of the model, and \(\epsilon_i\) a random unbiased observation noise drawn from the normal distribution with variance \(s_\epsilon^2\). […] The question we are interested in is: Assuming that \(m^*\) can be expressed in closed form, when is it possible to identify it as the true generating model among all possible closed-form mathematical models, for someone who does not know the true model beforehand? Note that our focus is on learning the structure of the model \(m^*\) and not the values of the parameters \(\theta^*\), a problem that has received much more attention from the theoretical point of view. Additionally, we are interested in situations in which the dimension of the feature space \(\mathbf{x} \in \mathbb{R}^k\) is relatively small (compared to typical feature spaces in machine learning settings), which is the relevant regime for symbolic regression and model discovery.
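
As a concrete illustration of that setup, here is a minimal simulation sketch; the particular closed-form model \(m^*\), its parameters, and the noise level are arbitrary choices of mine, not anything prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def m_star(X, theta):
    # Hypothetical closed-form model (illustrative only):
    # m*(x, theta) = theta_0 * x_0 * exp(theta_1 * x_1)
    return theta[0] * X[:, 0] * np.exp(theta[1] * X[:, 1])

N, k = 200, 2                          # sample size; small feature dimension
theta_star = np.array([1.5, -0.7])     # "true" parameters
s_eps = 0.1                            # observation-noise standard deviation

X = rng.uniform(-1.0, 1.0, size=(N, k))
y = m_star(X, theta_star) + rng.normal(0.0, s_eps, size=N)

# The symbolic-regression question: given only (X, y), can we recover the
# *structure* of m_star, not merely estimate theta_star?
```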

That paper is particularly interesting for its connection to the statistical mechanics of statistics.

Often this seems to boil down to sparse regression plus some interpretable (i.e. mathematical) feature engineering.
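
A hedged sketch of that recipe: build a hand-made dictionary of interpretable candidate terms, then let an L1-penalised regression select a few of them. The dictionary, the toy target, and the use of scikit-learn's `Lasso` are all my own choices for illustration, not anything mandated by the papers cited here.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
x = rng.uniform(0.5, 3.0, size=300)
y = 2.0 * np.log(x) + 0.5 * x + rng.normal(0.0, 0.05, size=x.size)  # "unknown" truth

# Interpretable feature engineering: each column is a named closed-form term.
names = ["x", "x^2", "log x", "sin x", "exp x"]
Phi = np.column_stack([x, x**2, np.log(x), np.sin(x), np.exp(x)])

# Sparse regression selects a few terms, yielding a readable formula.
fit = Lasso(alpha=1e-2).fit(Phi, y)
for name, coef in zip(names, fit.coef_):
    if abs(coef) > 1e-2:
        print(f"{coef:+.3f} * {name}")
# Ideally only the x and log x terms survive (up to shrinkage from the penalty).
```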

1 SINDy et al

Symbolic regression + system identification = symbolic system identification.
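
A minimal, self-contained sketch of the SINDy idea (Brunton, Proctor, and Kutz 2016): regress numerically estimated derivatives onto a library of candidate terms and repeatedly zero out small coefficients (sequential thresholded least squares). The toy system, the threshold, and the library are my own choices for illustration.

```python
import numpy as np

# Simulate a damped linear oscillator with a crude Euler scheme.
dt, T = 0.001, 10.0
t = np.arange(0.0, T, dt)
X = np.empty((t.size, 2))
X[0] = [2.0, 0.0]
A = np.array([[-0.1, 2.0], [-2.0, -0.1]])
for i in range(1, t.size):
    X[i] = X[i - 1] + dt * (A @ X[i - 1])

# Estimate derivatives by central differences.
dX = np.gradient(X, dt, axis=0)

# Candidate library: polynomials up to degree 2 in (x, y).
x, y = X[:, 0], X[:, 1]
names = ["1", "x", "y", "x^2", "xy", "y^2"]
Theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

# Sequential thresholded least squares.
Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
for _ in range(10):
    small = np.abs(Xi) < 0.05          # hard threshold (tuning parameter)
    Xi[small] = 0.0
    for j in range(dX.shape[1]):       # refit each equation on surviving terms
        big = ~small[:, j]
        if big.any():
            Xi[big, j] = np.linalg.lstsq(Theta[:, big], dX[:, j], rcond=None)[0]

for j, lhs in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{Xi[i, j]:+.2f} {names[i]}" for i in range(len(names)) if Xi[i, j] != 0.0]
    print(lhs, "=", " ".join(terms))
```

With these settings the recovered equations should be close to the simulated ones, dx/dt ≈ -0.1 x + 2 y and dy/dt ≈ -2 x - 0.1 y, with the spurious polynomial terms thresholded away.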

2 References

Atkinson, Subber, and Wang. 2019. “Data-Driven Discovery of Free-Form Governing Differential Equations.” In.
Baker, Peña, Jayamohan, et al. 2018. “Mechanistic Models Versus Machine Learning, a Fight Worth Fighting for the Biological Community?” Biology Letters.
Brunton, and Kutz. 2019. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control.
Brunton, Proctor, and Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences.
Champneys, and Rogers. 2024. “BINDy – Bayesian Identification of Nonlinear Dynamics with Reversible-Jump Markov-Chain Monte-Carlo.”
Chen, Huang, Raghupathi, et al. 2022. “Automated Discovery of Fundamental Variables Hidden in Experimental Data.” Nature Computational Science.
Cranmer, Xu, Battaglia, et al. 2019. “Learning Symbolic Physics with Graph Networks.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).
Evans, and Rzhetsky. 2010. “Machine Science.” Science.
Fajardo-Fontiveros, Reichardt, De Los Ríos, et al. 2023. “Fundamental Limits to Learning Closed-Form Mathematical Models from Data.” Nature Communications.
Guimerà, Reichardt, Aguilar-Mogas, et al. 2020. “A Bayesian Machine Scientist to Aid in the Solution of Challenging Scientific Problems.” Science Advances.
Hirsh, Barajas-Solano, and Kutz. 2022. “Sparsifying Priors for Bayesian Uncertainty Quantification in Model Discovery.” Royal Society Open Science.
Jin, Fu, Kang, et al. 2020. “Bayesian Symbolic Regression.” arXiv:1910.08892 [Stat].
Li, and Duan. 2021a. “A Data-Driven Approach for Discovering Stochastic Dynamical Systems with Non-Gaussian Lévy Noise.” Physica D: Nonlinear Phenomena.
———. 2021b. “Extracting Governing Laws from Sample Path Data of Non-Gaussian Stochastic Dynamical Systems.” arXiv:2107.10127 [Math, Stat].
Lu, Ariño, and Soljačić. 2021. “Discovering Sparse Interpretable Dynamics from Partial Observations.” arXiv:2107.10879 [Physics].
Martius, and Lampert. 2016. “Extrapolation and Learning Equations.”
Raghu, and Schmidt. 2020. “A Survey of Deep Learning for Scientific Discovery.” arXiv:2003.11755 [Cs, Stat].
Russo, Laiu, and Archibald. 2023. “Streaming Compression of Scientific Data via Weak-SINDy.”
Schmidt, and Lipson. 2009. “Distilling Free-Form Natural Laws from Experimental Data.” Science.
Udrescu, and Tegmark. 2020. “AI Feynman: A Physics-Inspired Method for Symbolic Regression.” Science Advances.