Scientific machine learning

Machine learning for physical sciences, SciML

May 15, 2017 — January 1, 2025

calculus
dynamical systems
geometry
Hilbert space
how do science
machine learning
neural nets
PDEs
physics
regression
sciml
SDEs
signal processing
statistics
statmech
stochastic processes
surrogate
time series
uncertainty
Figure 1: Consider a spherical flame

In physics, typically, we are concerned with identifying True Parameters for Universal Laws, applicable without prejudice across all the cosmos. We are hunting something like the Platonic ideals of which our experiments are poor shadows. This is especially so in, say, quantum physics or cosmology.

In machine learning, typically, we want to make generic predictions for a given process and quantify how good those predictions can be, given how much data we have and the approximate kind of process we witness. There is no notion of universal truth waiting around the corner to back up our wild fancies. On the other hand, we are less concerned about the noisy sublunary chaos of experiments and don’t need to worry about how far our noise drives us from universal truth, as long as we make good predictions in the local problem at hand. But here, far from universality, we have weak and vague notions of how to generalize our models to new circumstances and new noise. That is, in the Platonic ideal of machine learning, there are no Platonic ideals to be found.

(This explanation does no justice to either physics or machine learning, but it will do as framing rather than getting too deep into the history or philosophy of science.)

Can these areas have something to say to one another nevertheless? After an interesting conversation with Shane Keating about the difficulties of ocean dynamics, I am thinking about this in a new way. Generally, we might have notions from physics of what “truly” underlies a system, but many unknown parameters, noisy measurements, computational intractability and complex or chaotic dynamics interfere with our ability to predict things using only known laws of physics. Here, we want to come up with a “best possible” stochastic model of a system given our uncertainties and constraints, which looks more like an ML problem.

At a basic level, it’s not controversial (I don’t think?) to use machine learning methods to analyse data in experiments, even with trendy deep neural networks. I understand that this is significant, e.g. in connectomics.

Perhaps a little more fringe is using machine learning to reduce computational burden via surrogate models, e.g. Carleo and Troyer (2017).

The thing that is especially interesting to me right now is learning the whole model in ML formalism, using physical laws as input to the learning process.

To be concrete, Shane specifically was discussing problems in predicting and interpolating “tracers”, such as chemicals or heat, in oceanographic flows. Here we know lots of things about the fluids concerned, but less about the details of the ocean floor, and we have only imperfect measurements of the flow itself. Nonetheless, we also know that there are certain invariants, conservation laws etc, so a truly “nonparametric” approach to dynamics is certainly throwing away information.
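One way to avoid throwing that information away is to build a conservation law into the structure of the model rather than hoping the data teaches it. The following is a minimal sketch under my own assumptions, not anything Shane proposed: a 1-D periodic tracer updated in flux form, where the flux between neighbouring cells is a hypothetical parametric function (`flux_fn`, with made-up parameters `theta`, the last two standing in for a fitted closure). Whatever the parameters are, whatever leaves one cell enters its neighbour, so total tracer is conserved by construction.

```python
import numpy as np

def flux_fn(c_left, c_right, theta, dx):
    """Hypothetical parametric flux through the face between two cells."""
    u, kappa, w1, w2 = theta
    advection = u * c_left                             # first-order upwind, assumes u > 0
    diffusion = -kappa * (c_right - c_left) / dx
    learned = -w1 * np.tanh(w2 * (c_right - c_left))   # stand-in for a fitted closure
    return advection + diffusion + learned

def step(c, theta, dt, dx):
    """One flux-form update on a periodic 1-D grid."""
    F = flux_fn(c, np.roll(c, -1), theta, dx)   # F[i]: flux leaving cell i to the right
    return c - dt / dx * (F - np.roll(F, 1))    # F[i-1] enters cell i from the left

n = 64
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
c0 = np.exp(-((x - 0.3) / 0.05) ** 2)           # initial tracer blob
theta = np.array([1.0, 0.01, 0.05, 3.0])        # illustrative values, not fitted

c = c0.copy()
for _ in range(100):
    c = step(c, theta, dt=0.002, dx=dx)

print(c0.sum() * dx, c.sum() * dx)              # total tracer before and after
```

The two printed totals should agree up to floating-point error, even though the blob itself has moved and spread. Fitting `theta` to data would then only ever search over conservative models.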

There is some nifty work in learning symbolic approximations to physics, like the SINDy method. However, it’s hard to imagine scaling this up (at least directly) to big things like large image sensor arrays and other such weakly structured input.
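For orientation, the core of SINDy-style methods is a sparse regression of time derivatives onto a library of candidate terms. Here is a minimal sketch using sequentially thresholded least squares in plain NumPy; the toy damped oscillator, the quadratic library, the threshold and the function names are all my own illustrative choices, and real work would use a maintained package rather than this.

```python
import numpy as np

def library(X):
    """Candidate terms: 1, x, y, x^2, xy, y^2."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def stlsq(Theta, dXdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares for dXdt ~ Theta @ Xi."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0                          # prune small coefficients
        for j in range(dXdt.shape[1]):           # refit each state equation
            keep = ~small[:, j]
            if keep.any():
                Xi[keep, j] = np.linalg.lstsq(
                    Theta[:, keep], dXdt[:, j], rcond=None
                )[0]
    return Xi

# Synthetic data from dx/dt = -0.1x + 2y, dy/dt = -2x - 0.1y (closed-form solution)
dt = 0.01
t = np.arange(0.0, 10.0, dt)
X = 2.0 * np.exp(-0.1 * t)[:, None] * np.column_stack([np.cos(2 * t), -np.sin(2 * t)])
dXdt = np.gradient(X, dt, axis=0, edge_order=2)  # finite-difference derivatives

Xi = stlsq(library(X), dXdt)
print(np.round(Xi, 3))   # rows: 1, x, y, x^2, xy, y^2; columns: dx/dt, dy/dt
```

On this clean, low-dimensional example only the four linear coefficients should survive the thresholding. The scaling worry is visible even here: the library grows combinatorially with the number of state variables and the polynomial degree.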

Researchers like Chang et al. (2017) claim that learning “compositional object” models should be possible. The compositional models are learnable objects with learnable pairwise interactions, and bear a passing resemblance to something like the physical laws that physics experiments hope to discover, although I’m not yet totally persuaded about the details of this particular framework. On the other hand, an unmotivated appeal to autoencoders as descriptions of the underlying dynamics of physical reality doesn’t seem sufficient.

There is an O’Reilly podcast and reflist about deep learning for science in particular. There was a special track for papers in this area at NeurIPS.

See various SciML conferences, e.g. ICERM.

Figure 2: CNN classification of atmospheric rivers

Sample images of atmospheric rivers correctly classified (true positive) by our deep CNN model. Figure shows total column water vapor (colour map) and land sea boundary (solid line). Y. Liu et al. (2016)

1 Data-informed inference for physical systems

See Physics-based Deep Learning (Thuerey et al. 2021). See also the web material around Brunton and Kutz’s book Data-Driven Science and Engineering (Brunton and Kutz 2019); the seminar series by the authors of that book is a movable feast of the latest results in this area. For neural dynamics in particular, Patrick Kidger’s thesis seems good (Kidger 2022).

2 ML for PDEs

I do a lot of ML for partial differential equations. For more on that, see ML PDEs.

3 ML for biology

I think we call this BioML.

4 Causality, identifiability, and observational data

One ML-flavoured notion is the use of observational data to derive the models. Presumably if I am modelling an entire ocean or even a river, doing experiments is out of the question for reasons of cost and ethics, and the overall model will be calibrated with observational data. We need to wait until there is a flood to see what floods do. This is generally done badly in ML, but there are formalisms for it, as seen in graphical models for causal inference. Can we work out the confounders and do counterfactual inference? Is imposing an arrow of causation already doing some work for us?

Small subsystems might be informed by experiments, of course.

5 Likelihood free inference

Popular if you have a simulator that can generate samples from the system but no tractable likelihood. See likelihood free inference.
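The simplest member of this family is rejection ABC (approximate Bayesian computation), which needs nothing from the simulator except the ability to run it. A minimal sketch follows, where the Gaussian `simulator`, the uniform prior, the sample-mean summary statistic and the tolerance `eps` are all toy assumptions of mine standing in for an expensive physical simulator.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n_obs, rng):
    """Toy stand-in for an expensive simulator: n_obs noisy readings per parameter."""
    theta = np.atleast_1d(theta)
    return rng.normal(theta[:, None], 1.0, size=(theta.size, n_obs))

def summary(data):
    """Summary statistic: one sample mean per simulated data set."""
    return data.mean(axis=1)

# "Observed" data from a parameter we pretend not to know
s_obs = summary(simulator(1.5, n_obs=50, rng=rng))[0]

# Rejection ABC: draw from the prior, simulate, keep draws whose summaries land near s_obs
n_draws, eps = 20_000, 0.1
theta_prior = rng.uniform(-5.0, 5.0, size=n_draws)
s_sim = summary(simulator(theta_prior, n_obs=50, rng=rng))
posterior = theta_prior[np.abs(s_sim - s_obs) < eps]

print(len(posterior), posterior.mean(), posterior.std())
```

The accepted draws approximate the posterior over `theta`. Note that only a couple of percent of the simulations are kept, which is why this naive version gets painful when each simulator run is costly.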

6 Emulation approaches

See Emulation and surrogates.

7 The other direction: What does physics say about learning?

See why does deep learning work or the statistical mechanics of statistics.

Related, maybe: the recovery phase transitions in compressed sensing.

8 But statistics is ML

Why not “statistics for physical sciences”? Isn’t ML just statistics? Why thanks, Dan, for asking that. Yes it is, as far as content goes. But the different disciplines license different uses of the tools. Pragmatically, using predictive modelling tools that ML practitioners advocate has been helpful in doing better statistics for ML. When we talk about statistics in physical processes we tend to think of your grandpappy’s statistics: parametric methods where the parameters are the parameters of physical laws. The modern emphasis in machine learning is on nonparametric, overparameterised or approximate methods that do not necessarily correspond to the world in any interpretable way, such as deep learning. But sure, that is still statistics if you like. I would have needed to spend more words explaining that though, and would have buried the lede.

Figure 3

9 Applications

Bushfires, hydrology, climate models, molecular dynamics…

10 Incoming

11 References

Altmann, Henning, and Peterseim. 2021. “Numerical Homogenization Beyond Scale Separation.” Acta Numerica.
Altosaar, Ranganath, and Cranmer. 2019. “Hierarchical Variational Models for Statistical Physics.” In.
Asher, Croke, Jakeman, et al. 2015. “A Review of Surrogate Models and Their Application to Groundwater Modeling.” Water Resources Research.
Atkinson, Subber, and Wang. 2019. “Data-Driven Discovery of Free-Form Governing Differential Equations.” In.
Auzina, Yildiz, and Gavves. 2022. “Latent GP-ODEs with Informative Priors.” In.
Ayed, and de Bézenac. 2019. “Learning Dynamical Systems from Partial Observations.” In Advances In Neural Information Processing Systems.
Baker, Peña, Jayamohan, et al. 2018. “Mechanistic Models Versus Machine Learning, a Fight Worth Fighting for the Biological Community?” Biology Letters.
Bar-Sinai, Hoyer, Hickey, et al. 2019. “Learning Data-Driven Discretizations for Partial Differential Equations.” Proceedings of the National Academy of Sciences.
Beck, E, and Jentzen. 2019. “Machine Learning Approximation Algorithms for High-Dimensional Fully Nonlinear Partial Differential Equations and Second-Order Backward Stochastic Differential Equations.” Journal of Nonlinear Science.
Bottero, Calisto, Graziano, et al. 2020. “Physics-Informed Machine Learning Simulator for Wildfire Propagation.”
Brehmer, Cranmer, Mishra-Sharma, et al. 2019. “Mining Gold: Improving Simulation-Based Inference with Latent Information.” In.
Brunton, and Kutz. 2019. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control.
Brunton, Proctor, and Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences.
Carleo, and Troyer. 2017. “Solving the Quantum Many-Body Problem with Artificial Neural Networks.” Science.
Chang, Ullman, Torralba, et al. 2017. “A Compositional Object-Based Approach to Learning Physical Dynamics.” In Proceedings of ICLR.
Cranmer, Xu, Battaglia, et al. 2019. “Learning Symbolic Physics with Graph Networks.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).
Cui, Peeters, Pagendam, et al. 2018. “Emulator-Enabled Approximate Bayesian Computation (ABC) and Uncertainty Analysis for Computationally Expensive Groundwater Models.” Journal of Hydrology.
Deiana, Tran, Agar, et al. 2021. “Applications and Techniques for Fast Machine Learning in Science.” arXiv:2110.13041 [Physics].
Faroughi, Pawar, Fernandes, et al. 2023. “Physics-Guided, Physics-Informed, and Physics-Encoded Neural Networks in Scientific Computing.”
Filippi, Mallet, and Nader. 2014. “Representation and Evaluation of Wildfire Propagation Simulations.” International Journal of Wildland Fire.
Gahungu, Lanyon, Álvarez, et al. 2022. “Adjoint-Aided Inference of Gaussian Process Driven Differential Equations.” In.
Ghattas, and Willcox. 2021. “Learning Physics-Based Models from Data: Perspectives from Inverse Problems and Model Reduction.” Acta Numerica.
Gilpin. 2023. “Model Scale Versus Domain Knowledge in Statistical Forecasting of Chaotic Systems.” Physical Review Research.
Girolami, Febrianto, Yin, et al. 2021. “The Statistical Finite Element Method (statFEM) for Coherent Synthesis of Observation Data and Model Predictions.” Computer Methods in Applied Mechanics and Engineering.
Gladish, Pagendam, Peeters, et al. 2018. “Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models.” Journal of Agricultural, Biological and Environmental Statistics.
Goldstein, and Coco. 2015. “Machine Learning Components in Deterministic Models: Hybrid Synergy in the Age of Data.” Frontiers in Environmental Science.
Gulian, Frankel, and Swiler. 2020. “Gaussian Process Regression Constrained by Boundary Value Problems.” arXiv:2012.11857 [Cs, Math, Stat].
He, Barajas-Solano, Tartakovsky, et al. 2020. “Physics-Informed Neural Networks for Multiphysics Data Assimilation with Application to Subsurface Transport.” Advances in Water Resources.
Hoffimann, Zortea, de Carvalho, et al. 2021. “Geostatistical Learning: Challenges and Opportunities.” Frontiers in Applied Mathematics and Statistics.
Holl, Koltun, and Thuerey. 2022. “Scale-Invariant Learning by Physics Inversion.” In.
Holl, Thuerey, and Koltun. 2020. “Learning to Control PDEs with Differentiable Physics.” In ICLR.
Hu, Anderson, Li, et al. 2020. “DiffTaichi: Differentiable Programming for Physical Simulation.” In ICLR.
Hu, Li, Anderson, et al. 2019. “Taichi: A Language for High-Performance Computation on Spatially Sparse Data Structures.” ACM Transactions on Graphics.
Innes, Edelman, Fischer, et al. 2019. “A Differentiable Programming System to Bridge Machine Learning and Scientific Computing.”
Jakeman. 2023. “PyApprox: A Software Package for Sensitivity Analysis, Bayesian Inference, Optimal Experimental Design, and Multi-Fidelity Uncertainty Quantification and Surrogate Modeling.” Environmental Modelling & Software.
Jin, Zhang, and Espinosa. 2023. “Recent Advances and Applications of Machine Learning in Experimental Solid Mechanics: A Review.”
John, and Csányi. 2016. “Many-Body Coarse-Grained Interactions Using Gaussian Approximation Potentials.”
Jouvet, and Cordonnier. 2023. “Ice-Flow Model Emulator Based on Physics-Informed Deep Learning.” Journal of Glaciology.
Karniadakis, Kevrekidis, Lu, et al. 2021. “Physics-Informed Machine Learning.” Nature Reviews Physics.
Kashinath, Mustafa, Albert, et al. 2021. “Physics-Informed Machine Learning: Case Studies for Weather and Climate Modelling.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Kasim, Muhammad, Topp-Mugglestone, Hatfield, et al. 2019. “A Million Times Speed up in Parameters Retrieval with Deep Learning.” In.
Kasim, M. F., Watson-Parris, Deaconu, et al. 2020. “Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search.” arXiv:2001.08055 [Physics, Stat].
Kidger. 2022. “On Neural Differential Equations.”
Kimura, Yoshinaga, Sekijima, et al. 2020. “Convolutional Neural Network Coupled with a Transfer-Learning Approach for Time-Series Flood Predictions.” Water.
Krämer, Bosch, Schmidt, et al. 2021. “Probabilistic ODE Solutions in Millions of Dimensions.”
Kumar, Gleyzer, Kahana, et al. 2023. “MyCrunchGPT: A chatGPT Assisted Framework for Scientific Machine Learning.”
Li, Yang, and Duan. 2021a. “A Data-Driven Approach for Discovering Stochastic Dynamical Systems with Non-Gaussian Levy Noise.” Physica D: Nonlinear Phenomena.
Li, Yang, and Duan. 2021b. “Extracting Governing Laws from Sample Path Data of Non-Gaussian Stochastic Dynamical Systems.” arXiv:2107.10127 [Math, Stat].
Li, Yunzhu, Torralba, Anandkumar, et al. 2020. “Causal Discovery in Physical Systems from Videos.” arXiv:2007.00631 [Cs, Stat].
Liu, Yunjie, Racah, Prabhat, et al. 2016. “Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets.” arXiv:1605.01156 [Cs].
Liu, Xiao, Yeo, and Lu. 2020. “Statistical Modeling for Spatio-Temporal Data From Stochastic Convection-Diffusion Processes.” Journal of the American Statistical Association.
Long, Wang, Krishnapriyan, et al. 2022. “AutoIP: A United Framework to Integrate Physics into Gaussian Processes.”
Lu, Peter Y., Ariño, and Soljačić. 2021. “Discovering Sparse Interpretable Dynamics from Partial Observations.” arXiv:2107.10879 [Physics].
Lu, Lu, Meng, Mao, et al. 2021. “DeepXDE: A Deep Learning Library for Solving Differential Equations.” SIAM Review.
Lu, Dan, and Ricciuto. 2019. “Efficient Surrogate Modeling Methods for Large-Scale Earth System Models Based on Machine-Learning Techniques.” Geoscientific Model Development.
Malartic, Farchi, and Bocquet. 2021. “State, Global and Local Parameter Estimation Using Local Ensemble Kalman Filters: Applications to Online Machine Learning of Chaotic Dynamics.” arXiv:2107.11253 [Nlin, Physics:physics, Stat].
Medasani, Gamst, Ding, et al. 2016. “Predicting Defect Behavior in B2 Intermetallics by Merging Ab Initio Modeling and Machine Learning.” Npj Computational Materials.
Meng, Seo, Cao, et al. 2022. “When Physics Meets Machine Learning: A Survey of Physics-Informed Machine Learning.” arXiv:2203.16797 [Cs, Stat].
Mo, Lu, Shi, et al. 2017. “A Taylor Expansion-Based Adaptive Design Strategy for Global Surrogate Modeling With Applications in Groundwater Modeling.” Water Resources Research.
Nabian, and Meidani. 2019. “A Deep Learning Solution Approach for High-Dimensional Random Differential Equations.” Probabilistic Engineering Mechanics.
Nair, Zhu, Savarese, et al. 2019. “Causal Induction from Visual Observations for Goal Directed Tasks.” arXiv:1910.01751 [Cs, Stat].
Ng, Zhu, Chen, et al. 2019. “A Graph Autoencoder Approach to Causal Structure Learning.” In Advances In Neural Information Processing Systems.
Otness, Gjoka, Bruna, et al. 2021. “An Extensible Benchmark Suite for Learning to Simulate Physical Systems.” In.
Paleyes, Pullin, Mahsereci, et al. 2019. “Emulation of Physical Processes with Emukit.” In Advances In Neural Information Processing Systems.
Park, Yoo, and Nadiga. 2019. “Machine Learning Climate Variability.” In.
Partee, Ringenburg, Robbins, et al. 2019. “Model Parameter Optimization: ML-Guided Trans-Resolution Tuning of Physical Models.” In.
Pathak, Hunt, Girvan, et al. 2018. “Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach.” Physical Review Letters.
Pathak, Lu, Hunt, et al. 2017. “Using Machine Learning to Replicate Chaotic Attractors and Calculate Lyapunov Exponents from Data.” Chaos: An Interdisciplinary Journal of Nonlinear Science.
Pestourie, Mroueh, Rackauckas, et al. 2021. “Data-Efficient Training with Physics-Enhanced Deep Surrogates.” In.
Popov. 2022. “Combining Data-Driven and Theory-Guided Models in Ensemble Data Assimilation.” ETD.
Portwood, Mitra, Ribeiro, et al. 2019. “Turbulence Forecasting via Neural ODE.” In.
Psaros, Meng, Zou, et al. 2023. “Uncertainty Quantification in Scientific Machine Learning: Methods, Metrics, and Comparisons.” Journal of Computational Physics.
Qian, Yu-Kun. 2023. “Xinvert: A Python Package for Inversion Problems in Geophysical Fluid Dynamics.” Journal of Open Source Software.
Qian, Elizabeth, Kramer, Peherstorfer, et al. 2020. “Lift & Learn: Physics-Informed Machine Learning for Large-Scale Nonlinear Dynamical Systems.” Physica D: Nonlinear Phenomena.
Rackauckas, Christopher. 2019. “The Essential Tools of Scientific Machine Learning (Scientific ML).”
Rackauckas, Chris, Edelman, Fischer, et al. 2020. “Generalized Physics-Informed Learning Through Language-Wide Differentiable Programming.” MIT Web Domain.
Raghu, and Schmidt. 2020. “A Survey of Deep Learning for Scientific Discovery.” arXiv:2003.11755 [Cs, Stat].
Raissi, Perdikaris, and Karniadakis. 2019. “Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations.” Journal of Computational Physics.
Raissi, Yazdani, and Karniadakis. 2020. “Hidden Fluid Mechanics: Learning Velocity and Pressure Fields from Flow Visualizations.” Science.
Ramsundar, Krishnamurthy, and Viswanathan. 2021. “Differentiable Physics: A Position Piece.” arXiv:2109.07573 [Physics].
Razavi. 2021. “Deep Learning, Explained: Fundamentals, Explainability, and Bridgeability to Process-Based Modelling.” Environmental Modelling & Software.
Razavi, Tolson, and Burn. 2012. “Review of Surrogate Modeling in Water Resources.” Water Resources Research.
Rezende, Racanière, Higgins, et al. 2019. “Equivariant Hamiltonian Flows.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).
Saemundsson, Terenin, Hofmann, et al. 2020. “Variational Integrator Networks for Physically Structured Embeddings.” arXiv:1910.09349 [Cs, Stat].
Sanchez-Gonzalez, Bapst, Battaglia, et al. 2019. “Hamiltonian Graph Networks with ODE Integrators.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).
Sanchez-Gonzalez, Godwin, Pfaff, et al. 2020. “Learning to Simulate Complex Physics with Graph Networks.” In Proceedings of the 37th International Conference on Machine Learning.
Sargsyan, Debusschere, Najm, et al. 2009. “Bayesian Inference of Spectral Expansions for Predictability Assessment in Stochastic Reaction Networks.” Journal of Computational and Theoretical Nanoscience.
Sarkar, and Joly. 2019. “Multi-Fidelity Learning with Heterogeneous Domains.” In NeurIPS.
Särkkä. 2011. “Linear Operators and Stochastic Partial Differential Equations in Gaussian Process Regression.” In Artificial Neural Networks and Machine Learning – ICANN 2011. Lecture Notes in Computer Science.
Siade, Cui, Karelse, et al. 2020. “Reduced‐Dimensional Gaussian Process Machine Learning for Groundwater Allocation Planning Using Swarm Theory.” Water Resources Research.
Sun, Yoon, Shih, et al. 2021. “Applications of Physics-Informed Scientific Machine Learning in Subsurface Science: A Survey.” arXiv:2104.04764 [Physics].
Tait, and Damoulas. 2020. “Variational Autoencoding of PDE Inverse Problems.” arXiv:2006.15641 [Cs, Stat].
Tartakovsky, Marrero, Perdikaris, et al. 2018. “Learning Parameters and Constitutive Relationships with Physics Informed Deep Neural Networks.”
Thuerey, Holl, Mueller, et al. 2021. Physics-Based Deep Learning.
Tompson, Schlachter, Sprechmann, et al. 2017. “Accelerating Eulerian Fluid Simulation with Convolutional Networks.” In Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17.
van der Merwe, Leen, Lu, et al. 2007. “Fast Neural Network Surrogates for Very High Dimensional Physics-Based Models in Computational Oceanography.” Neural Networks, Computational Intelligence in Earth and Environmental Sciences.
Wang, Sankaran, and Perdikaris. 2022. “Respecting Causality Is All You Need for Training Physics-Informed Neural Networks.”
Willard, Jia, Xu, et al. n.d. “Integrating Scientific Knowledge with Machine Learning for Engineering and Environmental Systems.”
Witteveen, and Bijl. 2006. “Modeling Arbitrary Uncertainties Using Gram-Schmidt Polynomial Chaos.” In 44th AIAA Aerospace Sciences Meeting and Exhibit.
Wu, Maruyama, and Leskovec. 2022. “Learning to Accelerate Partial Differential Equations via Latent Global Evolution.”
Yang, Zhang, and Karniadakis. 2020. “Physics-Informed Generative Adversarial Networks for Stochastic Differential Equations.” SIAM Journal on Scientific Computing.
Yu, Cui, Sreekanth, et al. 2020. “Deep Learning Emulators for Groundwater Contaminant Transport Modelling.” Journal of Hydrology.
Zammit-Mangion, and Wikle. 2020. “Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting.” Spatial Statistics.
Zang, Bao, Ye, et al. 2020. “Weak Adversarial Networks for High-Dimensional Partial Differential Equations.” Journal of Computational Physics.
Zhang, Guo, and Karniadakis. 2020. “Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks.” SIAM Journal on Scientific Computing.
Zhang, Lu, Guo, et al. 2019. “Quantifying Total Uncertainty in Physics-Informed Neural Networks for Solving Forward and Inverse Stochastic Problems.” Journal of Computational Physics.
Zhu, Zabaras, Koutsourelakis, et al. 2019. “Physics-Constrained Deep Learning for High-Dimensional Surrogate Modeling and Uncertainty Quantification Without Labeled Data.” Journal of Computational Physics.