Emulators and surrogate models via ML

Shortcuts in scientific simulation using ML

August 12, 2020 — August 26, 2020

feature construction
functional analysis
linear algebra
machine learning
neural nets
sparser than thou

Emulation, a.k.a. surrogate modelling. In this context, it means reducing complicated physics-driven simulations to simpler and/or faster ones using ML techniques. This is especially popular in the ML-for-physics pipeline. I have mostly done this in the context of surrogate optimisation for experiments. See Neil Lawrence on Emulation for a modern overview.
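
To make the idea concrete, here is a minimal sketch of an emulator, assuming a Gaussian process with a squared-exponential kernel as the surrogate. The `slow_simulator` function, the kernel hyperparameters and the design points are all hypothetical stand-ins chosen for illustration; a real application would tune the kernel and choose the training design with more care.

```python
import numpy as np

def slow_simulator(x):
    """Hypothetical stand-in for an expensive physics code."""
    return np.sin(3.0 * x) + 0.5 * x

def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def fit_gp(x_train, y_train, jitter=1e-6):
    """Precompute the Cholesky factor and weights of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return chol, alpha

def predict_gp(x_new, x_train, chol, alpha):
    """Posterior mean and standard deviation at the points x_new."""
    K_s = rbf_kernel(x_new, x_train)
    mean = K_s @ alpha
    v = np.linalg.solve(chol, K_s.T)
    var = np.diag(rbf_kernel(x_new, x_new)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# A handful of "expensive" runs is the training set for the emulator.
x_train = np.linspace(0.0, 2.0, 12)
y_train = slow_simulator(x_train)
chol, alpha = fit_gp(x_train, y_train)

# The emulator is now cheap to query anywhere in the design range.
x_test = np.linspace(0.0, 2.0, 50)
mean, std = predict_gp(x_test, x_train, chol, alpha)
max_abs_error = np.max(np.abs(mean - slow_simulator(x_test)))
print(f"max abs emulation error: {max_abs_error:.2e}")
```

The posterior standard deviation `std` is what makes a GP attractive for surrogate optimisation: it indicates where another run of the real simulator would be most informative.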

A recent, hyped paper that exemplifies this approach is Kasim et al. (2020), which (somewhat implicitly) uses arguments from dropout ensembling to produce quasi-Bayesian emulations of notoriously slow simulations. Does it actually work? And if it does quantify posterior predictive uncertainty well, can it also estimate other posterior uncertainties?
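
For readers unfamiliar with dropout ensembling, the following is a rough sketch of plain Monte Carlo dropout on a toy problem. It is not the architecture or training procedure of Kasim et al. (2020); the one-hidden-layer network, the `toy_simulator` function and every hyperparameter here are assumptions made for illustration. The point is only that leaving dropout switched on at prediction time yields an ensemble of predictions whose spread can be read as a (quasi-Bayesian) predictive uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_simulator(x):
    """Hypothetical stand-in for a slow simulation."""
    return np.sin(3.0 * x)

# Training data: a modest number of simulator runs.
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = toy_simulator(x)

hidden, keep_prob, lr = 32, 0.9, 0.02
W1 = rng.normal(0.0, 1.0, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
b2 = np.zeros(1)

def forward(inputs, mask):
    """One stochastic forward pass with an (inverted) dropout mask."""
    act = np.tanh(inputs @ W1 + b1)
    h = act * mask / keep_prob
    return act, h, h @ W2 + b2

# Full-batch gradient descent on 0.5 * mean squared error, dropout active.
for step in range(3000):
    mask = rng.random((len(x), hidden)) < keep_prob
    act, h, pred = forward(x, mask)
    err = (pred - y) / len(x)
    grad_W2 = h.T @ err
    grad_b2 = err.sum(axis=0)
    d_pre = (err @ W2.T) * (mask / keep_prob) * (1.0 - act**2)
    grad_W1 = x.T @ d_pre
    grad_b1 = d_pre.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

# Prediction: keep dropout on and average over many stochastic passes.
x_test = np.linspace(-1.0, 1.0, 25).reshape(-1, 1)
samples = np.stack([
    forward(x_test, rng.random((len(x_test), hidden)) < keep_prob)[2]
    for _ in range(100)
])
mc_mean = samples.mean(axis=0).ravel()
mc_std = samples.std(axis=0).ravel()
```

Whether `mc_std` is a trustworthy posterior predictive spread is exactly the question raised above; the sketch shows how the number is produced, not that it is calibrated.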

Emukit (Paleyes et al. 2019) is a toolkit which generically wraps ML models for emulation purposes.


ML for PDEs might be a particularly useful domain.

1 Model order reduction

The traditional, and still useful, approach is reduced order modelling, which has many related tricks.
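
One of the standard tricks is proper orthogonal decomposition (POD): collect simulator states ("snapshots") at several parameter values, take an SVD, and keep only the leading modes as a reduced basis. The sketch below assumes a toy one-dimensional field; the field, the parameter range and the energy threshold are hypothetical choices for illustration.

```python
import numpy as np

def toy_field(p, grid):
    """Hypothetical parametrised solution field on a 1-D grid."""
    return np.exp(-p * grid) * np.sin(2.0 * np.pi * grid)

grid = np.linspace(0.0, 1.0, 200)
params = np.linspace(0.5, 2.0, 30)

# Snapshot matrix: one column per simulated parameter value.
snapshots = np.stack([toy_field(p, grid) for p in params], axis=1)

# POD modes are the left singular vectors of the snapshot matrix.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)

# Keep the smallest number of modes capturing the chosen energy fraction.
rank = int(np.searchsorted(energy, 0.999999) + 1)
basis = U[:, :rank]

# Project a state at an unseen parameter value onto the reduced basis.
new_state = toy_field(1.23, grid)
coeffs = basis.T @ new_state
reconstruction = basis @ coeffs
rel_error = np.linalg.norm(new_state - reconstruction) / np.linalg.norm(new_state)
print(f"kept {rank} of {len(params)} modes, relative error {rel_error:.2e}")
```

A full reduced order model would go on to evolve or solve for `coeffs` directly (for example by Galerkin projection of the governing equations), which is where the real speed-up comes from; this sketch covers only the compression step.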

2 References

Altmann, Henning, and Peterseim. 2021. “Numerical Homogenization Beyond Scale Separation.” Acta Numerica.
Asher, Croke, Jakeman, et al. 2015. “A Review of Surrogate Models and Their Application to Groundwater Modeling.” Water Resources Research.
Cui, Peeters, Pagendam, et al. 2018. “Emulator-Enabled Approximate Bayesian Computation (ABC) and Uncertainty Analysis for Computationally Expensive Groundwater Models.” Journal of Hydrology.
Forrester, and Keane. 2009. “Recent Advances in Surrogate-Based Optimization.” Progress in Aerospace Sciences.
Ghattas, and Willcox. 2021. “Learning Physics-Based Models from Data: Perspectives from Inverse Problems and Model Reduction.” Acta Numerica.
Gladish, Pagendam, Peeters, et al. 2018. “Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models.” Journal of Agricultural, Biological and Environmental Statistics.
Goldstein, and Coco. 2015. “Machine Learning Components in Deterministic Models: Hybrid Synergy in the Age of Data.” Frontiers in Environmental Science.
Guth, Schillings, and Weissmann. 2020. “Ensemble Kalman Filter for Neural Network Based One-Shot Inversion.”
Higdon, Gattiker, Williams, et al. 2008. “Computer Model Calibration Using High-Dimensional Output.” Journal of the American Statistical Association.
Hoffimann, Zortea, de Carvalho, et al. 2021. “Geostatistical Learning: Challenges and Opportunities.” Frontiers in Applied Mathematics and Statistics.
Holzschuh, Vegetti, and Thuerey. 2022. “Score Matching via Differentiable Physics.”
Hooten, Leeds, Fiechter, et al. 2011. “Assessing First-Order Emulator Inference for Physical Parameters in Nonlinear Mechanistic Models.” Journal of Agricultural, Biological, and Environmental Statistics.
Jarvenpaa, Vehtari, and Marttinen. 2020. “Batch Simulations and Uncertainty Quantification in Gaussian Process Surrogate Approximate Bayesian Computation.” In Conference on Uncertainty in Artificial Intelligence.
Kasim, Watson-Parris, Deaconu, et al. 2020. “Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search.” arXiv:2001.08055 [Physics, Stat].
Kononenko, and Kononenko. 2018. “Machine Learning and Finite Element Method for Physical Systems Modeling.” arXiv:1801.07337 [Physics].
Laloy, and Jacques. 2019. “Emulation of CPU-Demanding Reactive Transport Models: A Comparison of Gaussian Processes, Polynomial Chaos Expansion, and Deep Neural Networks.” Computational Geosciences.
Lam, Sanchez-Gonzalez, Willson, et al. 2023. “Learning Skillful Medium-Range Global Weather Forecasting.” Science.
Lu, and Ricciuto. 2019. “Efficient Surrogate Modeling Methods for Large-Scale Earth System Models Based on Machine-Learning Techniques.” Geoscientific Model Development.
Mo, Lu, Shi, et al. 2017. “A Taylor Expansion-Based Adaptive Design Strategy for Global Surrogate Modeling With Applications in Groundwater Modeling.” Water Resources Research.
O’Hagan, A. 1978. “Curve Fitting and Optimal Design for Prediction.” Journal of the Royal Statistical Society: Series B (Methodological).
———. 2006. “Bayesian Analysis of Computer Code Outputs: A Tutorial.” Reliability Engineering & System Safety, The Fourth International Conference on Sensitivity Analysis of Model Output (SAMO 2004).
O’Hagan, Anthony. 2013. “Polynomial Chaos: A Tutorial and Critique from a Statistician’s Perspective.”
Oakley, and Youngman. 2017. “Calibration of Stochastic Computer Simulators Using Likelihood Emulation.” Technometrics.
Pachalieva, O’Malley, Harp, et al. 2022. “Physics-Informed Machine Learning with Differentiable Programming for Heterogeneous Underground Reservoir Pressure Management.”
Paleyes, Pullin, Mahsereci, et al. 2019. “Emulation of Physical Processes with Emukit.” In Advances In Neural Information Processing Systems.
Pestourie, Mroueh, Rackauckas, et al. 2022. “Physics-Enhanced Deep Surrogates for PDEs.”
Plumlee. 2017. “Bayesian Calibration of Inexact Computer Models.” Journal of the American Statistical Association.
Popov. 2022. “Combining Data-Driven and Theory-Guided Models in Ensemble Data Assimilation.” ETD.
Queipo, Haftka, Shyy, et al. 2005. “Surrogate-Based Analysis and Optimization.” Progress in Aerospace Sciences.
Razavi, Tolson, and Burn. 2012. “Review of Surrogate Modeling in Water Resources.” Water Resources Research.
Sacks, Schiller, and Welch. 1989. “Designs for Computer Experiments.” Technometrics.
Sacks, Welch, Mitchell, et al. 1989. “Design and Analysis of Computer Experiments.” Statistical Science.
Shankar, Portwood, Mohan, et al. 2020. “Learning Non-Linear Spatio-Temporal Dynamics with Convolutional Neural ODEs.” In Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020).
Siade, Cui, Karelse, et al. 2020. “Reduced‐Dimensional Gaussian Process Machine Learning for Groundwater Allocation Planning Using Swarm Theory.” Water Resources Research.
Tait, and Damoulas. 2020. “Variational Autoencoding of PDE Inverse Problems.” arXiv:2006.15641 [Cs, Stat].
Teweldebrhan, Schuler, Burkhart, et al. 2020. “Coupled Machine Learning and the Limits of Acceptability Approach Applied in Parameter Identification for a Distributed Hydrological Model.” Hydrology and Earth System Sciences.
Thiagarajan, Venkatesh, Anirudh, et al. 2020. “Designing Accurate Emulators for Scientific Processes Using Calibration-Driven Deep Models.” Nature Communications.
Tompson, Schlachter, Sprechmann, et al. 2017. “Accelerating Eulerian Fluid Simulation with Convolutional Networks.” In Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17.
van der Merwe, Leen, Lu, et al. 2007. “Fast Neural Network Surrogates for Very High Dimensional Physics-Based Models in Computational Oceanography.” Neural Networks, Computational Intelligence in Earth and Environmental Sciences.
Vernon, Goldstein, and Bower. 2014. “Galaxy Formation: Bayesian History Matching for the Observable Universe.” Statistical Science.
von Rueden, Mayer, Sifa, et al. 2020. “Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions.” In Advances in Intelligent Data Analysis XVIII.
Watson, and Holmes. 2016. “Approximate Models and Robust Decisions.” Statistical Science.
White, Fienen, and Doherty. 2016. “A Python Framework for Environmental Model Uncertainty Analysis.” Environmental Modelling & Software.
Yashchuk. 2020. “Bringing PDEs to JAX with Forward and Reverse Modes Automatic Differentiation.” In.
Yu, Cui, Sreekanth, et al. 2020. “Deep Learning Emulators for Groundwater Contaminant Transport Modelling.” Journal of Hydrology.
Zhu, Zabaras, Koutsourelakis, et al. 2019. “Physics-Constrained Deep Learning for High-Dimensional Surrogate Modeling and Uncertainty Quantification Without Labeled Data.” Journal of Computational Physics.