Emulators and surrogate models

Shortcuts in scientific simulation using ML

Emulation, a.k.a. surrogate modelling. In this context, it means replacing complicated, physics-driven simulations with simpler and/or faster approximations learned using ML techniques. It is especially popular in ML-for-physics pipelines. I have mostly done this in the context of surrogate optimisation for experiments.
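To make the idea concrete, here is a minimal sketch of an emulator, assuming a Gaussian process with a squared-exponential kernel as the surrogate. The function `slow_simulator` is a hypothetical stand-in for an expensive physics code, and the kernel hyperparameters are fixed by hand rather than learned; a real application would tune them and use a proper experimental design.

```python
import numpy as np

def slow_simulator(x):
    """Hypothetical stand-in for an expensive physics simulation."""
    return np.sin(3.0 * x) + 0.5 * x

def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def fit_gp_emulator(x_train, y_train, jitter=1e-6):
    """Return a predict function giving the GP posterior mean and std."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

    def predict(x_new):
        K_star = rbf_kernel(x_new, x_train)
        mean = K_star @ alpha
        v = np.linalg.solve(L, K_star.T)
        var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v**2, axis=0)
        # Clip tiny negative variances caused by round-off.
        return mean, np.sqrt(np.clip(var, 0.0, None))

    return predict

# A handful of expensive runs form the training design.
x_train = np.linspace(0.0, 2.0, 12)
y_train = slow_simulator(x_train)
emulate = fit_gp_emulator(x_train, y_train)

# The emulator now answers new queries cheaply, with an uncertainty estimate.
x_test = np.linspace(0.0, 2.0, 50)
mean, std = emulate(x_test)
max_abs_error = np.max(np.abs(mean - slow_simulator(x_test)))
```

The point of the exercise is the cost asymmetry: the simulator is called only 12 times, after which `emulate` can be queried as often as needed, for example inside an optimisation loop.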

A recent, hyped paper that exemplifies this approach is Kasim et al. (2020), which (somewhat implicitly) uses arguments from dropout ensembling to produce quasi-Bayesian emulations of notoriously slow simulations. Does it actually work? And if it does quantify posterior predictive uncertainty well, can it also estimate other posterior uncertainties?
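For orientation, the generic dropout-ensembling trick looks roughly like the sketch below. This is not the architecture or training procedure of Kasim et al. (2020); it is a toy one-hidden-layer network in NumPy, with a hypothetical `slow_simulator` and arbitrary layer size, dropout rate and learning rate. Dropout stays switched on at prediction time, and the spread over repeated stochastic forward passes is read as an approximate predictive uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

def slow_simulator(x):
    """Hypothetical stand-in for an expensive simulation."""
    return np.sin(3.0 * x)

# Training design: a few dozen simulator runs.
x = rng.uniform(-1.0, 1.0, size=(64, 1))
y = slow_simulator(x)

# One hidden layer with tanh units; sizes and rates are arbitrary choices.
hidden, keep_prob, lr = 32, 0.9, 0.01
W1 = rng.normal(0.0, 1.0, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
b2 = np.zeros(1)

def forward(x_in, mask):
    """Forward pass with a given dropout mask on the hidden layer."""
    h = np.tanh(x_in @ W1 + b1)
    h_drop = h * mask / keep_prob  # inverted dropout scaling
    return h, h_drop, h_drop @ W2 + b2

# Plain gradient descent on mean squared error, with dropout active.
for step in range(2000):
    mask = rng.random((x.shape[0], hidden)) < keep_prob
    h, h_drop, pred = forward(x, mask)
    grad_pred = 2.0 * (pred - y) / len(x)
    grad_W2 = h_drop.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_h = (grad_pred @ W2.T) * mask / keep_prob * (1.0 - h**2)
    grad_W1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

def mc_dropout_predict(x_new, n_samples=200):
    """Mean and std over stochastic forward passes with dropout left on."""
    draws = np.stack([
        forward(x_new, rng.random((x_new.shape[0], hidden)) < keep_prob)[2]
        for _ in range(n_samples)
    ])
    return draws.mean(axis=0), draws.std(axis=0)

x_test = np.linspace(-1.0, 1.0, 25)[:, None]
mc_mean, mc_std = mc_dropout_predict(x_test)
```

Whether `mc_std` deserves to be called a posterior predictive standard deviation is exactly the open question above; the sketch only shows the mechanics.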

Emukit (Paleyes et al. 2019) is a toolkit which generically wraps ML models for emulation purposes.

ML methods for PDEs might also be useful here.


Altmann, Robert, Patrick Henning, and Daniel Peterseim. 2021. “Numerical Homogenization Beyond Scale Separation.” Acta Numerica 30 (May): 1–86.
Asher, M. J., B. F. W. Croke, A. J. Jakeman, and L. J. M. Peeters. 2015. “A Review of Surrogate Models and Their Application to Groundwater Modeling.” Water Resources Research 51 (8): 5957–73.
Cui, Tao, Luk Peeters, Dan Pagendam, Trevor Pickett, Huidong Jin, Russell S. Crosbie, Matthias Raiber, David W. Rassam, and Mat Gilfedder. 2018. “Emulator-Enabled Approximate Bayesian Computation (ABC) and Uncertainty Analysis for Computationally Expensive Groundwater Models.” Journal of Hydrology 564 (September): 191–207.
Forrester, Alexander I. J., and Andy J. Keane. 2009. “Recent Advances in Surrogate-Based Optimization.” Progress in Aerospace Sciences 45 (1–3): 50–79.
Ghattas, Omar, and Karen Willcox. 2021. “Learning Physics-Based Models from Data: Perspectives from Inverse Problems and Model Reduction.” Acta Numerica 30 (May): 445–554.
Gladish, Daniel W., Daniel E. Pagendam, Luk J. M. Peeters, Petra M. Kuhnert, and Jai Vaze. 2018. “Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models.” Journal of Agricultural, Biological and Environmental Statistics 23 (1): 39–62.
Goldstein, Evan B., and Giovanni Coco. 2015. “Machine Learning Components in Deterministic Models: Hybrid Synergy in the Age of Data.” Frontiers in Environmental Science 3 (April).
Guth, Philipp A., Claudia Schillings, and Simon Weissmann. 2020. “Ensemble Kalman Filter for Neural Network Based One-Shot Inversion.” arXiv.
Higdon, Dave, James Gattiker, Brian Williams, and Maria Rightley. 2008. “Computer Model Calibration Using High-Dimensional Output.” Journal of the American Statistical Association 103 (482): 570–83.
Hoffimann, Júlio, Maciel Zortea, Breno de Carvalho, and Bianca Zadrozny. 2021. “Geostatistical Learning: Challenges and Opportunities.” Frontiers in Applied Mathematics and Statistics 7.
Hooten, Mevin B., William B. Leeds, Jerome Fiechter, and Christopher K. Wikle. 2011. “Assessing First-Order Emulator Inference for Physical Parameters in Nonlinear Mechanistic Models.” Journal of Agricultural, Biological, and Environmental Statistics 16 (4): 475–94.
Jarvenpaa, Marko, Aki Vehtari, and Pekka Marttinen. 2020. “Batch Simulations and Uncertainty Quantification in Gaussian Process Surrogate Approximate Bayesian Computation.” In Conference on Uncertainty in Artificial Intelligence, 779–88. PMLR.
Kasim, M. F., D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, et al. 2020. “Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search.” arXiv:2001.08055 [Physics, Stat], January.
Kononenko, O., and I. Kononenko. 2018. “Machine Learning and Finite Element Method for Physical Systems Modeling.” arXiv:1801.07337 [Physics], March.
Laloy, Eric, and Diederik Jacques. 2019. “Emulation of CPU-Demanding Reactive Transport Models: A Comparison of Gaussian Processes, Polynomial Chaos Expansion, and Deep Neural Networks.” Computational Geosciences 23 (5): 1193–1215.
Lu, Dan, and Daniel Ricciuto. 2019. “Efficient Surrogate Modeling Methods for Large-Scale Earth System Models Based on Machine-Learning Techniques.” Geoscientific Model Development 12 (5): 1791–1807.
Merwe, Rudolph van der, Todd K. Leen, Zhengdong Lu, Sergey Frolov, and Antonio M. Baptista. 2007. “Fast Neural Network Surrogates for Very High Dimensional Physics-Based Models in Computational Oceanography.” Neural Networks, Computational Intelligence in Earth and Environmental Sciences, 20 (4): 462–78.
Mo, Shaoxing, Dan Lu, Xiaoqing Shi, Guannan Zhang, Ming Ye, Jianfeng Wu, and Jichun Wu. 2017. “A Taylor Expansion-Based Adaptive Design Strategy for Global Surrogate Modeling With Applications in Groundwater Modeling.” Water Resources Research 53 (12): 10802–23.
O’Hagan, A. 1978. “Curve Fitting and Optimal Design for Prediction.” Journal of the Royal Statistical Society: Series B (Methodological) 40 (1): 1–24.
O’Hagan, A. 2006. “Bayesian Analysis of Computer Code Outputs: A Tutorial.” Reliability Engineering & System Safety, The Fourth International Conference on Sensitivity Analysis of Model Output (SAMO 2004), 91 (10): 1290–300.
O’Hagan, Anthony. 2013. “Polynomial Chaos: A Tutorial and Critique from a Statistician’s Perspective,” 20.
Oakley, Jeremy E., and Benjamin D. Youngman. 2017. “Calibration of Stochastic Computer Simulators Using Likelihood Emulation.” Technometrics 59 (1): 80–92.
Pachalieva, Aleksandra, Daniel O’Malley, Dylan Robert Harp, and Hari Viswanathan. 2022. “Physics-Informed Machine Learning with Differentiable Programming for Heterogeneous Underground Reservoir Pressure Management.” arXiv.
Paleyes, Andrei, Mark Pullin, Maren Mahsereci, Neil Lawrence, and Javier Gonzalez. 2019. “Emulation of Physical Processes with Emukit.” In Advances In Neural Information Processing Systems, 8.
Plumlee, Matthew. 2017. “Bayesian Calibration of Inexact Computer Models.” Journal of the American Statistical Association 112 (519): 1274–85.
Popov, Andrey Anatoliyevich. 2022. “Combining Data-Driven and Theory-Guided Models in Ensemble Data Assimilation.” ETD. Virginia Tech.
Queipo, Nestor V., Raphael T. Haftka, Wei Shyy, Tushar Goel, Rajkumar Vaidyanathan, and P. Kevin Tucker. 2005. “Surrogate-Based Analysis and Optimization.” Progress in Aerospace Sciences 41 (1): 1–28.
Razavi, Saman, Bryan A. Tolson, and Donald H. Burn. 2012. “Review of Surrogate Modeling in Water Resources.” Water Resources Research 48 (7).
Rueden, Laura von, Sebastian Mayer, Rafet Sifa, Christian Bauckhage, and Jochen Garcke. 2020. “Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions.” In Advances in Intelligent Data Analysis XVIII, edited by Michael R. Berthold, Ad Feelders, and Georg Krempl, 12080:548–60. Cham: Springer International Publishing.
Sacks, Jerome, Susannah B. Schiller, and William J. Welch. 1989. “Designs for Computer Experiments.” Technometrics 31 (1): 41–47.
Sacks, Jerome, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. 1989. “Design and Analysis of Computer Experiments.” Statistical Science 4 (4): 409–23.
Shankar, Varun, Gavin D Portwood, Arvind T Mohan, Peetak P Mitra, Christopher Rackauckas, Lucas A Wilson, David P Schmidt, and Venkatasubramanian Viswanathan. 2020. “Learning Non-Linear Spatio-Temporal Dynamics with Convolutional Neural ODEs.” In Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020).
Siade, Adam J., Tao Cui, Robert N. Karelse, and Clive Hampton. 2020. “Reduced-Dimensional Gaussian Process Machine Learning for Groundwater Allocation Planning Using Swarm Theory.” Water Resources Research 56 (3).
Tait, Daniel J., and Theodoros Damoulas. 2020. “Variational Autoencoding of PDE Inverse Problems.” arXiv:2006.15641 [Cs, Stat], June.
Teweldebrhan, Aynom T., Thomas V. Schuler, John F. Burkhart, and Morten Hjorth-Jensen. 2020. “Coupled Machine Learning and the Limits of Acceptability Approach Applied in Parameter Identification for a Distributed Hydrological Model.” Hydrology and Earth System Sciences 24 (9): 4641–58.
Thiagarajan, Jayaraman J., Bindya Venkatesh, Rushil Anirudh, Peer-Timo Bremer, Jim Gaffney, Gemma Anderson, and Brian Spears. 2020. “Designing Accurate Emulators for Scientific Processes Using Calibration-Driven Deep Models.” Nature Communications 11 (1): 5622.
Tompson, Jonathan, Kristofer Schlachter, Pablo Sprechmann, and Ken Perlin. 2017. “Accelerating Eulerian Fluid Simulation with Convolutional Networks.” In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 3424–33. ICML’17. Sydney, NSW, Australia: JMLR.org.
Vernon, Ian, Michael Goldstein, and Richard Bower. 2014. “Galaxy Formation: Bayesian History Matching for the Observable Universe.” Statistical Science 29 (1): 81–90.
Watson, James, and Chris Holmes. 2016. “Approximate Models and Robust Decisions.” Statistical Science 31 (4): 465–89.
White, Jeremy T., Michael N. Fienen, and John E. Doherty. 2016. “A Python Framework for Environmental Model Uncertainty Analysis.” Environmental Modelling & Software 85 (November): 217–28.
Yashchuk, Ivan. 2020. “Bringing PDEs to JAX with Forward and Reverse Modes Automatic Differentiation.” In.
Yu, Xiayang, Tao Cui, J. Sreekanth, Stephane Mangeon, Rebecca Doble, Pei Xin, David Rassam, and Mat Gilfedder. 2020. “Deep Learning Emulators for Groundwater Contaminant Transport Modelling.” Journal of Hydrology, August, 125351.
Zhu, Yinhao, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis, and Paris Perdikaris. 2019. “Physics-Constrained Deep Learning for High-Dimensional Surrogate Modeling and Uncertainty Quantification Without Labeled Data.” Journal of Computational Physics 394 (October): 56–81.
