Emulators and surrogate models

Shortcuts in scientific simulation using ML

Emulation, a.k.a. surrogate modelling. In this context, it means replacing complicated physics-driven simulations with simpler or faster approximations learned by ML techniques. It is especially popular in ML-for-physics pipelines. I have mostly done this in the context of surrogate optimisation for experiments.
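To make the basic idea concrete, here is a minimal sketch under loudly stated assumptions: `expensive_simulator` is a hypothetical stand-in for a slow physics code (a damped oscillator stepped with many small explicit Euler steps), and the surrogate is just a least-squares polynomial in the input parameter, the simplest possible emulator, not any of the GP or neural-network methods in the references below. The pattern is the same regardless: spend the simulation budget once on a design of inputs, fit a cheap model to those runs, then query the cheap model.

```python
import numpy as np

def expensive_simulator(theta: float) -> float:
    # Hypothetical stand-in for a slow physics code: a damped oscillator
    # with stiffness `theta`, integrated by many small explicit Euler steps.
    # Returns the position at t = 2.
    x, v, dt = 1.0, 0.0, 4e-4
    for _ in range(5_000):
        a = -theta * x - 0.5 * v
        x, v = x + dt * v, v + dt * a
    return x

# 1. Spend the expensive budget once: run the simulator on a design of inputs.
train_theta = np.linspace(1.0, 4.0, 25)
train_out = np.array([expensive_simulator(t) for t in train_theta])

# 2. Fit a cheap surrogate to those runs (here a degree-8 polynomial).
emulator = np.polynomial.Polynomial.fit(train_theta, train_out, deg=8)

# 3. Query the surrogate instead of the simulator; check it on held-out inputs.
test_theta = np.random.default_rng(0).uniform(1.0, 4.0, 10)
sim_out = np.array([expensive_simulator(t) for t in test_theta])
emu_out = emulator(test_theta)
max_err = np.max(np.abs(sim_out - emu_out))
print(f"max abs emulator error on held-out inputs: {max_err:.2e}")
```

A polynomial works here only because the output is a very smooth function of a single input; with many inputs, or rough responses, a more flexible emulator is needed.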

A recent, much-hyped paper that exemplifies this approach is Kasim et al. (2020), which (somewhat implicitly) uses arguments from dropout ensembling to produce quasi-Bayesian emulations of notoriously slow simulations. Does it actually work? And if it does quantify posterior predictive uncertainty well, can it also estimate other posterior uncertainties?
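For reference, here is what generic Monte Carlo dropout looks like: train a network with dropout, keep dropout switched on at prediction time, and read the spread of many stochastic forward passes as a predictive uncertainty. This is a sketch of the general technique only, not the architecture or training setup of Kasim et al. (2020); the one-hidden-layer tanh network, the toy `sin` data and every hyperparameter (`hidden`, `p_drop`, `lr`, `steps`, `n_samples`) are hypothetical choices. Note that it only ever produces a spread over *outputs*, which is exactly the posterior-predictive quantity the question above is about.

```python
import numpy as np

def train_dropout_mlp(x, y, hidden=64, p_drop=0.1, lr=0.05, steps=4000, seed=0):
    """Fit a one-hidden-layer tanh network with dropout on the hidden units,
    by full-batch gradient descent on 0.5 * mean squared error."""
    rng = np.random.default_rng(seed)
    X, Y = x.reshape(-1, 1), y.reshape(-1, 1)
    n = len(X)
    W1 = rng.normal(0.0, 1.0, (1, hidden))
    b1 = rng.normal(0.0, 1.0, hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(steps):
        h = np.tanh(X @ W1 + b1)
        # Inverted dropout: zero each unit with prob p_drop, rescale survivors.
        mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
        hd = h * mask
        err = hd @ W2 + b2 - Y
        # Backpropagate through the same dropout mask.
        gW2 = hd.T @ err / n
        gb2 = err.mean(axis=0)
        dh = (err @ W2.T) * mask * (1.0 - h ** 2)
        gW1 = X.T @ dh / n
        gb1 = dh.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2

def mc_dropout_predict(params, x, p_drop=0.1, n_samples=200, seed=1):
    """Keep dropout ON at prediction time and summarise many stochastic passes."""
    W1, b1, W2, b2 = params
    rng = np.random.default_rng(seed)
    h = np.tanh(x.reshape(-1, 1) @ W1 + b1)  # deterministic part, shared by all passes
    draws = np.empty((n_samples, len(x)))
    for s in range(n_samples):
        mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
        draws[s] = ((h * mask) @ W2 + b2).ravel()
    return draws.mean(axis=0), draws.std(axis=0)

# Usage on toy data: train on [-2, 2], then query inside and outside that range.
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, 40)
y_train = np.sin(2.0 * x_train) + 0.05 * rng.normal(size=40)
params = train_dropout_mlp(x_train, y_train)
x_query = np.linspace(-4.0, 4.0, 9)
mean, std = mc_dropout_predict(params, x_query)
print(np.round(mean, 2), np.round(std, 2))
```

Whether `std` is a calibrated posterior predictive spread, especially away from the training inputs, is an empirical question this sketch does not settle.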

Emukit (Paleyes et al. 2019) is a Python toolkit that wraps generic ML models for use as emulators.
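The loop that toolkits of this kind package up can be sketched by hand. The block below deliberately does not use Emukit's API; it is a plain-NumPy sketch of a Gaussian-process emulator in the computer-experiments tradition (e.g. Sacks et al. 1989; O'Hagan 2006) combined with one simple design rule: run the simulator next wherever the emulator's predictive variance is largest. The squared-exponential kernel, its `lengthscale`, the `simulator` function and the candidate grid are all hypothetical choices for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale=0.3, variance=1.0):
    # Squared-exponential covariance between 1-D input arrays a and b.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    # Zero-mean GP regression: posterior mean and variance at x_query.
    K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    Ks = rbf(x_train, x_query)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(x_query, x_query)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

def simulator(x):
    # Hypothetical "expensive" simulator; cheap here so the example runs quickly.
    return np.sin(6.0 * x) + 0.5 * x

candidates = np.linspace(0.0, 1.0, 201)
x_design = np.array([0.0, 1.0])           # two initial simulator runs
y_design = simulator(x_design)
for _ in range(8):
    _, var = gp_posterior(x_design, y_design, candidates)
    x_next = candidates[np.argmax(var)]   # run where the emulator is least sure
    x_design = np.append(x_design, x_next)
    y_design = np.append(y_design, simulator(x_next))

mean, var = gp_posterior(x_design, y_design, candidates)
max_err = np.max(np.abs(mean - simulator(candidates)))
print(np.round(np.sort(x_design), 3), f"max abs error: {max_err:.2e}")
```

Pure variance-seeking is the crudest acquisition rule; for calibration or optimisation one would instead target the region of input space that matters for the downstream question.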

ML methods for PDEs might also be useful here.


Asher, M. J., B. F. W. Croke, A. J. Jakeman, and L. J. M. Peeters. 2015. “A Review of Surrogate Models and Their Application to Groundwater Modeling.” Water Resources Research 51 (8): 5957–73. https://doi.org/10.1002/2015WR016967.
Cui, Tao, Luk Peeters, Dan Pagendam, Trevor Pickett, Huidong Jin, Russell S. Crosbie, Matthias Raiber, David W. Rassam, and Mat Gilfedder. 2018. “Emulator-Enabled Approximate Bayesian Computation (ABC) and Uncertainty Analysis for Computationally Expensive Groundwater Models.” Journal of Hydrology 564 (September): 191–207. https://doi.org/10.1016/j.jhydrol.2018.07.005.
Gladish, Daniel W., Daniel E. Pagendam, Luk J. M. Peeters, Petra M. Kuhnert, and Jai Vaze. 2018. “Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models.” Journal of Agricultural, Biological and Environmental Statistics 23 (1): 39–62. https://doi.org/10.1007/s13253-017-0308-3.
Goldstein, Evan B., and Giovanni Coco. 2015. “Machine Learning Components in Deterministic Models: Hybrid Synergy in the Age of Data.” Frontiers in Environmental Science 3 (April). https://doi.org/10.3389/fenvs.2015.00033.
Higdon, Dave, James Gattiker, Brian Williams, and Maria Rightley. 2008. “Computer Model Calibration Using High-Dimensional Output.” Journal of the American Statistical Association 103 (482): 570–83. http://www.jstor.org/stable/27640080.
Hoffimann, Júlio, Maciel Zortea, Breno de Carvalho, and Bianca Zadrozny. 2021. “Geostatistical Learning: Challenges and Opportunities.” Frontiers in Applied Mathematics and Statistics 7. https://doi.org/10.3389/fams.2021.689393.
Hooten, Mevin B., William B. Leeds, Jerome Fiechter, and Christopher K. Wikle. 2011. “Assessing First-Order Emulator Inference for Physical Parameters in Nonlinear Mechanistic Models.” Journal of Agricultural, Biological, and Environmental Statistics 16 (4): 475–94. http://www.jstor.org/stable/23238828.
Jarvenpaa, Marko, Aki Vehtari, and Pekka Marttinen. 2020. “Batch Simulations and Uncertainty Quantification in Gaussian Process Surrogate Approximate Bayesian Computation.” In Conference on Uncertainty in Artificial Intelligence, 779–88. PMLR. http://proceedings.mlr.press/v124/jarvenpaa20a.html.
Kasim, M. F., D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, et al. 2020. “Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search.” arXiv:2001.08055 [physics, Stat], January. http://arxiv.org/abs/2001.08055.
Kononenko, O., and I. Kononenko. 2018. “Machine Learning and Finite Element Method for Physical Systems Modeling.” arXiv:1801.07337 [physics], March. http://arxiv.org/abs/1801.07337.
Laloy, Eric, and Diederik Jacques. 2019. “Emulation of CPU-Demanding Reactive Transport Models: A Comparison of Gaussian Processes, Polynomial Chaos Expansion, and Deep Neural Networks.” Computational Geosciences 23 (5): 1193–1215. https://doi.org/10.1007/s10596-019-09875-y.
Lu, Dan, and Daniel Ricciuto. 2019. “Efficient Surrogate Modeling Methods for Large-Scale Earth System Models Based on Machine-Learning Techniques.” Geoscientific Model Development 12 (5): 1791–1807. https://doi.org/10.5194/gmd-12-1791-2019.
Merwe, Rudolph van der, Todd K. Leen, Zhengdong Lu, Sergey Frolov, and Antonio M. Baptista. 2007. “Fast Neural Network Surrogates for Very High Dimensional Physics-Based Models in Computational Oceanography.” Neural Networks, Computational Intelligence in Earth and Environmental Sciences, 20 (4): 462–78. https://doi.org/10.1016/j.neunet.2007.04.023.
Mo, Shaoxing, Dan Lu, Xiaoqing Shi, Guannan Zhang, Ming Ye, Jianfeng Wu, and Jichun Wu. 2017. “A Taylor Expansion-Based Adaptive Design Strategy for Global Surrogate Modeling With Applications in Groundwater Modeling.” Water Resources Research 53 (12): 10802–23. https://doi.org/10.1002/2017WR021622.
O’Hagan, A. 1978. “Curve Fitting and Optimal Design for Prediction.” Journal of the Royal Statistical Society: Series B (Methodological) 40 (1): 1–24. https://doi.org/10.1111/j.2517-6161.1978.tb01643.x.
———. 2006. “Bayesian Analysis of Computer Code Outputs: A Tutorial.” Reliability Engineering & System Safety, The Fourth International Conference on Sensitivity Analysis of Model Output (SAMO 2004), 91 (10): 1290–300. https://doi.org/10.1016/j.ress.2005.11.025.
O’Hagan, Anthony. 2013. “Polynomial Chaos: A Tutorial and Critique from a Statistician’s Perspective,” 20.
Oakley, Jeremy E., and Benjamin D. Youngman. 2017. “Calibration of Stochastic Computer Simulators Using Likelihood Emulation.” Technometrics 59 (1): 80–92. https://doi.org/10.1080/00401706.2015.1125391.
Paleyes, Andrei, Mark Pullin, Maren Mahsereci, Neil Lawrence, and Javier Gonzalez. 2019. “Emulation of Physical Processes with Emukit.” In Advances In Neural Information Processing Systems, 8. https://ml4physicalsciences.github.io/files/NeurIPS_ML4PS_2019_113.pdf.
Plumlee, Matthew. 2017. “Bayesian Calibration of Inexact Computer Models.” Journal of the American Statistical Association 112 (519): 1274–85. https://doi.org/10.1080/01621459.2016.1211016.
Razavi, Saman, Bryan A. Tolson, and Donald H. Burn. 2012. “Review of Surrogate Modeling in Water Resources.” Water Resources Research 48 (7). https://doi.org/10.1029/2011WR011527.
Rueden, Laura von, Sebastian Mayer, Rafet Sifa, Christian Bauckhage, and Jochen Garcke. 2020. “Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions.” In Advances in Intelligent Data Analysis XVIII, edited by Michael R. Berthold, Ad Feelders, and Georg Krempl, 12080:548–60. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-44584-3_43.
Sacks, Jerome, Susannah B. Schiller, and William J. Welch. 1989. “Designs for Computer Experiments.” Technometrics 31 (1): 41–47. http://www.jstor.org/stable/1270363.
Sacks, Jerome, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. 1989. “Design and Analysis of Computer Experiments.” Statistical Science 4 (4): 409–23. https://doi.org/10.1214/ss/1177012413.
Shankar, Varun, Gavin D Portwood, Arvind T Mohan, Peetak P Mitra, Christopher Rackauckas, Lucas A Wilson, David P Schmidt, and Venkatasubramanian Viswanathan. 2020. “Learning Non-Linear Spatio-Temporal Dynamics with Convolutional Neural ODEs.” In Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020).
Siade, Adam J., Tao Cui, Robert N. Karelse, and Clive Hampton. 2020. “Reduced‐Dimensional Gaussian Process Machine Learning for Groundwater Allocation Planning Using Swarm Theory.” Water Resources Research 56 (3). https://doi.org/10.1029/2019WR026061.
Tait, Daniel J., and Theodoros Damoulas. 2020. “Variational Autoencoding of PDE Inverse Problems.” arXiv:2006.15641 [cs, Stat], June. http://arxiv.org/abs/2006.15641.
Teweldebrhan, Aynom T., Thomas V. Schuler, John F. Burkhart, and Morten Hjorth-Jensen. 2020. “Coupled Machine Learning and the Limits of Acceptability Approach Applied in Parameter Identification for a Distributed Hydrological Model.” Hydrology and Earth System Sciences 24 (9): 4641–58. https://doi.org/10.5194/hess-24-4641-2020.
Thiagarajan, Jayaraman J., Bindya Venkatesh, Rushil Anirudh, Peer-Timo Bremer, Jim Gaffney, Gemma Anderson, and Brian Spears. 2020. “Designing Accurate Emulators for Scientific Processes Using Calibration-Driven Deep Models.” Nature Communications 11 (1): 5622. https://doi.org/10.1038/s41467-020-19448-8.
Tompson, Jonathan, Kristofer Schlachter, Pablo Sprechmann, and Ken Perlin. 2017. “Accelerating Eulerian Fluid Simulation with Convolutional Networks.” In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 3424–33. ICML’17. Sydney, NSW, Australia: JMLR.org. http://proceedings.mlr.press/v70/tompson17a.html.
Vernon, Ian, Michael Goldstein, and Richard Bower. 2014. “Galaxy Formation: Bayesian History Matching for the Observable Universe.” Statistical Science 29 (1): 81–90. https://doi.org/10.1214/12-STS412.
White, Jeremy T., Michael N. Fienen, and John E. Doherty. 2016. “A Python Framework for Environmental Model Uncertainty Analysis.” Environmental Modelling & Software 85 (November): 217–28. https://doi.org/10.1016/j.envsoft.2016.08.017.
Yashchuk, Ivan. 2020. “Bringing PDEs to JAX with Forward and Reverse Modes Automatic Differentiation.” https://openreview.net/forum?id=nEPNoiGsU3.
Yu, Xiayang, Tao Cui, J. Sreekanth, Stephane Mangeon, Rebecca Doble, Pei Xin, David Rassam, and Mat Gilfedder. 2020. “Deep Learning Emulators for Groundwater Contaminant Transport Modelling.” Journal of Hydrology, August, 125351. https://doi.org/10.1016/j.jhydrol.2020.125351.
Zhu, Yinhao, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis, and Paris Perdikaris. 2019. “Physics-Constrained Deep Learning for High-Dimensional Surrogate Modeling and Uncertainty Quantification Without Labeled Data.” Journal of Computational Physics 394 (October): 56–81. https://doi.org/10.1016/j.jcp.2019.05.024.
