Emulators and surrogate models

Shortcuts in scientific simulation using ML


Emulation, a.k.a. surrogate modelling. In this context, it means replacing complicated physics-driven simulations with simpler or faster approximations learned using ML techniques. Especially popular in the ML-for-physics pipeline. I have mostly done this in the context of surrogate optimisation for experiments.
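To make the idea concrete, here is a minimal sketch of an emulator. It is an illustration under assumptions, not anyone's published method: `slow_simulator` is a hypothetical stand-in for an expensive physics code, the surrogate is a plain Gaussian process regression with an RBF kernel (one classic choice among many), and the length scale and jitter are arbitrary values picked for this toy problem. It uses only the standard library so that it runs anywhere; in practice one would reach for a GP library.

```python
import math

def slow_simulator(x):
    """Hypothetical stand-in for an expensive physics simulation."""
    return math.sin(3.0 * x) + 0.5 * x

def rbf(a, b, length_scale=0.3):
    """RBF (squared-exponential) kernel; the length scale is an assumed value."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_lower_transposed(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_emulator(xs, ys, jitter=1e-6):
    """Fit a zero-mean GP to simulator runs; return a cheap predict function."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, ys))  # alpha = K^{-1} y

    def predict(x):
        k = [rbf(x, a) for a in xs]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)
        return mean, var

    return predict

# Spend the simulation budget once: 10 runs on a grid over [0, 2].
train_x = [2.0 * i / 9 for i in range(10)]
train_y = [slow_simulator(x) for x in train_x]
emulate = fit_emulator(train_x, train_y)

# Afterwards, query the emulator instead of the simulator.
mean, var = emulate(0.95)
print(f"emulated: {mean:.3f} (variance {var:.2e}), simulated: {slow_simulator(0.95):.3f}")
```

The predictive variance is what makes this style of emulator useful for surrogate optimisation: it is small near the training runs and reverts to the prior variance far from them, which indicates where another expensive simulator run would be most informative.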

A recent, hyped paper that exemplifies this approach is Kasim et al. (2020), which (somewhat implicitly) uses arguments from Gaussian process regression to produce quasi-Bayesian emulations of notoriously slow simulations. I am amazed that this works.

Emukit (Paleyes et al. 2019) is a toolkit which generically wraps ML models for emulation purposes.

Asher, M. J., B. F. W. Croke, A. J. Jakeman, and L. J. M. Peeters. 2015. “A Review of Surrogate Models and Their Application to Groundwater Modeling.” Water Resources Research 51 (8): 5957–73. https://doi.org/10.1002/2015WR016967.

Cui, Tao, Luk Peeters, Dan Pagendam, Trevor Pickett, Huidong Jin, Russell S. Crosbie, Matthias Raiber, David W. Rassam, and Mat Gilfedder. 2018. “Emulator-Enabled Approximate Bayesian Computation (ABC) and Uncertainty Analysis for Computationally Expensive Groundwater Models.” Journal of Hydrology 564 (September): 191–207. https://doi.org/10.1016/j.jhydrol.2018.07.005.

Gladish, Daniel W., Daniel E. Pagendam, Luk J. M. Peeters, Petra M. Kuhnert, and Jai Vaze. 2018. “Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models.” Journal of Agricultural, Biological and Environmental Statistics 23 (1): 39–62. https://doi.org/10.1007/s13253-017-0308-3.

Goldstein, Evan B., and Giovanni Coco. 2015. “Machine Learning Components in Deterministic Models: Hybrid Synergy in the Age of Data.” Frontiers in Environmental Science 3 (April). https://doi.org/10.3389/fenvs.2015.00033.

Jarvenpaa, Marko, Aki Vehtari, and Pekka Marttinen. 2020. “Batch Simulations and Uncertainty Quantification in Gaussian Process Surrogate Approximate Bayesian Computation.” In Conference on Uncertainty in Artificial Intelligence, 779–88. PMLR. http://proceedings.mlr.press/v124/jarvenpaa20a.html.

Kasim, M. F., D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, et al. 2020. “Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search,” January. http://arxiv.org/abs/2001.08055.

Laloy, Eric, and Diederik Jacques. 2019. “Emulation of CPU-Demanding Reactive Transport Models: A Comparison of Gaussian Processes, Polynomial Chaos Expansion, and Deep Neural Networks.” Computational Geosciences 23 (5): 1193–1215. https://doi.org/10.1007/s10596-019-09875-y.

Lu, Dan, and Daniel Ricciuto. 2019. “Efficient Surrogate Modeling Methods for Large-Scale Earth System Models Based on Machine-Learning Techniques.” Geoscientific Model Development 12 (5): 1791–1807. https://doi.org/10.5194/gmd-12-1791-2019.

Merwe, Rudolph van der, Todd K. Leen, Zhengdong Lu, Sergey Frolov, and Antonio M. Baptista. 2007. “Fast Neural Network Surrogates for Very High Dimensional Physics-Based Models in Computational Oceanography.” Neural Networks, Computational Intelligence in Earth and Environmental Sciences, 20 (4): 462–78. https://doi.org/10.1016/j.neunet.2007.04.023.

Mo, Shaoxing, Dan Lu, Xiaoqing Shi, Guannan Zhang, Ming Ye, Jianfeng Wu, and Jichun Wu. 2017. “A Taylor Expansion-Based Adaptive Design Strategy for Global Surrogate Modeling with Applications in Groundwater Modeling.” Water Resources Research 53 (12): 10802–23. https://doi.org/10.1002/2017WR021622.

Oakley, Jeremy E., and Benjamin D. Youngman. 2017. “Calibration of Stochastic Computer Simulators Using Likelihood Emulation.” Technometrics 59 (1): 80–92. https://doi.org/10.1080/00401706.2015.1125391.

O’Hagan, A. 1978. “Curve Fitting and Optimal Design for Prediction.” Journal of the Royal Statistical Society. Series B (Methodological) 40 (1): 1–42. http://www.jstor.org/stable/2984861.

———. 2006. “Bayesian Analysis of Computer Code Outputs: A Tutorial.” Reliability Engineering & System Safety, The Fourth International Conference on Sensitivity Analysis of Model Output (SAMO 2004), 91 (10): 1290–1300. https://doi.org/10.1016/j.ress.2005.11.025.

Paleyes, Andrei, Mark Pullin, Maren Mahsereci, Neil Lawrence, and Javier Gonzalez. 2019. “Emulation of Physical Processes with Emukit.” In Advances in Neural Information Processing Systems, 8. https://ml4physicalsciences.github.io/files/NeurIPS_ML4PS_2019_113.pdf.

Plumlee, Matthew. 2017. “Bayesian Calibration of Inexact Computer Models.” Journal of the American Statistical Association 112 (519): 1274–85. https://doi.org/10.1080/01621459.2016.1211016.

Razavi, Saman, Bryan A. Tolson, and Donald H. Burn. 2012. “Review of Surrogate Modeling in Water Resources.” Water Resources Research 48 (7). https://doi.org/10.1029/2011WR011527.

Rueden, Laura von, Sebastian Mayer, Rafet Sifa, Christian Bauckhage, and Jochen Garcke. 2020. “Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions.” In Advances in Intelligent Data Analysis XVIII, edited by Michael R. Berthold, Ad Feelders, and Georg Krempl, 12080:548–60. Lecture Notes in Computer Science. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-44584-3_43.

Siade, Adam J., Tao Cui, Robert N. Karelse, and Clive Hampton. 2020. “Reduced‐Dimensional Gaussian Process Machine Learning for Groundwater Allocation Planning Using Swarm Theory.” Water Resources Research 56 (3). https://doi.org/10.1029/2019WR026061.

Tait, Daniel J., and Theodoros Damoulas. 2020. “Variational Autoencoding of PDE Inverse Problems,” June. http://arxiv.org/abs/2006.15641.

Tompson, Jonathan, Kristofer Schlachter, Pablo Sprechmann, and Ken Perlin. 2017. “Accelerating Eulerian Fluid Simulation with Convolutional Networks.” In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 3424–33. ICML’17. Sydney, NSW, Australia: JMLR.org. http://proceedings.mlr.press/v70/tompson17a.html.

Vernon, Ian, Michael Goldstein, and Richard Bower. 2014. “Galaxy Formation: Bayesian History Matching for the Observable Universe.” Statistical Science 29 (1): 81–90. https://doi.org/10.1214/12-STS412.

Yu, Xiayang, Tao Cui, J. Sreekanth, Stephane Mangeon, Rebecca Doble, Pei Xin, David Rassam, and Mat Gilfedder. 2020. “Deep Learning Emulators for Groundwater Contaminant Transport Modelling.” Journal of Hydrology, August, 125351. https://doi.org/10.1016/j.jhydrol.2020.125351.

Zhu, Yinhao, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis, and Paris Perdikaris. 2019. “Physics-Constrained Deep Learning for High-Dimensional Surrogate Modeling and Uncertainty Quantification Without Labeled Data.” Journal of Computational Physics 394 (October): 56–81. https://doi.org/10.1016/j.jcp.2019.05.024.