# Simulation-based inference

If I knew the right inputs to the simulator, could I get behaviour which matched my observations?

December 24, 2014 — March 3, 2024

approximation
Bayes
feature construction
likelihood free
machine learning
measure
metrics
probability
sciml
statistics
time series

This is chaos right now; I’m consolidating notebooks. Categories may not be well-posed.

Suppose we have access to a simulator of a system of interest, and if we knew the “right” inputs we could get behaviour from it which matched some observations we have made of a related phenomenon in the world. Suppose further that the simulator is messy enough that we have no access to its likelihood. Can we still do statistics, e.g. infer the parameters of the simulator which would give rise to the observations we have made?

Oh my, what a variety of ways we can try.

There are various families of methods here; some try to work purely in samples; others try to approximate the likelihood. I am not sure how all the methods relate to one another. But let us mention some.

Cranmer, Brehmer, and Louppe (2020) attempt to develop a taxonomy in their Figure 2. They make likelihood-free methods sound useful for machine learning in physics.

## 1 Neural likelihood methods

As summarised in Cranmer, Brehmer, and Louppe (2020); see Neural likelihood inference. The Mackelab sbi page collects several implementations particularly targeting simulation-based inference.
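The core loop behind these methods can be sketched without any neural network: simulate $(\theta, x)$ pairs from the prior, fit a conditional density estimator $q(x \mid \theta)$ to them by maximum likelihood, then use the fitted density as a surrogate likelihood. A minimal sketch, with a polynomial-mean Gaussian standing in for the neural density estimator — the simulator and every name here are illustrative toys, not anyone’s canonical implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(theta, rng):
    """Black-box simulator: returns a scalar output given a parameter theta."""
    return np.sin(theta) + 0.3 * rng.standard_normal()

# 1. Simulate (theta, x) pairs with theta drawn from the prior.
thetas = rng.uniform(-3, 3, size=5000)
xs = np.array([simulator(t, rng) for t in thetas])

# 2. Fit a conditional density q(x | theta) by maximum likelihood.
#    A neural density estimator (e.g. a normalising flow) goes here;
#    as a crude stand-in, fit a Gaussian whose mean is polynomial in theta.
A = np.vander(thetas, 6)                       # polynomial features of theta
coef, *_ = np.linalg.lstsq(A, xs, rcond=None)  # least squares = Gaussian MLE
resid_sd = (xs - A @ coef).std()

def log_surrogate_lik(theta, x):
    """Log of the learned surrogate likelihood q(x | theta), up to a constant."""
    mu = np.vander(np.atleast_1d(theta), 6) @ coef
    return -0.5 * ((x - mu) / resid_sd) ** 2 - np.log(resid_sd)

# 3. Use the surrogate likelihood for inference on a new observation.
x_obs = simulator(1.0, rng)
grid = np.linspace(-3, 3, 601)
theta_mle = grid[np.argmax([log_surrogate_lik(t, x_obs).item() for t in grid])]
```

The point is the division of labour: the simulator is only ever sampled, never evaluated as a density; all density evaluations happen in the fitted surrogate.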

Compare to contrastive learning.

## 2 Indirect inference

A.k.a. the auxiliary-model method.

In the (older?) frequentist framing you can get through an undergraduate program in statistics without simulation-based inference arising. However, I am pretty sure it is required for economists and ecologists.

Quoting Cosma:

> […] your model is too complicated for you to appeal to any of the usual estimation methods of statistics. […] there is no way to even calculate the likelihood of a given data set $$x_1, x_2, \ldots, x_t \equiv x_{1:t}$$ under parameters $$\theta$$ in closed form, which would rule out even numerical likelihood maximization, to say nothing of Bayesian methods […] Yet you can simulate; it seems like there should be some way of saying whether the simulations look like the data. This is where indirect inference comes in […] Introduce a new model, called the “auxiliary model”, which is mis-specified and typically not even generative, but is easily fit to the data, and to the data alone. (By that last I mean that you don’t have to impute values for latent variables, etc., etc., even though you might know those variables exist and are causally important.) The auxiliary model has its own parameter vector $$\beta$$, with an estimator $$\hat{\beta}$$. These parameters describe aspects of the distribution of observables, and the idea of indirect inference is that we can estimate the generative parameters $$\theta$$ by trying to match those aspects of observations, by trying to match the auxiliary parameters.
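The recipe in that quote can be sketched end to end: fit the auxiliary model to the data, then search for generative parameters whose simulations produce matching auxiliary fits. Everything below — the noisy-AR(1) simulator, the lag-1-regression auxiliary model, the grid search — is an illustrative toy under assumed settings, not a canonical implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n, rng):
    """Toy 'intractable' simulator: AR(1) observed through extra noise.
    Pretend we cannot write down its likelihood."""
    phi, sigma = theta
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + sigma * rng.standard_normal()
    return x + 0.5 * rng.standard_normal(n)  # observation noise

def auxiliary_fit(x):
    """Auxiliary model: lag-1 regression coefficient plus residual scale.
    Mis-specified and not generative, but trivially fit to data alone."""
    beta1 = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    resid = x[1:] - beta1 * x[:-1]
    return np.array([beta1, resid.std()])

# "Observed" data from unknown true parameters.
theta_true = (0.8, 1.0)
beta_obs = auxiliary_fit(simulator(theta_true, 2000, rng))

def objective(theta, n_rep=5):
    """Distance between auxiliary fits on simulations and on observations."""
    betas = [auxiliary_fit(simulator(theta, 2000, rng)) for _ in range(n_rep)]
    return np.sum((np.mean(betas, axis=0) - beta_obs) ** 2)

# Crude grid search over theta; real implementations use smarter optimisers.
grid = [(phi, sig) for phi in np.linspace(0.5, 0.95, 10)
        for sig in np.linspace(0.5, 1.5, 11)]
theta_hat = min(grid, key=objective)
```

Note the auxiliary parameters $\beta$ are never interpreted; they only serve as a compressed description of the data through which $\theta$ is matched.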

Aaron King’s lab at UMichigan has left its mark on much of this research. One wonders whether the optimal summary statistic can be learned from the data. Apparently yes.

I gather the pomp R package does some simulation-based inference, but I have not checked in for a while so there might be broader and/or fresher options.

## 3 Scoring rules

See scoring rules. NB these are calibration scores, not Fisher scores.

### 3.1 Energy distances

I thought I knew what this was, but apparently not. The number of grandiose publications here leads me to suspect there is more going on than the obvious. TBC.
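For what it is worth, the basic sample version (Székely and Rizzo 2013) is short: the squared energy distance is $D^2(P,Q) = 2\,\mathbb{E}\lVert X-Y\rVert - \mathbb{E}\lVert X-X'\rVert - \mathbb{E}\lVert Y-Y'\rVert$, estimated by replacing expectations with pairwise sample averages. A sketch for 1-d samples:

```python
import numpy as np

def energy_distance(x, y):
    """Squared energy distance between two 1-d samples:
    D^2 = 2 E|X-Y| - E|X-X'| - E|Y-Y'|,
    with expectations replaced by pairwise sample averages (a V-statistic,
    so it is nonnegative and zero iff the empirical distributions agree)."""
    x = np.asarray(x, float)[:, None]
    y = np.asarray(y, float)[:, None]
    exy = np.abs(x - y.T).mean()
    exx = np.abs(x - x.T).mean()
    eyy = np.abs(y - y.T).mean()
    return 2 * exy - exx - eyy

rng = np.random.default_rng(2)
same = energy_distance(rng.normal(0, 1, 1000), rng.normal(0, 1, 1000))
diff = energy_distance(rng.normal(0, 1, 1000), rng.normal(2, 1, 1000))
```

Like the MMD below (to which it is closely related), it needs only samples, never densities, which is what makes it usable as a simulation-based loss.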

### 3.2 MMD

A particularly convenient discrepancy to use for simulation-based problems is the MMD, because it can be evaluated without reference to a density. See Maximum Mean Discrepancy.
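A minimal sketch of that idea: estimate the squared MMD from pairwise kernel evaluations on the samples alone, then choose the simulator parameter that minimises it against the observed data. The Gaussian simulator, bandwidth, and grid search are illustrative assumptions:

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between 1-d samples
    under a Gaussian kernel. Only pairwise kernel evaluations are needed:
    no density of either sample is ever touched."""
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(3)
x_obs = rng.normal(1.0, 1.0, 500)

def simulator(theta, rng):
    """Toy simulator whose likelihood we pretend not to know."""
    return rng.normal(theta, 1.0, 500)

# Pick the simulator parameter whose output is closest to the data in MMD.
grid = np.linspace(-2, 3, 26)
theta_hat = min(grid, key=lambda t: mmd2(x_obs, simulator(t, rng)))
```

Minimising an MMD estimate like this is the simplest form of MMD-based minimum-distance estimation; the same discrepancy also appears as the loss inside more elaborate generalised-Bayes schemes.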

## 4 Approximate Bayesian Computation

A slightly different take, which resembles the indirect-inference approach. See Approximate Bayesian Computation.
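The simplest variant, rejection ABC, fits in a few lines: draw parameters from the prior, simulate, and keep the draws whose summary statistics land within a tolerance of the observed summaries. A toy sketch — the exponential simulator, mean summary, and tolerance are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulator(theta, rng, n=100):
    """Black-box simulator: n draws from an Exponential with rate theta."""
    return rng.exponential(1.0 / theta, size=n)

def summary(x):
    """Summary statistic: the sample mean."""
    return x.mean()

# "Observed" data from an unknown rate.
theta_true = 2.0
s_obs = summary(simulator(theta_true, rng))

# Rejection ABC: draw theta from the prior, keep it if the simulated
# summary lands within epsilon of the observed summary.
prior = lambda: rng.uniform(0.1, 10.0)
eps = 0.05
posterior_sample = []
for _ in range(20000):
    theta = prior()
    if abs(summary(simulator(theta, rng)) - s_obs) < eps:
        posterior_sample.append(theta)

posterior_sample = np.array(posterior_sample)
```

The accepted draws approximate the posterior given the summary statistic, not the full data, which is exactly where the connection to indirect inference lives: both methods see the data only through a chosen low-dimensional description.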

## 5 References

Akbayrak, Bocharov, and de Vries. 2021. Entropy.
Aushev, Tran, Pesonen, et al. 2023.
Babtie, Kirk, and Stumpf. 2014. Proceedings of the National Academy of Sciences.
Batz, Ruttor, and Opper. 2017. arXiv:1702.05390 [Physics, Stat].
Boelts, Lueckmann, Gao, et al. 2022. Edited by Valentin Wyart, Timothy E Behrens, Luigi Acerbi, and Jean Daunizeau. eLife.
Brehmer, Louppe, Pavez, et al. 2020. Proceedings of the National Academy of Sciences.
Bretó, He, Ionides, et al. 2009. The Annals of Applied Statistics.
Cauchemez, and Ferguson. 2008. Journal of The Royal Society Interface.
Commandeur, Koopman, and Ooms. 2011. Journal of Statistical Software.
Cook, Otten, Marion, et al. 2007. Proceedings of the National Academy of Sciences.
Corenflos, Thornton, Deligiannidis, et al. 2021. arXiv:2102.07850 [Cs, Stat].
Cox, and Kartsonaki. 2012. Biometrika.
Cranmer, Brehmer, and Louppe. 2020. Proceedings of the National Academy of Sciences.
Creel, and Kristensen. 2012. The Econometrics Journal.
———. 2013. UFAE and IAE Working Paper 931.13.
Czellar, and Ronchetti. 2010. Biometrika.
Dax, Wildberger, Buchholz, et al. 2023.
de Castro, and Dorigo. 2019. Computer Physics Communications.
Deistler, Goncalves, and Macke. 2022.
Delaunoy, Hermans, Rozet, et al. 2022.
Dellaporta, Knoblauch, Damoulas, et al. 2022. arXiv:2202.04744 [Cs, Stat].
Didelot, Everitt, Johansen, et al. 2011. Bayesian Analysis.
Dridi, Guay, and Renault. 2007. Journal of Econometrics, The interface between econometrics and economic theory,.
Drovandi, and Frazier. 2021. arXiv:2103.02407 [Stat].
Durkan, Murray, and Papamakarios. 2020. In Proceedings of the 37th International Conference on Machine Learning. ICML’20.
Durkan, Papamakarios, and Murray. 2018.
Efron. 2010. Statistical Science.
Fong, Lyddon, and Holmes. 2019. arXiv:1902.03175 [Cs, Stat].
Forneron, and Ng. 2015. arXiv:1501.01265 [Stat].
Gallant, and Tauchen. 1996. Econometric Theory.
———. 1997. Macroeconomic Dynamics.
Genton, and Ronchetti. 2003. Journal of the American Statistical Association.
Glöckler, Deistler, and Macke. 2022.
Gneiting, and Raftery. 2007. Journal of the American Statistical Association.
Gonçalves, Lueckmann, Deistler, et al. 2020. Edited by John R Huguenard, Timothy O’Leary, and Mark S Goldman. eLife.
Gourieroux, Christian, and Monfort. 1993. Journal of Econometrics.
Gourieroux, C., Monfort, and Renault. 1993. Journal of Applied Econometrics.
Greenberg, Nonnenmacher, and Macke. 2019. In Proceedings of the 36th International Conference on Machine Learning.
Grelaud, Robert, Marin, et al. 2009. Bayesian Analysis.
Gutmann, and Corander. 2016. “Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models.” The Journal of Machine Learning Research.
He, Ionides, and King. 2010. Journal of The Royal Society Interface.
Hermans, Begy, and Louppe. 2020. arXiv:1903.04057 [Cs, Stat].
Ionides, Edward L., Bhadra, Atchadé, et al. 2011. The Annals of Statistics.
Ionides, E. L., Bretó, and King. 2006. Proceedings of the National Academy of Sciences.
Jiang, and Turnbull. 2004. Statistical Science.
Kendall, Ellner, McCauley, et al. 2005. Ecological Monographs.
Lueckmann, Bassetto, Karaletsos, et al. 2019. In Symposium on Advances in Approximate Bayesian Inference.
Lueckmann, Boelts, Greenberg, et al. 2021. In AISTATS.
Lueckmann, Gonçalves, Bassetto, et al. 2017. In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17.
Lyddon, Walker, and Holmes. 2018. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Matsubara, Knoblauch, Briol, et al. 2022. Journal of the Royal Statistical Society Series B: Statistical Methodology.
Miller, Cole, Louppe, et al. 2020. In.
Miller, Weniger, and Forré. 2022. In.
Nickl, and Pötscher. 2009. Mathematical Methods of Statistics 19.
Nott, Drovandi, and Frazier. 2023.
Nott, Marshall, and Ngoc. 2012. Statistics and Computing.
Pacchiardi, and Dutta. 2022. arXiv:2104.03889 [Stat].
Papamakarios. 2019.
Papamakarios, and Murray. 2016. In Advances in Neural Information Processing Systems 29.
Papamakarios, Nalisnick, Rezende, et al. 2021. Journal of Machine Learning Research.
Papamakarios, Sterratt, and Murray. 2019. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics.
Roberts, and Stramer. 2001. Biometrika.
Schmon, Cannon, and Knoblauch. 2021. arXiv:2011.08644 [Stat].
Shalizi. 2021.
Shi, Sun, and Zhu. 2018. In.
Sisson, Fan, and Beaumont, eds. 2019. Handbook of Approximate Bayesian Computation.
Smith, A. A. 1993. Journal of Applied Econometrics.
Smith, A A. 2008. In The New Palgrave Dictionary of Economics.
Stoye, Brehmer, Louppe, et al. 2018. arXiv:1808.00973 [Hep-Ph, Physics:physics, Stat].
Székely, and Rizzo. 2013. Journal of Statistical Planning and Inference.
———. 2017. Annual Review of Statistics and Its Application.
Talts, Betancourt, Simpson, et al. 2020.
Wood. 2010. Nature.
Zhu, and Fan. 2022. Computational Statistics.