# Simulation-based inference

If I knew the right inputs to the simulator, could I get behaviour which matched my observations?

December 24, 2014 — March 3, 2024

This is chaos right now; I’m consolidating notebooks. Categories may not be well-posed.

Suppose we have access to a simulator of a system of interest, and that if we knew the “right” inputs we could get behaviour from it which matched some observations we have made of a related phenomenon in the world. Suppose further that the simulator is pretty messy, so we do not have access to its likelihood. Can we still do statistics, e.g. infer the parameters of the simulator which would give rise to the observations we have made?

Oh my, what a variety of ways we can try.

There are various families of methods here: some work purely with samples; others approximate the likelihood. I am not sure how all the methods relate to one another, but let us mention some.

Cranmer, Brehmer, and Louppe (2020) attempt to develop a taxonomy of these methods (their Figure 2). They make likelihood-free methods sound useful for machine learning for physics.

## 1 Neural likelihood methods

As summarised in Cranmer, Brehmer, and Louppe (2020); see Neural likelihood inference. See the Mackelab sbi page for several implementations particularly targeting simulation-based inference.

Compare to contrastive learning.
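The connection to contrastive learning is the likelihood-ratio trick: a classifier trained to tell simulations at \(\theta_1\) from simulations at \(\theta_0\) implicitly estimates the likelihood ratio, whose logit (with balanced classes) is \(\log p(x\mid\theta_1) - \log p(x\mid\theta_0)\). Below is a minimal sketch, assuming a hypothetical toy Gaussian simulator and using plain logistic regression as a stand-in for a neural classifier. It is not the API of the sbi package or any other library.

```python
import numpy as np

# Hypothetical toy simulator: x ~ N(theta, 1). We pretend its density is unavailable.
def simulate(theta, n, rng):
    return rng.normal(theta, 1.0, size=n)

def fit_logistic(features, labels, lr=0.1, steps=2000):
    """Logistic regression fitted by full-batch gradient descent on the log loss."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-features @ w))
        w -= lr * features.T @ (p - labels) / len(labels)
    return w

def feats(x):
    # Features [1, x]; enough here because the true log-ratio is linear in x.
    x = np.atleast_1d(x)
    return np.column_stack([np.ones_like(x), x])

rng = np.random.default_rng(0)
theta0, theta1 = 0.0, 1.0
x0 = simulate(theta0, 5000, rng)  # class 0: simulations at theta0
x1 = simulate(theta1, 5000, rng)  # class 1: simulations at theta1
x_all = np.concatenate([x0, x1])
labels = np.concatenate([np.zeros_like(x0), np.ones_like(x1)])
w = fit_logistic(feats(x_all), labels)

def estimated_log_ratio(x):
    # Classifier logit ~ log p(x | theta1) - log p(x | theta0) for balanced classes.
    return feats(x) @ w

def true_log_ratio(x):
    # Available only because the toy simulator is Gaussian; used for checking.
    return -0.5 * (x - theta1) ** 2 + 0.5 * (x - theta0) ** 2
```

Nothing in the classifier needs a density, only simulations, which is why the same idea scales up when the logistic regression is replaced by a neural network over \((\theta, x)\) pairs.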

## 2 Indirect inference

A.k.a. the *auxiliary method*.

In the (older?) frequentist framing, you can get through an undergraduate program in statistics without simulation-based inference ever arising. However, I am pretty sure it is required for economists and ecologists.

Quoting Cosma:

[…] your model is too complicated for you to appeal to any of the usual estimation methods of statistics. […] there is no way to even calculate the likelihood of a given data set \(x_1, x_2, \ldots, x_t \equiv x_1^t\) under parameters \(\theta\) in closed form, which would rule out even numerical likelihood maximization, to say nothing of Bayesian methods […] Yet you can simulate; it seems like there should be some way of saying whether the simulations look like the data. This is where indirect inference comes in […] Introduce a new model, called the “auxiliary model”, which is mis-specified and typically not even generative, but is easily fit to the data, and to the data alone. (By that last I mean that you don’t have to impute values for latent variables, etc., etc., even though you might know those variables exist and are causally important.) The auxiliary model has its own parameter vector \(\beta\), with an estimator \(\hat{\beta}\). These parameters describe aspects of the distribution of observables, and the idea of indirect inference is that we can estimate the generative parameters \(\theta\) by trying to match those aspects of observations, by trying to match the auxiliary parameters.
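A minimal sketch of that recipe, assuming a textbook-style toy of my own choosing (not from the quote): the simulator is an MA(1) process \(x_t = e_t + \theta e_{t-1}\), and the auxiliary model is an AR(1) regression fitted by least squares, which is misspecified but trivial to fit. Its coefficient converges to the binding function \(\beta(\theta) = \theta/(1+\theta^2)\), which is monotone on \((-1, 1)\). We pick \(\theta\) so that the auxiliary estimate on simulations matches \(\hat{\beta}\) on the data, using a grid search and common random numbers for simplicity.

```python
import numpy as np

def simulate_ma1(theta, shocks):
    # x_t = e_t + theta * e_{t-1}; shocks are passed in so they can be reused
    # across candidate thetas (common random numbers).
    return shocks[1:] + theta * shocks[:-1]

def auxiliary_estimate(x):
    # Auxiliary model: AR(1) x_t = beta * x_{t-1} + noise, fitted by least squares.
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

def indirect_inference(x_obs, n_sims=20, grid=np.linspace(-0.95, 0.95, 381), seed=1):
    rng = np.random.default_rng(seed)
    beta_hat = auxiliary_estimate(x_obs)
    # One fixed set of simulation shocks, reused for every candidate theta.
    shocks = rng.standard_normal((n_sims, len(x_obs) + 1))
    losses = []
    for theta in grid:
        beta_sim = np.mean([auxiliary_estimate(simulate_ma1(theta, e)) for e in shocks])
        losses.append((beta_hat - beta_sim) ** 2)
    return grid[int(np.argmin(losses))]

rng = np.random.default_rng(0)
theta_true = 0.5
x_obs = simulate_ma1(theta_true, rng.standard_normal(2001))
theta_hat = indirect_inference(x_obs)
```

Reusing the same shocks for every \(\theta\) makes the simulated auxiliary estimate a smooth function of \(\theta\), so the matching objective does not jitter from one grid point to the next; in a real problem one would hand this objective to an optimizer rather than a grid.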

Aaron King’s lab at the University of Michigan has left its mark on a lot of this research. One wonders whether the optimal summary statistic can be learned from the data. Apparently yes?
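One way I know of to learn a summary statistic, in the spirit of regression-based (“semi-automatic”) ABC summaries, is to simulate \((\theta, x)\) pairs from the prior and regress \(\theta\) on \(x\); the fitted regression then serves as the summary. A minimal sketch, assuming a hypothetical toy simulator of 20 i.i.d. \(N(\theta, 1)\) draws and a linear regression in place of anything fancier. Here the sample mean is sufficient, so the learned weights should come out roughly equal and sum to about one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_obs = 5000, 20

# Hypothetical toy simulator: n_obs i.i.d. draws from N(theta, 1) per parameter value.
def simulate(theta, rng):
    return rng.normal(theta[:, None], 1.0, size=(len(theta), n_obs))

def features(x):
    # Raw data plus an intercept; a richer basis or a neural net would go here in practice.
    return np.column_stack([np.ones(len(x)), x])

# 1. Draw parameters from the prior and simulate a training set.
theta_train = rng.uniform(-3, 3, size=n_train)
x_train = simulate(theta_train, rng)

# 2. Regress theta on the simulated data; the fitted value is the learned summary.
coef, *_ = np.linalg.lstsq(features(x_train), theta_train, rcond=None)

def learned_summary(x):
    return features(x) @ coef
```

The learned summary is an estimate of \(E[\theta \mid x]\), which is a natural thing to compare between simulations and observations, for instance inside an ABC rejection step.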

I gather the pomp R package does some simulation-based inference, but I have not checked in for a while so there might be broader and/or fresher options.

## 3 Scoring rules

See scoring rules (Gneiting and Raftery 2007; Pacchiardi and Dutta 2022). NB, these are calibration scores, not Fisher scores.
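Scoring rules suit simulators because some of them can be estimated from samples alone. A minimal sketch of one such rule, the energy score \(\mathrm{ES}(P, y) = E\lVert X - y\rVert - \tfrac{1}{2} E\lVert X - X'\rVert\) with \(X, X' \sim P\) independent, estimated from simulator draws; in one dimension it reduces to the CRPS. The function name and the unbiased pairwise average are my own choices for illustration.

```python
import numpy as np

def energy_score(samples, y):
    """Sample estimate of ES(P, y); lower is better.

    samples: (m, d) array of draws from the predictive distribution P, m >= 2.
    y: (d,) observed outcome.
    """
    m = samples.shape[0]
    term1 = np.mean(np.linalg.norm(samples - y, axis=1))
    pair = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    # The diagonal of `pair` is zero, so this averages over pairs i != j.
    term2 = pair.sum() / (m * (m - 1))
    return term1 - 0.5 * term2
```

A simulator whose draws cluster around the observation scores lower than one centred elsewhere; averaging this score over observations and minimizing over simulator parameters is one route to sample-based parameter estimation.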

### 3.1 Energy distances

I thought I knew what this was but I think not. The fact that there are so many grandiose publications here (Gneiting and Raftery 2007; Székely and Rizzo 2013, 2017) leads me to suspect there is more going on than the obvious? TBC.
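For reference, the “obvious” part is the energy distance between distributions \(F\) and \(G\), \(D^2(F, G) = 2E\lVert X - Y\rVert - E\lVert X - X'\rVert - E\lVert Y - Y'\rVert\), which is zero exactly when \(F = G\) (given finite first moments). With these conventions it is twice the divergence associated with the energy score. A minimal sample-based sketch using the V-statistic (all pairs, including \(i = j\)); the helper names are hypothetical.

```python
import numpy as np

def pairwise_mean_dist(a, b):
    # Mean Euclidean distance over all pairs (a_i, b_j); a: (n, d), b: (m, d).
    return np.mean(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1))

def energy_distance(x, y):
    """V-statistic estimate of D^2(F, G) from samples x ~ F and y ~ G."""
    return 2 * pairwise_mean_dist(x, y) - pairwise_mean_dist(x, x) - pairwise_mean_dist(y, y)
```

The V-statistic is the exact energy distance between the two empirical distributions, so it is never negative, at the cost of a small upward bias.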

### 3.2 MMD

A particularly convenient discrepancy to use for simulation-based problems is the MMD, because it can be evaluated without reference to a density. See Maximum Mean Discrepancy.
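A minimal sketch of the standard unbiased estimator of \(\mathrm{MMD}^2\) between two samples, assuming a Gaussian kernel with a fixed bandwidth of 1 (the bandwidth is an arbitrary choice here; the median pairwise distance is a common heuristic). Only samples enter, so one argument can be observations and the other simulator output.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # a: (n, d), b: (m, d) -> (n, m) matrix of exp(-||a_i - b_j||^2 / (2 h^2)).
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between the distributions of x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) from the within-sample averages.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2 * kxy.mean()
```

Being unbiased, the estimate can dip slightly below zero when the two distributions coincide; that is expected rather than a bug.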

## 4 Approximate Bayesian Computation

Slightly different take, which resembles the indirect inference approach. See Approximate Bayesian Computation.
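The resemblance is clearest in plain rejection ABC: draw parameters from the prior, simulate, and keep the draws whose summary statistic lands within \(\epsilon\) of the observed one, where the summary plays a role similar to the auxiliary parameters above. A minimal sketch, assuming a hypothetical toy Gaussian simulator, a uniform prior, and the sample mean as summary; real uses need more care choosing summaries and \(\epsilon\).

```python
import numpy as np

def rejection_abc(x_obs, simulate, prior_sample, summary, n_proposals, epsilon, rng):
    """Return prior draws whose simulated summaries fall within epsilon of the observed one."""
    s_obs = summary(x_obs)
    theta = prior_sample(n_proposals, rng)
    s_sim = np.array([summary(simulate(t, rng)) for t in theta])
    return theta[np.abs(s_sim - s_obs) < epsilon]

rng = np.random.default_rng(0)
x_obs = rng.normal(1.0, 1.0, size=100)  # pretend observations; true theta is 1
posterior = rejection_abc(
    x_obs,
    simulate=lambda t, rng: rng.normal(t, 1.0, size=100),
    prior_sample=lambda n, rng: rng.uniform(-5, 5, size=n),
    summary=np.mean,
    n_proposals=20000,
    epsilon=0.05,
    rng=rng,
)
```

The accepted draws approximate the posterior given the summary, not given the full data; here the sample mean is sufficient, so nothing is lost, but in general the choice of summary matters a lot.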

## 5 Incoming

## 6 References

*Entropy*.

*Proceedings of the National Academy of Sciences*.

*arXiv:1702.05390 [Physics, Stat]*.

*eLife*.

*Proceedings of the National Academy of Sciences*.

*The Annals of Applied Statistics*.

*Journal of The Royal Society Interface*.

*Ecology*.

*Journal of Statistical Software*.

*Proceedings of the National Academy of Sciences*.

*arXiv:2102.07850 [Cs, Stat]*.

*Biometrika*.

*Proceedings of the National Academy of Sciences*.

*The Econometrics Journal*.

*Biometrika*.

*Computer Physics Communications*.

*arXiv:2202.04744 [Cs, Stat]*.

*Bayesian Analysis*.

*Journal of Econometrics*, The interface between econometrics and economic theory.

*arXiv:2103.02407 [Stat]*.

*Proceedings of the 37th International Conference on Machine Learning*. ICML’20.

*Statistical Science*.

*arXiv:1902.03175 [Cs, Stat]*.

*arXiv:1501.01265 [Stat]*.

*Journal of Econometrics*.

*Econometric Theory*.

*Journal of the American Statistical Association*.

*Journal of the American Statistical Association*.

*eLife*.

*Journal of Econometrics*.

*Journal of Applied Econometrics*.

*Proceedings of the 36th International Conference on Machine Learning*.

*Bayesian Analysis*.

*The Journal of Machine Learning Research*.

*Journal of The Royal Society Interface*.

*arXiv:1903.04057 [Cs, Stat]*.

*The Annals of Statistics*.

*Proceedings of the National Academy of Sciences*.

*Statistical Science*.

*Ecological Monographs*.

*Symposium on Advances in Approximate Bayesian Inference*.

*AISTATS*.

*Proceedings of the 31st International Conference on Neural Information Processing Systems*. NIPS’17.

*Proceedings of the 32nd International Conference on Neural Information Processing Systems*. NIPS’18.

*Journal of the Royal Statistical Society Series B: Statistical Methodology*.

*Mathematical Methods of Statistics 19*.

*Statistics and Computing*.

*arXiv:2104.03889 [Stat]*.

*Advances in Neural Information Processing Systems 29*.

*Journal of Machine Learning Research*.

*Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics*.

*Biometrika*.

*arXiv:2011.08644 [Stat]*.

*Handbook of Approximate Bayesian Computation*.

*Journal of Applied Econometrics*.

*The New Palgrave Dictionary of Economics*.

*arXiv:1808.00973 [Hep-Ph, Physics:physics, Stat]*.

*Journal of Statistical Planning and Inference*.

*Annual Review of Statistics and Its Application*.

*Nature*.

*Computational Statistics*.