Simulation-based inference

If I knew the right inputs to the simulator, could I get behaviour which matched my observations?

December 24, 2014 — March 3, 2024

approximation
Bayes
feature construction
likelihood free
machine learning
measure
metrics
probability
sciml
statistics
time series
Figure 1

This is chaos right now; I’m consolidating notebooks. Categories may not be well-posed.

Suppose we have access to a simulator of a system of interest, and that if we knew the “right” inputs we could get behaviour from it which matched some observations we have made of a related phenomenon in the world. Suppose further that the simulator is messy enough that we do not have access to the likelihood. Can we still do statistics, e.g. infer the parameters of the simulator which would give rise to the observations we have made?

Oh my, what a variety of ways we can try.

There are various families of methods here; some try to work purely in samples; others try to approximate the likelihood. I am not sure how all the methods relate to one another. But let us mention some.

Cranmer, Brehmer, and Louppe (2020) attempt to develop a taxonomy (Figure 2). They make likelihood-free methods sound useful for machine learning in physics.

Figure 2: Cranmer, Brehmer, and Louppe (2020)’s taxonomy of “simulation based” approaches. Interesting starting point. More stuff has happened since then.

1 Neural likelihood estimation

As summarised in Cranmer, Brehmer, and Louppe (2020).

See the Mackelab sbi page for several implementations:

Goal: Algorithmically identify mechanistic models which are consistent with data.

Each of the methods above needs three inputs: A candidate mechanistic model, prior knowledge or constraints on model parameters, and observational data (or summary statistics thereof).

The methods then proceed by

  1. sampling parameters from the prior and simulating synthetic data from those parameters;
  2. learning the (probabilistic) association between data (or data features) and the underlying parameters, i.e. learning statistical inference from simulated data. How this association is learned differs between the methods, but all use deep neural networks;
  3. applying the learned network to empirical data to derive the full space of parameters consistent with the data and the prior, i.e. the posterior distribution. High posterior probability is assigned to parameters consistent with both the data and the prior, low probability to inconsistent ones. While SNPE learns the posterior directly, SNLE and SNRE need an extra MCMC sampling step to construct it;
  4. if needed, using an initial estimate of the posterior to adaptively generate additional informative simulations.
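The loop above can be sketched in miniature without any neural-network machinery. In this hypothetical toy (pure Python, nothing from the sbi package), the simulator is a conjugate Gaussian location model so the exact posterior is known in closed form, and a plain linear regression of parameters on simulated data stands in for the conditional density estimator; the point is only the simulate → learn → condition pattern.

```python
import math
import random

random.seed(0)

def prior():
    # theta ~ N(0, 1)
    return random.gauss(0.0, 1.0)

def simulator(theta):
    # x | theta ~ N(theta, 1) -- stands in for a messy black-box simulator
    return theta + random.gauss(0.0, 1.0)

# Step 1: sample parameters from the prior, simulate synthetic data.
pairs = [(t, simulator(t)) for t in (prior() for _ in range(20000))]

# Step 2: "learn" the association data -> parameter. A linear regression
# of theta on x plays the role of the neural conditional density estimator.
n = len(pairs)
mx = sum(x for _, x in pairs) / n
mt = sum(t for t, _ in pairs) / n
cov = sum((x - mx) * (t - mt) for t, x in pairs) / n
varx = sum((x - mx) ** 2 for _, x in pairs) / n
slope = cov / varx
intercept = mt - slope * mx
resid_var = sum((t - (intercept + slope * x)) ** 2 for t, x in pairs) / n

# Step 3: condition on the observed datum to get an approximate Gaussian
# posterior. For this conjugate toy the exact posterior is N(x_obs/2, 1/2),
# so the learned answer can be checked.
x_obs = 2.0
post_mean = intercept + slope * x_obs
post_sd = math.sqrt(resid_var)
print(post_mean, post_sd)  # near 1.0 and 0.71 for this seed
```

In a real SNPE run the regression is replaced by a normalizing flow and the loop may iterate (step 4), but the data flow is the same.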

Code here: mackelab/sbi: Simulation-based inference in PyTorch

Compare to contrastive learning.

2 Indirect inference

A.k.a. the auxiliary method.

In the (older?) frequentist framing, you can get through an undergraduate program in statistics without simulation-based inference ever arising. However, I am pretty sure it is required for economists and ecologists.

Quoting Cosma:

[…] your model is too complicated for you to appeal to any of the usual estimation methods of statistics. […] there is no way to even calculate the likelihood of a given data set \(x_1, x_2, \ldots, x_t \equiv x_1^t\) under parameters \(\theta\) in closed form, which would rule out even numerical likelihood maximization, to say nothing of Bayesian methods […] Yet you can simulate; it seems like there should be some way of saying whether the simulations look like the data. This is where indirect inference comes in […] Introduce a new model, called the “auxiliary model”, which is mis-specified and typically not even generative, but is easily fit to the data, and to the data alone. (By that last I mean that you don’t have to impute values for latent variables, etc., etc., even though you might know those variables exist and are causally important.) The auxiliary model has its own parameter vector \(\beta\), with an estimator \(\hat{\beta}\). These parameters describe aspects of the distribution of observables, and the idea of indirect inference is that we can estimate the generative parameters \(\theta\) by trying to match those aspects of observations, by trying to match the auxiliary parameters.
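As a concrete miniature of this recipe (my own toy, not from the quoted source): let the generative model be an MA(1) process, which we pretend has an intractable likelihood, and let the auxiliary “model” be the lag-1 sample autocorrelation, which is trivially fit to data alone. We estimate \(\theta\) by matching the auxiliary statistic of simulations to that of the observations.

```python
import random

def simulate_ma1(theta, n, seed):
    # MA(1): x_t = eps_t + theta * eps_{t-1}, eps iid N(0, 1)
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [eps[t + 1] + theta * eps[t] for t in range(n)]

def lag1_autocorr(x):
    # Auxiliary statistic beta-hat: just the lag-1 sample autocorrelation.
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((xi - m) ** 2 for xi in x)
    return num / den

# "Observed" data generated from the true parameter theta = 0.5.
x_obs = simulate_ma1(0.5, 20000, seed=1)
beta_obs = lag1_autocorr(x_obs)

# Indirect inference by grid search: choose the theta whose simulations
# best reproduce the auxiliary statistic. Common random numbers (a fixed
# simulation seed) keep the objective smooth in theta.
grid = [i / 50 for i in range(51)]
theta_hat = min(
    grid,
    key=lambda th: abs(lag1_autocorr(simulate_ma1(th, 20000, seed=2)) - beta_obs),
)
print(theta_hat)  # close to the true value 0.5
```

The grid search works here because the auxiliary statistic \(\theta \mapsto \theta/(1+\theta^2)\) is monotone on \([0, 1]\); in higher dimensions one would minimise a weighted distance between auxiliary parameter vectors instead.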

Aaron King’s lab at UMichigan stamped its mark on a lot of this research. One wonders whether the optimal summary statistic can be learned from the data. Apparently yes.

I gather the pomp R package does some simulation-based inference, but I have not checked in for a while so there might be broader and/or fresher options.

3 Scoring rules

See scoring rules (Gneiting and Raftery 2007; Pacchiardi and Dutta 2022). NB, these are calibration scores, not Fisher scores.

3.1 Energy distances

I thought I knew what this was, but I think not. The fact that there are so many grandiose publications here (Gneiting and Raftery 2007; Székely and Rizzo 2013, 2017) leads me to suspect there is more going on than the obvious. TBC.
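For reference, the basic sample-based quantity: the energy distance between \(P\) and \(Q\) is \(2\,\mathbb{E}|X-Y| - \mathbb{E}|X-X'| - \mathbb{E}|Y-Y'|\), which is zero iff \(P = Q\). A naive 1-d plug-in estimator (quadratic in the sample sizes; a sketch, not a library implementation):

```python
import random

def energy_distance(xs, ys):
    # Plug-in estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'| for 1-d samples.
    def mean_abs_diff(a, b):
        return sum(abs(ai - bj) for ai in a for bj in b) / (len(a) * len(b))
    return 2 * mean_abs_diff(xs, ys) - mean_abs_diff(xs, xs) - mean_abs_diff(ys, ys)

rng = random.Random(3)
same = [rng.gauss(0, 1) for _ in range(400)]
also_same = [rng.gauss(0, 1) for _ in range(400)]
shifted = [rng.gauss(2, 1) for _ in range(400)]

print(energy_distance(same, also_same))  # near zero: same distribution
print(energy_distance(same, shifted))    # clearly positive: shifted mean
```

Like the MMD below, this needs only samples, which is exactly the setting of simulation-based inference.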

3.2 MMD

A particularly convenient discrepancy to use for simulation-based problems is the MMD, because it can be evaluated without reference to a density. See Maximum Mean Discrepancy.
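A minimal sketch of why that matters: the (biased, V-statistic) MMD estimate below needs only kernel evaluations between samples, never a density. Pure Python, RBF kernel, 1-d; the `bandwidth` default is an arbitrary choice for the example.

```python
import math
import random

def mmd2(xs, ys, bandwidth=1.0):
    # Biased (V-statistic) estimate of squared MMD with an RBF kernel:
    # MMD^2 = E k(X,X') + E k(Y,Y') - 2 E k(X,Y). Only samples needed.
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))
    def mean_k(a, b):
        return sum(k(ai, bj) for ai in a for bj in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

rng = random.Random(4)
xs = [rng.gauss(0, 1) for _ in range(300)]
ys = [rng.gauss(0, 1) for _ in range(300)]
zs = [rng.gauss(3, 1) for _ in range(300)]

print(mmd2(xs, ys))  # near zero: same distribution
print(mmd2(xs, zs))  # clearly positive: different distributions
```

In a simulator-based fit, `xs` would be observations and `ys` simulator output at a candidate parameter, and one would minimise the MMD over parameters.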

Figure 3: Finding the target without directly inspecting the likelihood of the current guess

4 Approximate Bayesian Computation

Slightly different take, which resembles the indirect inference approach. See Approximate Bayesian Computation.
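For orientation, the simplest version: rejection ABC draws \(\theta\) from the prior, simulates, and keeps \(\theta\) only when the simulated data lands within \(\varepsilon\) of the observation. A conjugate Gaussian toy where the exact posterior is \(N(x_{\mathrm{obs}}/2, 1/2)\), so the output can be checked:

```python
import random

random.seed(5)

def prior():
    return random.gauss(0.0, 1.0)          # theta ~ N(0, 1)

def simulator(theta):
    return theta + random.gauss(0.0, 1.0)  # x | theta ~ N(theta, 1)

x_obs = 1.5
eps = 0.1

# Rejection ABC: accept theta whenever the simulated datum is
# within eps of the observation.
accepted = []
while len(accepted) < 2000:
    theta = prior()
    if abs(simulator(theta) - x_obs) < eps:
        accepted.append(theta)

post_mean = sum(accepted) / len(accepted)
print(post_mean)  # exact posterior mean is x_obs / 2 = 0.75
```

The catch, of course, is the acceptance rate, which collapses as the data dimension grows; hence summary statistics, and hence the neural methods above.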

5 Incoming

6 References

Akbayrak, Bocharov, and de Vries. 2021. Extended Variational Message Passing for Automated Approximate Bayesian Inference.” Entropy.
Aushev, Tran, Pesonen, et al. 2023. Likelihood-Free Inference in State-Space Models with Unknown Dynamics.”
Babtie, Kirk, and Stumpf. 2014. Topological Sensitivity Analysis for Systems Biology.” Proceedings of the National Academy of Sciences.
Batz, Ruttor, and Opper. 2017. Approximate Bayes Learning of Stochastic Differential Equations.” arXiv:1702.05390 [Physics, Stat].
Boelts, Lueckmann, Gao, et al. 2022. Flexible and Efficient Simulation-Based Inference for Models of Decision-Making.” Edited by Valentin Wyart, Timothy E Behrens, Luigi Acerbi, and Jean Daunizeau. eLife.
Brehmer, Louppe, Pavez, et al. 2020. Mining Gold from Implicit Models to Improve Likelihood-Free Inference.” Proceedings of the National Academy of Sciences.
Bretó, He, Ionides, et al. 2009. Time Series Analysis via Mechanistic Models.” The Annals of Applied Statistics.
Cauchemez, and Ferguson. 2008. Likelihood-Based Estimation of Continuous-Time Epidemic Models from Time-Series Data: Application to Measles Transmission in London.” Journal of The Royal Society Interface.
Clark, and Bjørnstad. 2004. Population Time Series: Process Variability, Observation Errors, Missing Values, Lags, and Hidden States.” Ecology.
Commandeur, Koopman, and Ooms. 2011. Statistical Software for State Space Methods.” Journal of Statistical Software.
Cook, Otten, Marion, et al. 2007. Estimation of Multiple Transmission Rates for Epidemics in Heterogeneous Populations.” Proceedings of the National Academy of Sciences.
Corenflos, Thornton, Deligiannidis, et al. 2021. Differentiable Particle Filtering via Entropy-Regularized Optimal Transport.” arXiv:2102.07850 [Cs, Stat].
Cox, and Kartsonaki. 2012. The Fitting of Complex Parametric Models.” Biometrika.
Cranmer, Brehmer, and Louppe. 2020. The Frontier of Simulation-Based Inference.” Proceedings of the National Academy of Sciences.
Creel, and Kristensen. 2012. Estimation of Dynamic Latent Variable Models Using Simulated Non-Parametric Moments.” The Econometrics Journal.
———. 2013. Indirect Likelihood Inference (Revised).” UFAE and IAE Working Paper 931.13.
Czellar, and Ronchetti. 2010. Accurate and Robust Tests for Indirect Inference.” Biometrika.
Dax, Wildberger, Buchholz, et al. 2023. Flow Matching for Scalable Simulation-Based Inference.”
de Castro, and Dorigo. 2019. INFERNO: Inference-Aware Neural Optimisation.” Computer Physics Communications.
Deistler, Goncalves, and Macke. 2022. Truncated Proposals for Scalable and Hassle-Free Simulation-Based Inference.”
Delaunoy, Hermans, Rozet, et al. 2022. Towards Reliable Simulation-Based Inference with Balanced Neural Ratio Estimation.”
Dellaporta, Knoblauch, Damoulas, et al. 2022. Robust Bayesian Inference for Simulator-Based Models via the MMD Posterior Bootstrap.” arXiv:2202.04744 [Cs, Stat].
Didelot, Everitt, Johansen, et al. 2011. Likelihood-Free Estimation of Model Evidence.” Bayesian Analysis.
Dridi, Guay, and Renault. 2007. Indirect Inference and Calibration of Dynamic Stochastic General Equilibrium Models.” Journal of Econometrics, The interface between econometrics and economic theory,.
Drovandi, and Frazier. 2021. A Comparison of Likelihood-Free Methods With and Without Summary Statistics.” arXiv:2103.02407 [Stat].
Durkan, Murray, and Papamakarios. 2020. On Contrastive Learning for Likelihood-Free Inference.” In Proceedings of the 37th International Conference on Machine Learning. ICML’20.
Durkan, Papamakarios, and Murray. 2018. Sequential Neural Methods for Likelihood-Free Inference.”
Efron. 2010. The Future of Indirect Evidence.” Statistical Science.
Fong, Lyddon, and Holmes. 2019. Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap.” arXiv:1902.03175 [Cs, Stat].
Forneron, and Ng. 2015. The ABC of Simulation Estimation with Auxiliary Statistics.” arXiv:1501.01265 [Stat].
Gallant, Hsieh, and Tauchen. 1997. Estimation of Stochastic Volatility Models with Diagnostics.” Journal of Econometrics.
Gallant, and Tauchen. 1996. Which Moments to Match? Econometric Theory.
Genton, and Ronchetti. 2003. Robust Indirect Inference.” Journal of the American Statistical Association.
Glöckler, Deistler, and Macke. 2022. Variational Methods for Simulation-Based Inference.”
Gneiting, and Raftery. 2007. Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association.
Gonçalves, Lueckmann, Deistler, et al. 2020. Training Deep Neural Density Estimators to Identify Mechanistic Models of Neural Dynamics.” Edited by John R Huguenard, Timothy O’Leary, and Mark S Goldman. eLife.
Gourieroux, Christian, and Monfort. 1993. Simulation-Based Inference: A Survey with Special Reference to Panel Data Models.” Journal of Econometrics.
Gourieroux, C., Monfort, and Renault. 1993. Indirect Inference.” Journal of Applied Econometrics.
Grazian, and Fan. 2019. A Review of Approximate Bayesian Computation Methods via Density Estimation: Inference for Simulator-Models.”
Greenberg, Nonnenmacher, and Macke. 2019. Automatic Posterior Transformation for Likelihood-Free Inference.” In Proceedings of the 36th International Conference on Machine Learning.
Grelaud, Robert, Marin, et al. 2009. ABC Likelihood-Free Methods for Model Choice in Gibbs Random Fields.” Bayesian Analysis.
Gutmann, and Corander. 2016. “Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models.” The Journal of Machine Learning Research.
He, Ionides, and King. 2010. Plug-and-Play Inference for Disease Dynamics: Measles in Large and Small Populations as a Case Study.” Journal of The Royal Society Interface.
Hermans, Begy, and Louppe. 2020. Likelihood-Free MCMC with Amortized Approximate Ratio Estimators.” arXiv:1903.04057 [Cs, Stat].
Ionides, Edward L., Bhadra, Atchadé, et al. 2011. Iterated Filtering.” The Annals of Statistics.
Ionides, E. L., Bretó, and King. 2006. Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences.
Jiang, and Turnbull. 2004. The Indirect Method: Inference Based on Intermediate Statistics—A Synthesis and Examples.” Statistical Science.
Kendall, Ellner, McCauley, et al. 2005. Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses.” Ecological Monographs.
Lueckmann, Bassetto, Karaletsos, et al. 2019. Likelihood-Free Inference with Emulator Networks.” In Symposium on Advances in Approximate Bayesian Inference.
Lueckmann, Boelts, Greenberg, et al. 2021. Benchmarking Simulation-Based Inference.” In AISTATS.
Lueckmann, Gonçalves, Bassetto, et al. 2017. Flexible Statistical Inference for Mechanistic Models of Neural Dynamics.” In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17.
Lyddon, Walker, and Holmes. 2018. Nonparametric Learning from Bayesian Models with Randomized Objective Functions.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Matsubara, Knoblauch, Briol, et al. 2021. Robust Generalised Bayesian Inference for Intractable Likelihoods.” arXiv:2104.07359 [Math, Stat].
Miller, Cole, and Louppe. n.d. “Simulation-Efficient Marginal Posterior Estimation with Swyft: Stop Wasting Your Precious Time.” In.
Miller, Weniger, and Forré. 2022. Contrastive Neural Ratio Estimation.” In.
Nickl, and Pötscher. 2009. Efficient Simulation-Based Minimum Distance Estimation and Indirect Inference.” Mathematical Methods of Statistics 19.
Nott, Drovandi, and Frazier. 2023. Bayesian Inference for Misspecified Generative Models.”
Nott, Marshall, and Ngoc. 2012. The Ensemble Kalman Filter Is an ABC Algorithm.” Statistics and Computing.
Pacchiardi, and Dutta. 2022. Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators.” arXiv:2104.03889 [Stat].
Papamakarios. 2019. Neural Density Estimation and Likelihood-Free Inference.”
Papamakarios, and Murray. 2016. Fast ε-Free Inference of Simulation Models with Bayesian Conditional Density Estimation.” In Advances in Neural Information Processing Systems 29.
Papamakarios, Nalisnick, Rezende, et al. 2021. Normalizing Flows for Probabilistic Modeling and Inference.” Journal of Machine Learning Research.
Papamakarios, Sterratt, and Murray. 2019. Sequential Neural Likelihood: Fast Likelihood-Free Inference with Autoregressive Flows.” In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics.
Roberts, and Stramer. 2001. On Inference for Partially Observed Nonlinear Diffusion Models Using the Metropolis–Hastings Algorithm.” Biometrika.
Schmon, Cannon, and Knoblauch. 2021. Generalized Posteriors in Approximate Bayesian Computation.” arXiv:2011.08644 [Stat].
Shalizi. 2021. A Note on Simulation-Based Inference by Matching Random Features.”
Shi, Sun, and Zhu. 2018. A Spectral Approach to Gradient Estimation for Implicit Distributions.” In.
Sisson, Fan, and Beaumont, eds. 2019. Handbook of Approximate Bayesian Computation.
Smith, A. A. 1993. Estimating Nonlinear Time-Series Models Using Simulated Vector Autoregressions.” Journal of Applied Econometrics.
Smith, A A. 2008. Indirect Inference.” In The New Palgrave Dictionary of Economics.
Stoye, Brehmer, Louppe, et al. 2018. Likelihood-Free Inference with an Improved Cross-Entropy Estimator.” arXiv:1808.00973 [Hep-Ph, Physics:physics, Stat].
Székely, and Rizzo. 2013. Energy Statistics: A Class of Statistics Based on Distances.” Journal of Statistical Planning and Inference.
———. 2017. The Energy of Data.” Annual Review of Statistics and Its Application.
Talts, Betancourt, Simpson, et al. 2020. Validating Bayesian Inference Algorithms with Simulation-Based Calibration.”
Wood. 2010. Statistical Inference for Noisy Nonlinear Ecological Dynamic Systems.” Nature.
Zhu, and Fan. 2022. A Synthetic Likelihood Approach for Intractable Markov Random Fields.” Computational Statistics.