Learning summary statistics


A dimensionality-reduction/feature-engineering trick for likelihood-free inference methods such as indirect inference or approximate Bayesian computation: rather than hand-crafting the low-dimensional summary statistics those methods need, learn them from simulations.

TBD. See de Castro and Dorigo (2019):

Simulator-based inference is currently at the core of many scientific fields, such as population genetics, epidemiology, and experimental particle physics. In many cases the implicit generative procedure defined in the simulation is stochastic and/or lacks a tractable probability density p(x|θ), where θ ∈ Θ is the vector of model parameters. Given some experimental observations D = {x₀, ..., xₙ}, a problem of special relevance for these disciplines is statistical inference on a subset of model parameters ω ∈ Ω ⊆ Θ. This can be approached via likelihood-free inference algorithms such as Approximate Bayesian Computation (ABC) [1], simplified synthetic likelihoods [2] or density estimation-by-comparison approaches [3]. Because the relation between the parameters of the model and the data is only available via forward simulation, most likelihood-free inference algorithms tend to be computationally expensive due to the need of repeated simulations to cover the parameter space. When data are high-dimensional, likelihood-free inference can rapidly become inefficient, so low-dimensional summary statistics s(D) are used instead of the raw data for tractability. The choice of summary statistics for such cases becomes critical, given that naive choices might cause loss of relevant information and a corresponding degradation of the power of resulting statistical inference.

As a motivating example we consider data analyses at the Large Hadron Collider (LHC), such as those carried out to establish the discovery of the Higgs boson [4, 5]. In that framework, the ultimate aim is to extract information about Nature from the large amounts of high-dimensional data on the subatomic particles produced by energetic collision of protons, and acquired by highly complex detectors built around the collision point. Accurate data modelling is only available via stochastic simulation of a complicated chain of physical processes, from the underlying fundamental interaction to the subsequent particle interactions with the detector elements and their readout. As a result, the density p(x|θ) cannot be analytically computed.
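
The quoted setup leaves open how s(D) is actually obtained. Below is a minimal sketch of one common recipe, not the INFERNO procedure of de Castro and Dorigo: fit a regression from raw simulated data back to the simulator parameters and use its prediction as the learned summary statistic, then plug that summary into plain rejection ABC. The toy simulator, the sorting trick, and all the names here (`simulate`, `rejection_abc`, and so on) are my own illustrative choices.

```python
# Sketch: learn a summary statistic by regressing theta on raw simulator
# output, then use it in rejection ABC. Toy example, not INFERNO.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def simulate(theta, n_obs=50):
    """Toy stochastic simulator whose likelihood we pretend is intractable."""
    return rng.normal(loc=theta, scale=1.0 + 0.5 * np.abs(theta), size=n_obs)

# 1. Learn the summary: regress theta on (sorted) raw draws from the prior predictive.
thetas = rng.uniform(-3.0, 3.0, size=2000)
X_train = np.stack([np.sort(simulate(t)) for t in thetas])  # sorting => order-invariant input
summary_net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
summary_net.fit(X_train, thetas)

def s(x):
    """Learned summary statistic s(D): the regressor's parameter estimate."""
    return summary_net.predict(np.sort(x)[None, :])

# 2. Rejection ABC using distances between learned summaries.
def rejection_abc(x_obs, n_sims=5000, quantile=0.02):
    s_obs = s(x_obs)
    proposals = rng.uniform(-3.0, 3.0, size=n_sims)
    dists = np.array([np.abs(s(simulate(t)) - s_obs).item() for t in proposals])
    eps = np.quantile(dists, quantile)
    return proposals[dists <= eps]  # approximate posterior sample

x_obs = simulate(1.5)
posterior = rejection_abc(x_obs)
print(posterior.mean(), posterior.std())
```

Sorting the raw draws makes the regressor's input invariant to the ordering of exchangeable observations, and the regression output has the same dimension as θ, which is about as low-dimensional as a useful summary can get.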

Edwards and Storkey (2017) take a very different approach, learning summary representations of entire datasets with their "neural statistician". I suspect this is an active area.
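
To give a feel for that style of model: the dataset is summarised by a permutation-invariant encoder that embeds each observation and pools across the whole set. The sketch below shows only that encoder skeleton; the layer sizes, the mean pooling, and the class name are my own choices, and the variational machinery of Edwards and Storkey's actual model is omitted entirely.

```python
# Stripped-down sketch of a permutation-invariant dataset encoder,
# in the spirit of (but much simpler than) the neural statistician.
import torch
import torch.nn as nn

class SetSummary(nn.Module):
    def __init__(self, x_dim=1, h_dim=32, s_dim=4):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                   nn.Linear(h_dim, h_dim), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(h_dim, h_dim), nn.ReLU(),
                                  nn.Linear(h_dim, s_dim))

    def forward(self, x):            # x: (batch, n_obs, x_dim)
        h = self.embed(x)            # per-observation embeddings
        pooled = h.mean(dim=1)       # permutation-invariant pooling over the dataset
        return self.head(pooled)     # dataset-level summary s(D)

datasets = torch.randn(8, 50, 1)     # 8 datasets of 50 scalar observations each
print(SetSummary()(datasets).shape)  # torch.Size([8, 4])
```

Trained end-to-end inside a generative model over datasets (as in the paper) or against some downstream inference loss, the output of such an encoder plays the role of s(D).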

Bertl, Johanna, Gregory Ewing, Carolin Kosiol, and Andreas Futschik. 2015. “Approximate Maximum Likelihood Estimation.” July 16, 2015. http://arxiv.org/abs/1507.04553.

Castro, Pablo de, and Tommaso Dorigo. 2019. “INFERNO: Inference-Aware Neural Optimisation.” Computer Physics Communications 244 (November): 170–79. https://doi.org/10.1016/j.cpc.2019.06.007.

Drovandi, Christopher C., Anthony N. Pettitt, and Roy A. McCutchan. 2016. “Exact and Approximate Bayesian Inference for Low Integer-Valued Time Series Models with Intractable Likelihoods.” Bayesian Analysis 11 (2): 325–52. https://doi.org/10.1214/15-BA950.

Edwards, Harrison, and Amos Storkey. 2017. “Towards a Neural Statistician.” In Proceedings of ICLR. https://arxiv.org/abs/1606.02185v2.

Hahn, P. Richard, and Carlos M. Carvalho. 2015. “Decoupling Shrinkage and Selection in Bayesian Linear Models: A Posterior Summary Perspective.” Journal of the American Statistical Association 110 (509): 435–48. https://doi.org/10.1080/01621459.2014.993077.

Kulkarni, Tejas D., Will Whitney, Pushmeet Kohli, and Joshua B. Tenenbaum. 2015. “Deep Convolutional Inverse Graphics Network.” March 11, 2015. http://arxiv.org/abs/1503.03167.

Stein, Michael L., Zhiyi Chi, and Leah J. Welty. 2004. “Approximating Likelihoods for Large Spatial Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66 (2): 275–96. https://doi.org/10.1046/j.1369-7412.2003.05512.x.