Learning summary statistics

2020-04-22 — 2021-07-15

Wherein dataset-level summary statistics for likelihood-free inference are considered, and their joint tractability with simulation-to‑observation distance measures is examined, with neural‑statistician and deep‑sets constructions noted.

feature construction

functional analysis

linear algebra

machine learning

networks

neural nets

probability

sciml

sparser than thou

statistics

topology

A dimensionality reduction/feature engineering trick specific to the needs of likelihood-free inference methods such as indirect inference or approximate Bayesian computation. In these contexts, we must consider not just the summary statistic in isolation but also the distance measure used to compare the summary of the observations with the summary of the model simulations. We would like the two to be tractable in combination. A limiting case of learnable coarse graining?
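To make the pairing of summary statistic and distance concrete, here is a minimal sketch of rejection ABC on a toy problem. Everything in it is an assumption chosen for illustration rather than anything taken from the papers cited here: the Gaussian `simulate` function, the hand-picked `summary` (sample mean and standard deviation), the Euclidean `distance`, the uniform prior, and the tolerance `epsilon`.

```python
import numpy as np


def simulate(theta, n, rng):
    """Hypothetical simulator: n i.i.d. draws from Normal(theta, 1)."""
    return rng.normal(loc=theta, scale=1.0, size=n)


def summary(data):
    """Hypothetical hand-picked summary s(D): sample mean and sample std."""
    return np.array([data.mean(), data.std(ddof=1)])


def distance(s_sim, s_obs):
    """Euclidean distance between two summary vectors (an assumed choice)."""
    return float(np.linalg.norm(s_sim - s_obs))


def abc_rejection(observed, n_draws, epsilon, rng):
    """Keep prior draws whose simulated summary lands within epsilon of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # assumed prior: Uniform(-5, 5)
        s_sim = summary(simulate(theta, observed.size, rng))
        if distance(s_sim, s_obs) < epsilon:
            accepted.append(theta)
    return np.array(accepted)


rng = np.random.default_rng(0)
observed = simulate(2.0, 100, rng)  # pretend these are the real observations
posterior_draws = abc_rejection(observed, n_draws=5000, epsilon=0.2, rng=rng)
```

The accepted values in `posterior_draws` approximate the posterior only as well as `summary` and `distance` jointly allow. Swap in a less informative summary, or a distance that weights its components badly, and the approximation degrades even though the sampler is unchanged. That coupling is what learned summary statistics try to optimise.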

TBD. See de Castro and Dorigo (2019):

Simulator-based inference is currently at the core of many scientific fields, such as population genetics, epidemiology, and experimental particle physics. In many cases, the implicit generative procedure defined in the simulation is stochastic and/or lacks a tractable probability density \(p(x|\theta)\), where \(\theta \in \Theta\) is the vector of model parameters. Given some experimental observations \(D = \{x_0, \dots, x_n \},\) a problem of special relevance for these disciplines is statistical inference on a subset of model parameters \(\omega \in \Omega \subseteq \Theta.\) This can be approached via likelihood-free inference algorithms such as Approximate Bayesian Computation (ABC), simplified synthetic likelihoods, or density estimation-by-comparison approaches. Because the relation between the parameters of the model and the data is only available via forward simulation, most likelihood-free inference algorithms tend to be computationally expensive due to the need for repeated simulations to cover the parameter space. When data are high-dimensional, likelihood-free inference can rapidly become inefficient, so low-dimensional summary statistics \(s(D)\) are used instead of the raw data for tractability. The choice of summary statistics for such cases becomes critical, given that naive choices might cause loss of relevant information and a corresponding degradation of the power of resulting statistical inference. As a motivating example, we consider data analyses at the Large Hadron Collider (LHC), such as those carried out to establish the discovery of the Higgs boson. In that framework, the ultimate aim is to extract information about Nature from the large amounts of high-dimensional data on the subatomic particles produced by energetic collision of protons, and acquired by highly complex detectors built around the collision point. 
Accurate data modelling is only available via stochastic simulation of a complicated chain of physical processes, from the underlying fundamental interaction to the subsequent particle interactions with the detector elements and their readout. As a result, the density \(p(x|\theta)\) cannot be analytically computed.

There is a very different approach in Edwards and Storkey (2017).

An efficient learner is one who reuses what they already know to tackle a new problem. For a machine learner, this means understanding the similarities amongst datasets. In order to do this, one must take seriously the idea of working with datasets, rather than datapoints, as the key objects to model. Towards this goal, we demonstrate an extension of a variational autoencoder that can learn a method for computing representations, or statistics, of datasets in an unsupervised fashion. The network is trained to produce statistics that encapsulate a generative model for each dataset. Hence the network enables efficient learning from new datasets for both unsupervised and supervised tasks. We show that we are able to learn statistics that can be used for: clustering datasets, transferring generative models to new datasets, selecting representative samples of datasets and classifying previously unseen classes. We refer to our model as a neural statistician, and by this we mean a neural network that can learn to compute summary statistics of datasets without supervision.

I wonder whether this neural statistician helps with the aforementioned goal of simulation-based inference.

1 Incoming

Can we make neural statistics via deep sets? See also “Minimising the Expected Posterior Entropy Yields Optimal Summary Statistics” (Hoffmann and Onnela 2023).
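As a rough sketch of what a deep-sets-style summary statistic could look like: embed each data point with a map φ, pool the embeddings with a permutation-invariant operation, and pass the pooled vector through a second map ρ. The code below is an assumption-laden toy, not a method from any of the cited papers: the names `init_params` and `deep_set_summary` are hypothetical, each map is a single layer, and the weights are random and untrained. In practice they would be fitted against some inference-aware objective.

```python
import numpy as np


def init_params(rng, in_dim, hidden_dim, out_dim):
    """Random, untrained weights for a one-layer phi and a linear rho (illustrative only)."""
    return {
        "W_phi": rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden_dim)),
        "b_phi": np.zeros(hidden_dim),
        "W_rho": rng.normal(scale=hidden_dim ** -0.5, size=(hidden_dim, out_dim)),
        "b_rho": np.zeros(out_dim),
    }


def deep_set_summary(params, dataset):
    """Map a dataset of shape (n_points, in_dim) to a summary of shape (out_dim,)."""
    # phi: embed every data point independently with the same weights
    embedded = np.tanh(dataset @ params["W_phi"] + params["b_phi"])
    # pooling: the mean over points ignores their order and their count
    pooled = embedded.mean(axis=0)
    # rho: map the pooled embedding to the final low-dimensional summary
    return pooled @ params["W_rho"] + params["b_rho"]


rng = np.random.default_rng(0)
params = init_params(rng, in_dim=3, hidden_dim=16, out_dim=2)

dataset = rng.normal(size=(50, 3))
shuffled = dataset[rng.permutation(50)]  # same points, different order

s_original = deep_set_summary(params, dataset)
s_shuffled = deep_set_summary(params, shuffled)
```

Up to floating-point rounding, `s_original` and `s_shuffled` agree, which is the property we want from a statistic of an exchangeable dataset. Mean pooling also lets the same function accept datasets of different sizes.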

2 References

Aeschbacher, Beaumont, and Futschik. 2012. “A Novel Approach for Choosing Summary Statistics in Approximate Bayesian Computation.” Genetics.

Åkesson, Singh, Wrede, et al. 2020. “Convolutional Neural Networks as Summary Statistics for Approximate Bayesian Computation.”

Bertl, Ewing, Kosiol, et al. 2015. “Approximate Maximum Likelihood Estimation.” arXiv:1507.04553 [Stat].

Chen, Huang, Raghupathi, et al. 2022. “Automated Discovery of Fundamental Variables Hidden in Experimental Data.” Nature Computational Science.

de Castro, and Dorigo. 2019. “INFERNO: Inference-Aware Neural Optimisation.” Computer Physics Communications.

Drovandi, Pettitt, and Faddy. 2011. “Approximate Bayesian Computation Using Indirect Inference.” Journal of the Royal Statistical Society: Series C (Applied Statistics).

Drovandi, Pettitt, and McCutchan. 2016. “Exact and Approximate Bayesian Inference for Low Integer-Valued Time Series Models with Intractable Likelihoods.” Bayesian Analysis.

Edwards, and Storkey. 2017. “Towards a Neural Statistician.” In Proceedings of ICLR.

Fearnhead, and Prangle. 2012. “Constructing Summary Statistics for Approximate Bayesian Computation: Semi-Automatic Approximate Bayesian Computation.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Hahn, and Carvalho. 2015. “Decoupling Shrinkage and Selection in Bayesian Linear Models: A Posterior Summary Perspective.” Journal of the American Statistical Association.

Hoffmann, and Onnela. 2023. “Minimising the Expected Posterior Entropy Yields Optimal Summary Statistics.”

Kulkarni, Whitney, Kohli, et al. 2015. “Deep Convolutional Inverse Graphics Network.” arXiv:1503.03167 [Cs].

Nunes, and Balding. 2010. “On Optimal Selection of Summary Statistics for Approximate Bayesian Computation.” Statistical Applications in Genetics and Molecular Biology.

Pacchiardi, Künzli, Schöngens, et al. 2021. “Distance-Learning For Approximate Bayesian Computation To Model a Volcanic Eruption.” Sankhya B.

Prangle. 2015. “Summary Statistics in Approximate Bayesian Computation.”

Sisson, Fan, and Beaumont. 2018. Handbook of Approximate Bayesian Computation.

Stein, Chi, and Welty. 2004. “Approximating Likelihoods for Large Spatial Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Wong, Jiang, Wu, et al. 2018. “Learning Summary Statistic for Approximate Bayesian Computation via Deep Neural Network.” Statistica Sinica.