# Learning summary statistics

April 22, 2020 — July 15, 2021

A dimensionality reduction/feature engineering trick specific to the needs of likelihood-free inference methods such as indirect inference or approximate Bayesian computation. In these contexts, we cannot consider the summary statistic in isolation; what matters is how it interacts with the distance measure that compares the summary of the observations to the summary of the model simulations. We would like the statistic and the distance to be tractable in combination. A limiting case of learnable coarse graining?
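To make the interplay between summary statistic and distance concrete, here is a minimal sketch of rejection ABC on a toy problem. Everything in it is an assumption chosen for illustration: the Gaussian simulator, the uniform prior, the sample mean as summary, the absolute difference as distance, and the tolerance `eps`. It is not taken from any of the papers discussed here.

```python
import numpy as np

def simulate(theta, n, rng):
    """Toy simulator (assumed for illustration): n draws from N(theta, 1)."""
    return rng.normal(loc=theta, scale=1.0, size=n)

def summary(x):
    """Hand-picked summary statistic: the sample mean."""
    return np.mean(x)

def distance(s_obs, s_sim):
    """Distance between the observed and simulated summaries."""
    return abs(s_obs - s_sim)

def abc_rejection(x_obs, n_draws, eps, rng):
    """Keep prior draws whose simulated summary lands within eps of the observed one."""
    s_obs = summary(x_obs)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # assumed prior: Uniform(-5, 5)
        x_sim = simulate(theta, len(x_obs), rng)
        if distance(s_obs, summary(x_sim)) < eps:
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
x_obs = simulate(2.0, 100, rng)  # pretend these are the observations
posterior_draws = abc_rejection(x_obs, n_draws=20000, eps=0.1, rng=rng)
```

Here the sample mean happens to be sufficient for the location parameter, so little is lost by summarising. The point of learning summary statistics is to get something comparably informative when no such convenient choice is known.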

TBD. See de Castro and Dorigo (2019):

> Simulator-based inference is currently at the core of many scientific fields, such as population genetics, epidemiology, and experimental particle physics. In many cases, the implicit generative procedure defined in the simulation is stochastic and/or lacks a tractable probability density \(p(x|\theta)\), where \(\theta \in \Theta\) is the vector of model parameters. Given some experimental observations \(D = \{x_0, \dots, x_n \},\) a problem of special relevance for these disciplines is statistical inference on a subset of model parameters \(\omega \in \Omega \subseteq \Theta.\) This can be approached via likelihood-free inference algorithms such as Approximate Bayesian Computation (ABC), simplified synthetic likelihoods, or density estimation-by-comparison approaches. Because the relation between the parameters of the model and the data is only available via forward simulation, most likelihood-free inference algorithms tend to be computationally expensive due to the need for repeated simulations to cover the parameter space. When data are high-dimensional, likelihood-free inference can rapidly become inefficient, so low-dimensional summary statistics \(s(D)\) are used instead of the raw data for tractability. The choice of summary statistics for such cases becomes critical, given that naive choices might cause loss of relevant information and a corresponding degradation of the power of resulting statistical inference. As a motivating example, we consider data analyses at the Large Hadron Collider (LHC), such as those carried out to establish the discovery of the Higgs boson. In that framework, the ultimate aim is to extract information about Nature from the large amounts of high-dimensional data on the subatomic particles produced by energetic collision of protons, and acquired by highly complex detectors built around the collision point.
> Accurate data modelling is only available via stochastic simulation of a complicated chain of physical processes, from the underlying fundamental interaction to the subsequent particle interactions with the detector elements and their readout. As a result, the density \(p(x|\theta)\) cannot be analytically computed.

There is a very different approach in Edwards and Storkey (2017).

> An efficient learner is one who reuses what they already know to tackle a new problem. For a machine learner, this means understanding the similarities amongst datasets. In order to do this, one must take seriously the idea of working with datasets, rather than datapoints, as the key objects to model. Towards this goal, we demonstrate an extension of a variational autoencoder that can learn a method for computing representations, or statistics, of datasets in an unsupervised fashion. The network is trained to produce statistics that encapsulate a generative model for each dataset. Hence the network enables efficient learning from new datasets for both unsupervised and supervised tasks. We show that we are able to learn statistics that can be used for: clustering datasets, transferring generative models to new datasets, selecting representative samples of datasets and classifying previously unseen classes. We refer to our model as a neural statistician, and by this we mean a neural network that can learn to compute summary statistics of datasets without supervision.

I wonder whether this neural statistician solves any part of the aforementioned simulation-based inference problem.

## 1 Incoming

Can we make neural statistics via deep sets? Minimising the Expected Posterior Entropy Yields Optimal Summary Statistics (Hoffmann and Onnela 2023).
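As a rough sketch of what a deep-sets-style summary statistic might look like: embed each datapoint with a shared map, pool with a permutation-invariant operation, then transform the pooled vector. The names `init_params` and `deep_set_summary`, the layer sizes, and the random untrained weights are all assumptions for illustration. Training the weights, for example against an expected-posterior-entropy objective, is not shown.

```python
import numpy as np

def init_params(d_in, d_hidden, d_out, rng):
    """Random, untrained weights for the two maps (illustrative only)."""
    return {
        "W_phi": rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_hidden)),
        "b_phi": np.zeros(d_hidden),
        "W_rho": rng.normal(scale=1.0 / np.sqrt(d_hidden), size=(d_hidden, d_out)),
        "b_rho": np.zeros(d_out),
    }

def deep_set_summary(x, params):
    """Map a dataset x of shape (n, d_in) to a summary vector of shape (d_out,)."""
    # phi: the same embedding applied to every datapoint
    h = np.tanh(x @ params["W_phi"] + params["b_phi"])
    # Mean pooling discards the ordering of the datapoints
    pooled = h.mean(axis=0)
    # rho: transform the pooled embedding into the final summary
    return np.tanh(pooled @ params["W_rho"] + params["b_rho"])

rng = np.random.default_rng(0)
params = init_params(d_in=3, d_hidden=16, d_out=2, rng=rng)
dataset = rng.normal(size=(50, 3))

s = deep_set_summary(dataset, params)
s_perm = deep_set_summary(dataset[rng.permutation(50)], params)
```

Up to floating-point error, `s` and `s_perm` agree: the summary depends on the dataset as a set, not on the order of its rows, which is the property we want from a statistic of exchangeable observations.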

## 2 References

*Genetics*.

*arXiv:1507.04553 [Stat]*.

*Nature Computational Science*.

de Castro, Pablo, and Tommaso Dorigo. 2019. “INFERNO: Inference-Aware Neural Optimisation.” *Computer Physics Communications*.

*Journal of the Royal Statistical Society: Series C (Applied Statistics)*.

*Bayesian Analysis*.

Edwards, Harrison, and Amos Storkey. 2017. “Towards a Neural Statistician.” In *Proceedings of ICLR*.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*.

*Journal of the American Statistical Association*.

*arXiv:1503.03167 [Cs]*.

*Statistical Applications in Genetics and Molecular Biology*.

*Sankhya B*.

*Handbook of Approximate Bayesian Computation*.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*.

*Statistica Sinica*.