Summary statistics that don't require us to keep all the data, but which nonetheless allow us to do inference nearly as well. E.g., sufficient statistics in exponential families allow certain kinds of inference to be done perfectly from summaries alone, without retaining anything else. Methods such as variational Bayes summarize the whole data likelihood by maintaining a posterior density, at some cost in accuracy. I think of these as nearly sufficient statistics, but we could think of them more broadly as *data summarization*, which I note here for later reference.
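
For the exponential-family case this is concrete. A minimal pure-Python sketch (Gaussian with unknown mean and variance; all names here are illustrative): the running triple (n, Σx, Σx²) is sufficient, so we can discard the raw data and still recover the same maximum-likelihood estimates.

```python
# Sketch: for a Gaussian, (n, sum x, sum x^2) is a sufficient statistic --
# a single streaming pass over the data keeps only these three numbers,
# yet recovers the MLE for mean and variance exactly.
import math
import random

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(10_000)]

# Streaming pass: keep only the summary statistics.
n, s, ss = 0, 0.0, 0.0
for x in data:
    n += 1
    s += x
    ss += x * x

mean_from_summary = s / n
var_from_summary = ss / n - mean_from_summary ** 2

# Same quantities computed from the full, retained data set.
mean_full = sum(data) / len(data)
var_full = sum((x - mean_full) ** 2 for x in data) / len(data)

assert math.isclose(mean_from_summary, mean_full, rel_tol=1e-9)
assert math.isclose(var_from_summary, var_full, rel_tol=1e-6)
```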

- Approximate Bayesian Computation
- *inducing sets*, as seen in Gaussian processes
- *coresets*, as seen in Bayesian models
- probabilistic deep learning possibly does something like this
- Bounded Memory Learning considers this from a computational-complexity standpoint; is that related?

TBC.

## Coresets

The Bayesian flavour: solve an optimisation problem to minimise the distance between the posterior under the full data set and the posterior under a small weighted subset of it.
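
A toy sketch of that objective, in the spirit of Campbell and Broderick's Hilbert-coreset construction (the greedy solver and every name here are illustrative, not any library's API): choose sparse non-negative weights so that the weighted log-likelihood tracks the full-data log-likelihood over a grid of parameter values.

```python
# Toy Bayesian-coreset sketch: pick K weighted points whose combined
# log-likelihood approximates the full-data log-likelihood on a grid.
# Greedy matching pursuit with a single least-squares rescaling per step.
import random

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
thetas = [i / 10 - 2.0 for i in range(41)]      # parameter grid on [-2, 2]

def ll_vec(x):
    # per-point Gaussian log-likelihood (unit variance) over the grid
    return [-0.5 * (x - t) ** 2 for t in thetas]

vecs = [ll_vec(x) for x in data]
full = [sum(col) for col in zip(*vecs)]         # full-data log-likelihood

K = 20                                          # coreset size budget
weights = {}                                    # data index -> weight
approx = [0.0] * len(thetas)
for _ in range(K):
    resid = [f - a for f, a in zip(full, approx)]
    # add the point whose log-likelihood vector best aligns with the residual
    best = max(range(len(data)),
               key=lambda i: sum(r * v for r, v in zip(resid, vecs[i])))
    weights[best] = weights.get(best, 0.0) + 1.0
    approx = [sum(w * vecs[i][j] for i, w in weights.items())
              for j in range(len(thetas))]
    # one least-squares rescaling of all weights along the current approximation
    num = sum(f * a for f, a in zip(full, approx))
    den = sum(a * a for a in approx) or 1.0
    scale = num / den
    weights = {i: w * scale for i, w in weights.items()}
    approx = [a * scale for a in approx]
```

The real constructions use cleverer geometry (Frank-Wolfe or Hilbert-space norms); this just makes the "match the full posterior with a weighted subset" objective tangible.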

## Representative subsets

I think this is intended to be generic? See apricot:

> apricot implements submodular optimization for the purpose of summarizing massive data sets into minimally redundant subsets that are still representative of the original data. These subsets are useful for both visualizing the modalities in the data (such as in the two data sets below) and for training accurate machine learning models with just a fraction of the examples and compute.
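
The idea can be sketched in a few lines of pure Python: greedily maximize a facility-location objective, which is submodular, so the greedy subset carries the usual (1 − 1/e) approximation guarantee. This toy (with an illustrative similarity kernel) is for intuition only; apricot itself is far more efficient.

```python
# Greedy facility-location selection: pick representatives maximizing
# sum_j max_{i in S} sim(i, j). With two well-separated clusters, the
# greedy subset should contain points from both.
import random

random.seed(2)
points = ([random.gauss(-3.0, 0.5) for _ in range(50)]
          + [random.gauss(3.0, 0.5) for _ in range(50)])

def sim(a, b):
    return 1.0 / (1.0 + abs(a - b))             # toy similarity kernel

def coverage(subset):
    # facility-location objective: how well `subset` covers every point
    return sum(max(sim(points[i], p) for i in subset) for p in points)

selected = []
for _ in range(4):                              # pick 4 representatives
    best = max((i for i in range(len(points)) if i not in selected),
               key=lambda i: coverage(selected + [i]))
    selected.append(best)

reps = [points[i] for i in selected]
assert any(p < 0 for p in reps) and any(p > 0 for p in reps)
```

With the library itself the equivalent is, if I recall the API correctly, something like `FacilityLocationSelection(n, metric='euclidean').fit_transform(X)`.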

