# Inference without KL divergence

October 3, 2019 — April 28, 2022

Placeholder. Various links on inference by minimising some divergence other than the Kullback-Leibler divergence.

As mentioned in likelihood-free inference, this is especially interesting in the case of Bayesian inference or, more generally, distributional inference, where complications ensue.
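
To make the basic idea concrete for a plain point estimate (not yet the distributional case), here is a minimal sketch that fits the location of a unit-variance Gaussian in two ways: by maximum likelihood, which minimises the KL divergence from the empirical distribution to the model, and by minimising a squared maximum mean discrepancy (MMD) instead. The choice of MMD, the Gaussian kernel, the length scale, the contaminated toy data and the helper names `mmd_objective` and `argmin_on_grid` are all assumptions made for illustration; none of them come from the sources linked here.

```python
import math
import random


def mmd_objective(theta, data, length_scale=1.0):
    """Theta-dependent part of the squared MMD between N(theta, 1) and the data.

    With the Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 l^2)), the
    model-model and data-data terms do not depend on theta, so only the
    cross term -2 * mean_i E_{X ~ N(theta, 1)} k(X, y_i) is kept.
    """
    s2 = length_scale ** 2 + 1.0  # kernel variance plus model variance
    scale = math.sqrt(length_scale ** 2 / s2)
    cross = sum(math.exp(-((y - theta) ** 2) / (2.0 * s2)) for y in data)
    return -2.0 * scale * cross / len(data)


def argmin_on_grid(objective, lower, upper, steps=2001):
    """Crude one-dimensional minimiser: evaluate on a grid, keep the best."""
    grid = [lower + (upper - lower) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=objective)


# Hypothetical toy data: 100 draws from N(0, 1) plus 10 gross outliers at 20.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [20.0] * 10

# Maximum likelihood for the location of N(theta, 1) is the sample mean.
mle = sum(data) / len(data)

# Minimum-MMD estimate of the same location parameter.
mmd = argmin_on_grid(lambda t: mmd_objective(t, data), -5.0, 25.0)

print(f"KL / maximum likelihood: {mle:.2f}")
print(f"minimum MMD:             {mmd:.2f}")
```

On this toy data the sample mean is dragged towards the outliers, while the minimum-MMD estimate stays near the bulk of the sample. Robustness of this kind is one common motivation for swapping out the KL divergence, although which divergence is appropriate depends on the problem.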

(Chu, Blanchet, and Glynn 2019):

> in many fields, the object of interest is a probability distribution; moreover, the learning process is guided by a probability functional to be minimized, a loss function that conceptually maps a probability distribution to a real number […] Because the optimization now takes place in the infinite-dimensional space of probability measures, standard finite-dimensional algorithms like gradient descent are initially unavailable; even the proper notion for the derivative of these functionals is unclear. We call upon a body of literature known as von Mises calculus, originally developed in the field of asymptotic statistics, to make these functional derivatives precise. Remarkably, we find that once the connection is made, the resulting generalized descent algorithm, which we call probability functional descent, is intimately compatible with standard deep learning techniques such as stochastic gradient descent, the reparameterization trick, and adversarial training.
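
As a rough sketch of what "derivative" means here, in notation that is assumed for illustration and may differ from the paper's: for a functional $J$ on probability measures, an influence function $\Psi_\mu$ of $J$ at $\mu$ is a function satisfying, for every direction $\chi = \nu - \mu$ with $\nu$ another probability measure,

$$
\left.\frac{\mathrm{d}}{\mathrm{d}\varepsilon} J(\mu + \varepsilon \chi)\right|_{\varepsilon = 0}
= \int \Psi_\mu(x)\, \chi(\mathrm{d}x).
$$

Since $\int \chi(\mathrm{d}x) = 0$, $\Psi_\mu$ is only determined up to an additive constant. The corresponding first-order expansion is

$$
J(\nu) \approx J(\mu) + \mathbb{E}_{x \sim \nu}[\Psi_\mu(x)] - \mathbb{E}_{x \sim \mu}[\Psi_\mu(x)],
$$

which suggests a descent step: hold $\Psi_\mu$ fixed and move to a nearby $\nu$ that lowers $\mathbb{E}_{x \sim \nu}[\Psi_\mu(x)]$. As a worked example with a hypothetical test function $f$ and target $c$, take $J(\mu) = \tfrac{1}{2}\left(\mathbb{E}_\mu[f] - c\right)^2$. Then

$$
\left.\frac{\mathrm{d}}{\mathrm{d}\varepsilon} J(\mu + \varepsilon \chi)\right|_{\varepsilon = 0}
= \left(\mathbb{E}_\mu[f] - c\right) \int f(x)\, \chi(\mathrm{d}x),
\qquad\text{so}\qquad
\Psi_\mu(x) = \left(\mathbb{E}_\mu[f] - c\right) f(x).
$$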

## 1 Generalized Bayesian computation

## 2 References

*Biometrika*.

*Proceedings of the 32nd International Conference on Neural Information Processing Systems*. NIPS’18.

*International Conference on Machine Learning*.

*IEEE Transactions on Information Theory*.

*The Annals of Statistics*.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*.

*arXiv:1610.05627 [Math, Stat]*.

*arXiv:1705.07152 [Stat]*.

*arXiv:1810.02403 [Math]*.

*arXiv:2004.07052 [Physics, q-Bio, Stat]*.

*arXiv:1710.05053 [Cs, Stat]*.

*arXiv:1902.00640 [Cs, Stat]*.

*Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference*.

*ICML*.

*arXiv:2202.04744 [Cs, Stat]*.

*von Mises calculus for statistical functionals*. Lecture Notes in Statistics 19.

*Wiley StatsRef: Statistics Reference Online*.

*arXiv:1902.03175 [Cs, Stat]*.

*Advances in Neural Information Processing Systems 28*.

*International Statistical Review*.

*arXiv:1704.00028 [Cs, Stat]*.

*arXiv:1705.07164 [Cs, Stat]*.

*Journal of Machine Learning Research*.

*International Conference on Machine Learning*.

*Proceedings of The 33rd International Conference on Machine Learning*.

*Proceedings of the 32nd International Conference on Neural Information Processing Systems*. NIPS’18.

*arXiv:1906.03317 [Cs, Math, Stat]*.

*Journal of the Royal Statistical Society Series B: Statistical Methodology*.

*arXiv:2008.09165 [Cs, Math, Stat]*.

*arXiv:2104.03889 [Stat]*.

*Annual Review of Statistics and Its Application*.

*Advances in Neural Information Processing Systems 29*.

*Stat*.

*Optimal Transport for Applied Mathematicians*. Edited by Filippo Santambrogio. Progress in Nonlinear Differential Equations and Their Applications.

*arXiv:2011.08644 [Stat]*.

*ACM Transactions on Graphics*.

*Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)*.

*Proceedings of NeurIPS 2020*.