Inference where we approximate the density of the posterior variationally. That is, we use cunning tricks to turn an inference problem into an optimisation problem over some parameter set, usually one that allows us to trade off difficulty for fidelity in some useful way.

This idea is not intrinsically Bayesian (i.e. the density we are approximating need not be a posterior density or the marginal likelihood of the evidence), but much of the hot literature on it is from Bayesians doing something fashionable in probabilistic deep learning, so for concreteness I will assume Bayesian uses here.

This is usually mentioned in contrast to the other main method of approximating such densities: sampling from them, usually using Markov Chain Monte Carlo. In practice the two are related (Salimans, Kingma, and Welling 2015) and nowadays even used together (Rezende and Mohamed 2015; Caterini, Doucet, and Sejdinovic 2018).

Once we have decided we are happy to use variational approximations, we are left with the question of … how? There are, AFAICT, two main schools of thought here. One comprises methods which leverage the graphical structure of the problem and maintain structural hygiene, using variational message passing.

## Introduction

The classic intro seems to be (Jordan et al. 1999), which considers diverse types of variational calculus applications and inference. Typical ML uses these days are more specific; an archetypal example would be the variational auto-encoder (Diederik P. Kingma and Welling 2014).

## Inference via KL divergence

The most common version uses KL loss to construct the famous Evidence Lower Bound Objective. This is mathematically convenient and highly recommended if you can get away with it.
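
To make the bound concrete, here is a minimal sketch on a toy discrete model, checking numerically that the log evidence splits into the ELBO plus the KL divergence from the approximation to the exact posterior. The model (a three-state latent variable with made-up prior and likelihood values) and the names `elbo` and `kl` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical toy model: latent z in {0, 1, 2}, a single fixed observation x.
prior = np.array([0.5, 0.3, 0.2])           # p(z)
likelihood = np.array([0.10, 0.40, 0.70])   # p(x | z) evaluated at the observed x

joint = prior * likelihood                  # p(x, z)
evidence = joint.sum()                      # p(x)
posterior = joint / evidence                # p(z | x)
log_evidence = float(np.log(evidence))


def elbo(q):
    """E_q[log p(x, z) - log q(z)] for a strictly positive q."""
    return float(np.sum(q * (np.log(joint) - np.log(q))))


def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    return float(np.sum(q * (np.log(q) - np.log(p))))


q = np.array([0.2, 0.3, 0.5])               # an arbitrary approximation

# log p(x) = ELBO(q) + KL(q || p(z | x)), so the ELBO never exceeds log p(x).
print(log_evidence, elbo(q), kl(q, posterior))
```

Since the KL term is non-negative, maximising the ELBO over `q` is the same as minimising that KL term, and the bound is tight when `q` equals the posterior.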

## Other loss functions

In which probability metric should one approximate the target density? For tradition and convenience, we usually use KL-loss, but this is not ideal, and alternatives are hot topics. There are simple ones, such as “reverse KL”, which is sometimes how we justify expectation propagation, and the modest generalisation to Rényi-divergence inference (Li and Turner 2016).
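
For orientation, the Rényi generalisation can be written down as follows. This is a sketch in my own notation (data \(x\), latent \(z\), approximation \(q\), order \(\alpha \neq 1\)), following my reading of Li and Turner (2016) rather than quoting them.

\[
D_\alpha(q \,\|\, p) = \frac{1}{\alpha - 1} \log \int q(z)^{\alpha}\, p(z)^{1-\alpha}\, \mathrm{d}z,
\qquad
\mathcal{L}_\alpha(q; x) = \frac{1}{1-\alpha} \log \mathbb{E}_{q}\left[\left(\frac{p(x, z)}{q(z)}\right)^{1-\alpha}\right]
= \log p(x) - D_\alpha\bigl(q \,\|\, p(\cdot \mid x)\bigr).
\]

Taking \(\alpha \to 1\) recovers the KL divergence and hence the usual evidence lower bound.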

Ingmar Schuster’s critique of black box loss (Ranganath et al. 2016) raises some issues:

> It’s called Operator VI as a fancy way to say that one is flexible in constructing how exactly the objective function uses \(\pi, q\) and test functions from some family \(\mathcal{F}\). I completely agree with the motivation: KL-Divergence in the form \(\int q(x) \log \frac{q(x)}{\pi(x)} \mathrm{d}x\) indeed underestimates the variance of \(\pi\) and approximates only one mode. Using KL the other way around, \(\int \pi(x) \log \frac{\pi(x)}{q(x)} \mathrm{d}x\) takes all modes into account, but still tends to underestimate variance.
>
> […] the authors suggest an objective using what they call the Langevin-Stein Operator which does not make use of the proposal density \(q\) at all but uses test functions exclusively.
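
The one-mode versus all-modes behaviour of the two KL directions in the quote can be seen in a small numerical experiment. The following is a minimal sketch, assuming a made-up bimodal target (an equal mixture of \(N(-3, 1)\) and \(N(3, 1)\)), a single Gaussian approximation, and brute-force grid search in place of a real optimiser; the helper names are hypothetical.

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 4001)   # integration grid
dx = x[1] - x[0]


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


# Assumed target pi: equal mixture of N(-3, 1) and N(3, 1).
log_pi = np.logaddexp(log_normal(x, -3.0, 1.0), log_normal(x, 3.0, 1.0)) + np.log(0.5)
pi = np.exp(log_pi)


def kl_q_pi(mu, sigma):
    """int q log(q / pi) dx, the form used in standard variational inference."""
    log_q = log_normal(x, mu, sigma)
    return np.sum(np.exp(log_q) * (log_q - log_pi)) * dx


def kl_pi_q(mu, sigma):
    """int pi log(pi / q) dx, KL 'the other way around'."""
    log_q = log_normal(x, mu, sigma)
    return np.sum(pi * (log_pi - log_q)) * dx


def grid_argmin(loss):
    """Brute-force minimiser over a coarse (mu, sigma) grid."""
    mus = np.linspace(-5.0, 5.0, 41)
    sigmas = np.linspace(0.5, 4.0, 36)
    values = np.array([[loss(m, s) for s in sigmas] for m in mus])
    i, j = np.unravel_index(np.argmin(values), values.shape)
    return mus[i], sigmas[j]


mu_a, sigma_a = grid_argmin(kl_q_pi)   # locks on to one mode
mu_b, sigma_b = grid_argmin(kl_pi_q)   # spreads over both modes
print((mu_a, sigma_a), (mu_b, sigma_b))
```

In this toy setting the first objective selects a narrow Gaussian sitting on one of the two modes, while the second selects a wide Gaussian centred between them.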

## Philosophical interpretations

John Schulman’s Sending Samples Without Bits-Back is a nifty interpretation of KL variational bounds in terms of coding theory/message sending.

Not grandiose enough? See Karl Friston’s interpretation of variational inference as a principle of cognition.

## In graphical models

## Mean-field assumption

TODO: mention the importance of this for classic-flavoured variational inference (Mean Field Variational Bayes). This confused me for aaaaages. AFAICT this is a problem of history. Not all variational inference makes the confusingly-named “mean-field” assumption, but for a long while that was the only game in town, so tutorials of a certain vintage take mean-field variational inference as a synonym for variational inference. If I have just learnt some non-mean-field SVI methods from a recent NeurIPS paper and then I run into this, I might well be confused.
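
For reference, the assumption itself is short to state. In notation assumed here (data \(x\), latent variables \(z = (z_1, \dots, z_m)\)), the mean-field family is the set of approximations that factorise across the latent variables, and the textbook coordinate-wise optimum for each factor under the KL objective is an expectation of the log joint over all the other factors:

\[
q(z) = \prod_{j=1}^{m} q_j(z_j),
\qquad
\log q_j^{*}(z_j) = \mathbb{E}_{q_{-j}}\left[\log p(x, z)\right] + \text{const}.
\]

Variational inference in general only requires some tractable family for \(q\); the factorisation is one particular choice of family.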

## Mixture models

Mixture models are classic and, for ages, seemed to be the default choice for variational approximation. They are an interesting trick to make a graphical model conditionally conjugate by use of auxiliary variables.

## Reparameterization trick

See reparameterisation.
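
As a quick reminder of what the trick does, here is a minimal sketch, assuming a Gaussian \(q = N(\mu, \sigma^2)\) and a made-up test function \(f(z) = z^2\) whose expectation has a known gradient. Writing \(z = \mu + \sigma \varepsilon\) with \(\varepsilon \sim N(0, 1)\) moves the parameters inside the expectation, so the gradient can be estimated by differentiating through the samples. All names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7            # hypothetical variational parameters


def df_dz(z):
    """Derivative of the assumed test function f(z) = z**2."""
    return 2.0 * z


# Reparameterise: z = mu + sigma * eps with eps ~ N(0, 1).
eps = rng.standard_normal(200_000)
z = mu + sigma * eps

# Chain rule through the sample path: dz/dmu = 1 and dz/dsigma = eps.
grad_mu = float(np.mean(df_dz(z)))
grad_sigma = float(np.mean(df_dz(z) * eps))

# Exact values for comparison, since E[z**2] = mu**2 + sigma**2.
print(grad_mu, 2 * mu)
print(grad_sigma, 2 * sigma)
```

The Monte Carlo estimates should land close to the exact gradients \(2\mu\) and \(2\sigma\) for this choice of \(f\).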

## Autoencoders

## Stochastic

## References

*Advances in Neural Information Processing Systems 29*.

*arXiv:1511.07367 [Stat]*, November.

*Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence*, 21–30. UAI’99. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

*arXiv:1707.01069 [Cs, Stat]*, July.

*UAI18*.

*Microsoft Research*, January.

*Journal of the American Statistical Association* 112 (518): 859–77.

*Journal of Machine Learning Research* 21 (131): 1–63.

*Advances in Neural Information Processing Systems*.

*Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc.

*Advances in Neural Information Processing Systems 28*, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2980–88. Curran Associates, Inc.

*PMLR*.

*arXiv:2103.01085 [Cs, Stat]*, March.

*arXiv:1801.10395 [Stat]*, January.

*Proceedings of ICLR*.

*arXiv:1704.04110 [Cs, Stat]*, April.

*arXiv:1704.02798 [Cs, Stat]*, April.

*Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2199–2207. Curran Associates, Inc.

*IEEE Transactions on Pattern Analysis and Machine Intelligence* 27 (9): 1392–1416.

*arXiv:1710.06595 [Stat]*, October.

*arXiv:1402.1412 [Stat]*, February.

*Entropy* 23 (8): 990.

*arXiv:1709.02536 [Stat]*, September.

*arXiv:1810.01367 [Cs, Stat]*, October.

*Proceedings of the 24th International Conference on Neural Information Processing Systems*, 2348–56. NIPS’11. USA: Curran Associates Inc.

*Advances in Neural Information Processing Systems 28*, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2629–37. Curran Associates, Inc.

*arXiv:1704.00028 [Cs, Stat]*, March.

*Proceedings of ICLR*.

*Science* 268 (5214): 1158–61.

*arXiv:1206.7051 [Cs, Stat]* 14 (1).

*PMLR*, 361–69.

*arXiv:1804.00779 [Cs, Stat]*, April.

*arXiv:1910.04102 [Cs, Math, Stat]*, October.

*Learning in Graphical Models*, 163–73. NATO ASI Series. Springer, Dordrecht.

*Machine Learning* 37 (2): 183–233.

*Proceedings of ICLR*.

*Artificial Intelligence and Statistics*, 878–87. PMLR.

*Advances in Neural Information Processing Systems 29*. Curran Associates, Inc.

*Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2*, 2575–83. NIPS’15. Cambridge, MA, USA: MIT Press.

*ICLR 2014 Conference*.

*Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 10236–45. Curran Associates, Inc.

*arXiv:1512.09300 [Cs, Stat]*, December.

*arXiv:2012.13962 [Cs, Stat]*, June.

*Advances in Neural Information Processing Systems*, 29:1081–89. Red Hook, NY, USA: Curran Associates, Inc.

*International Conference on Machine Learning*, 3159–68.

*Advances in Neural Information Processing Systems*.

*Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 6446–56. Curran Associates, Inc.

*arXiv Preprint arXiv:1603.04733*, 1708–16.

*PMLR*, 2218–27.

*IEEE Transactions on Knowledge and Data Engineering* 27 (2): 545–57.

*Journal of Computational and Graphical Statistics* 23 (3): 589–615.

*Information Theory, Inference & Learning Algorithms*, Chapter 45. Cambridge University Press.

*Information Theory, Inference & Learning Algorithms*. Cambridge University Press.

*arXiv Preprint arXiv:1705.09279*.

*arXiv:1906.03317 [Cs, Math, Stat]*, June.

*Handbook of Uncertainty Quantification*, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1–41. Cham: Springer International Publishing.

*arXiv:1809.10756 [Cs, Stat]*, October.

*Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence*, 362–69. UAI’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

*Proceedings of ICML*.

*Advances in Neural Information Processing Systems*.

*Journal of Machine Learning Research* 21 (157): 1–62.

*The American Statistician* 64 (2): 140–53.

*Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 2338–47. Curran Associates, Inc.

*IEEE Journal of Selected Topics in Signal Processing* 10 (2): 224–41.

*CVPR*.

*Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 496–504. Curran Associates, Inc.

*PMLR*, 324–33.

*International Conference on Machine Learning*, 1530–38. ICML’15. Lille, France: JMLR.org.

*Proceedings of ICML*.

*Artificial Intelligence and Statistics*, 800–808. PMLR.

*Advances in Neural Information Processing Systems*.

*arXiv:1802.03335 [Stat]*, February.

*Proceedings of the 32nd International Conference on Machine Learning (ICML-15)*, 1218–26. ICML’15. Lille, France: JMLR.org.

*Theory of Statistics*. Springer Series in Statistics. New York, NY: Springer Science & Business Media.

*Journal of Machine Learning Research* 19 (66): 2639–709.

*arXiv:1212.4507 [Cs, Stat]*, December.

*Proceedings of the 31st International Conference on Machine Learning - Volume 32*, II-1971–II-1980. ICML’14. Beijing, China: JMLR.org.

*Graphical Models, Exponential Families, and Variational Inference*. Vol. 1. Foundations and Trends® in Machine Learning. Now Publishers.

*New Directions in Statistical Signal Processing*. Vol. 155. MIT Press.

*Journal of the American Statistical Association* 112 (517): 137–68.

*arXiv:1705.03439 [Cs, Math, Stat]*, May.

*Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence*, 626–33. UAI ’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

*arXiv:1301.1299 [Cs, Stat]*, January.

*Journal of Machine Learning Research*, 661–94.

*Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence*, 583–91. UAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

*Journal of Machine Learning Research* 11 (May): 1771–98.

*arXiv:1801.07922 [Math]*, January.
