Variational inference

On fitting the best model one can be bothered to

Approximating the intractable measure (right) with a transformation of a tractable one (left)

Inference where we approximate the density of the posterior variationally. That is, we use cunning tricks to turn an inference problem into an optimisation problem over some parameter set, usually one that allows us to trade off difficulty for fidelity in some useful way.

This idea is not intrinsically Bayesian (i.e. the density we are approximating need not be a posterior density or the marginal likelihood of the evidence), but much of the hot literature on it is from Bayesians doing something fashionable in probabilistic deep learning, so for concreteness I will assume Bayesian uses here.

This is usually mentioned in contrast to the other main method of approximating such densities: sampling from them, usually using Markov Chain Monte Carlo. In practice the two are related (Salimans, Kingma, and Welling 2015) and nowadays even used together (Rezende and Mohamed 2015; Caterini, Doucet, and Sejdinovic 2018).

Once we have decided we are happy to use variational approximations, we are left with the question of … how? There are, AFAICT, two main schools of thought here. One comprises methods which leverage the graphical structure of the problem and maintain structural hygiene; these use variational message passing.


The classic intro seems to be (Jordan et al. 1999), which considers diverse types of variational calculus applications and inference. Typical ML uses these days are more specific; an archetypal example would be the variational auto-encoder (Diederik P. Kingma and Welling 2014).

Inference via KL divergence

The most common version uses KL loss to construct the famous Evidence Lower Bound Objective. This is mathematically convenient and highly recommended if you can get away with it.
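To make the bound concrete, here is a minimal sketch on a toy conjugate model that I am assuming purely for illustration: \(z \sim \mathcal{N}(0, 1)\), \(x \mid z \sim \mathcal{N}(z, 1)\), with a Gaussian approximation \(q(z) = \mathcal{N}(m, s^2)\). The function names and the observed value are hypothetical. In this model the exact posterior is \(\mathcal{N}(x/2, 1/2)\) and the evidence is \(\mathcal{N}(x; 0, 2)\), so we can check that the Monte Carlo ELBO \(\mathbb{E}_q[\log p(x, z) - \log q(z)]\) sits below the log evidence and touches it when \(q\) is the true posterior.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_joint(x, z):
    """Toy model (an assumption for illustration): z ~ N(0, 1), x | z ~ N(z, 1)."""
    return log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)


def elbo_mc(x, m, s, n_samples=20000, seed=0):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z)] for q = N(m, s^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(m, s)
        total += log_joint(x, z) - log_normal_pdf(z, m, s * s)
    return total / n_samples


x_obs = 1.3  # hypothetical observation
log_evidence = log_normal_pdf(x_obs, 0.0, 2.0)  # marginally x ~ N(0, 2)

# q equal to the exact posterior N(x/2, 1/2): the bound is tight.
elbo_exact_q = elbo_mc(x_obs, x_obs / 2, math.sqrt(0.5))
# A deliberately poor q: the bound is loose by KL(q || posterior).
elbo_poor_q = elbo_mc(x_obs, -1.0, 1.5)

print(log_evidence, elbo_exact_q, elbo_poor_q)
```

The gap between the log evidence and the ELBO is exactly \(\operatorname{KL}(q \,\|\, p(z \mid x))\), which is why maximising the ELBO over \(m, s\) is the same as minimising that KL divergence.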

Other loss functions

In which probability metric should one approximate the target density? For tradition and convenience, we usually use KL-loss, but this is not ideal, and alternatives are hot topics. There are simple ones, such as “reverse KL”, which is sometimes how we justify expectation propagation, and also the modest generalisation to Rényi-divergence inference (Li and Turner 2016).
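For orientation, the Rényi divergence of order \(\alpha\) is usually defined as follows. The notation is my assumption: \(\pi\) for the target density and \(q\) for the approximation.

\[
D_\alpha(q \,\|\, \pi) = \frac{1}{\alpha - 1} \log \int q(x)^{\alpha} \, \pi(x)^{1 - \alpha} \, \mathrm{d}x, \qquad \alpha > 0, \ \alpha \neq 1 .
\]

Taking \(\alpha \to 1\) recovers the usual \(\int q(x) \log \frac{q(x)}{\pi(x)} \mathrm{d}x\), so \(\alpha\) acts as a dial on how the approximation is penalised for putting mass where the target has little.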

Ingmar Schuster’s critique of black box loss (Ranganath et al. 2016) raises some issues:

It’s called Operator VI as a fancy way to say that one is flexible in constructing how exactly the objective function uses \(\pi, q\) and test functions from some family \(\mathcal{F}\). I completely agree with the motivation: KL-Divergence in the form \(\int q(x) \log \frac{q(x)}{\pi(x)} \mathrm{d}x\) indeed underestimates the variance of \(\pi\) and approximates only one mode. Using KL the other way around, \(\int \pi(x) \log \frac{\pi(x)}{q(x)} \mathrm{d}x\) takes all modes into account, but still tends to underestimate variance.

[…] the authors suggest an objective using what they call the Langevin-Stein Operator which does not make use of the proposal density \(q\) at all but uses test functions exclusively.
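Stepping outside the quotation, the difference between the two KL directions can be seen in a toy experiment. This is a sketch under assumptions of mine, not anything from the critique: the target \(\pi\) is an equal mixture of \(\mathcal{N}(-4, 1)\) and \(\mathcal{N}(4, 1)\), the approximating family is a single Gaussian, the \(\int q \log \frac{q}{\pi}\) direction is minimised by brute-force grid search with a Riemann sum, and the \(\int \pi \log \frac{\pi}{q}\) direction uses the fact that, for a Gaussian \(q\), its minimiser matches the mean and variance of \(\pi\).

```python
import math

# Hypothetical bimodal target: equal mixture of N(-4, 1) and N(4, 1).
WEIGHTS, MEANS, SDS = (0.5, 0.5), (-4.0, 4.0), (1.0, 1.0)


def log_normal_pdf(z, mean, sd):
    """Log density of N(mean, sd^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((z - mean) / sd) ** 2


def log_target(z):
    """Log density of the mixture target pi."""
    return math.log(sum(w * math.exp(log_normal_pdf(z, m, s))
                        for w, m, s in zip(WEIGHTS, MEANS, SDS)))


# Integration grid on [-20, 20]; log pi is precomputed once.
STEP = 0.05
GRID = [-20.0 + STEP * i for i in range(801)]
LOG_PI = [log_target(z) for z in GRID]


def kl_q_pi(m, s):
    """Riemann-sum approximation of KL(q || pi) for q = N(m, s^2)."""
    total = 0.0
    for z, log_pi in zip(GRID, LOG_PI):
        log_q = log_normal_pdf(z, m, s)
        total += math.exp(log_q) * (log_q - log_pi)
    return total * STEP


# Direction 1: minimise KL(q || pi) over a coarse grid of (mean, sd) pairs.
candidates = [(-6.0 + 0.25 * i, 0.5 + 0.25 * j)
              for i in range(49) for j in range(19)]
m_rev, s_rev = min(candidates, key=lambda ms: kl_q_pi(*ms))

# Direction 2: the Gaussian minimiser of KL(pi || q) matches the moments of pi.
m_fwd = sum(w * m for w, m in zip(WEIGHTS, MEANS))
var_fwd = sum(w * (s * s + m * m)
              for w, m, s in zip(WEIGHTS, MEANS, SDS)) - m_fwd ** 2

print("KL(q||pi) fit: mean", m_rev, "sd", s_rev)
print("KL(pi||q) fit: mean", m_fwd, "sd", math.sqrt(var_fwd))
```

In this example the first fit settles on one of the two modes with a narrow spread, while the second straddles both modes with a much wider Gaussian. Which behaviour is preferable depends on what the approximation will be used for.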

Philosophical interpretations

John Schulman’s Sending Samples Without Bits-Back is a nifty interpretation of KL variational bounds in terms of coding theory/message sending.

Not grandiose enough? See Karl Friston’s interpretation of variational inference as a principle of cognition.

Mean-field assumption

TODO: mention the importance of this for classic-flavoured variational inference (Mean Field Variational Bayes). This confused me for aaaaages. AFAICT this is a problem of history. Not all variational inference makes the confusingly-named “mean-field” assumption, but for a long while that was the only game in town, so tutorials of a certain vintage take mean-field variational inference as a synonym for variational inference. If I have just learnt some non-mean-field SVI methods from a recent NeurIPS paper and then I run into this, I might well be confused.
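For reference, the mean-field assumption is that the approximation factorises across blocks of latent variables, \(q(z) = \prod_j q_j(z_j)\), and the classic algorithm updates one factor at a time while holding the others fixed. Here is a minimal sketch on a textbook toy problem that I am assuming for illustration: approximating a correlated bivariate Gaussian with a product of two univariate Gaussians. The function name and the numbers are hypothetical. In this case the coordinate updates have closed form, and each factor ends up with variance \(1 / \Lambda_{jj}\), where \(\Lambda\) is the precision matrix of the target.

```python
def cavi_bivariate_gaussian(mu, prec, n_sweeps=200):
    """Coordinate-ascent mean-field fit q(z1) q(z2) to N(mu, prec^{-1}).

    Returns the factor means and the factor variances.
    """
    m1, m2 = 0.0, 0.0
    for _ in range(n_sweeps):
        # Each factor mean is the conditional mean given the other factor's mean.
        m1 = mu[0] - prec[0][1] / prec[0][0] * (m2 - mu[1])
        m2 = mu[1] - prec[1][0] / prec[1][1] * (m1 - mu[0])
    # Factor variances are fixed by the diagonal of the precision matrix.
    return (m1, m2), (1.0 / prec[0][0], 1.0 / prec[1][1])


# Target: unit marginal variances, correlation rho (hypothetical values).
rho = 0.9
mu = (1.0, -1.0)
det = 1.0 - rho ** 2
prec = ((1.0 / det, -rho / det), (-rho / det, 1.0 / det))

means, variances = cavi_bivariate_gaussian(mu, prec)
print("factor means:", means)
print("factor variances:", variances, "vs true marginal variance 1.0")
```

The factor means converge to the true means, but each factor variance is \(1 - \rho^2\) rather than the true marginal variance of 1. The factorised approximation cannot represent the correlation, and under this objective it compensates by shrinking.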

Mixture models

Mixture models are classic and, for ages, seemed to be the default choice for variational approximation. They are an interesting trick to make a graphical model conditionally conjugate by use of auxiliary variables.
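One way to read the auxiliary-variable remark, in notation that is my assumption rather than anything fixed above (\(K\) Gaussian components, weights \(w_k\), a discrete indicator \(c\)): the mixture density

\[
p(x) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2)
\]

is the marginal over \(c\) of the augmented model

\[
c \sim \operatorname{Categorical}(w_1, \dots, w_K), \qquad x \mid c \sim \mathcal{N}(\mu_c, \sigma_c^2) .
\]

Conditional on \(c\), each observation belongs to a single Gaussian component, so with conjugate priors on the component parameters the conditional updates are the familiar conjugate ones, even though the marginal mixture is not conjugate to anything convenient.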

Reparameterization trick

See reparameterisation.
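As a quick reminder of the idea, here is a minimal sketch on a toy case of my own choosing: estimating \(\frac{\partial}{\partial \mu} \mathbb{E}_{z \sim \mathcal{N}(\mu, \sigma^2)}[z^2]\) by writing \(z = \mu + \sigma \epsilon\) with \(\epsilon \sim \mathcal{N}(0, 1)\), so that the randomness no longer depends on the parameter and the derivative can move inside the expectation. The function name and parameter values are hypothetical.

```python
import random


def reparam_grad_mu(mu, sigma, n_samples=100000, seed=0):
    """Estimate d/dmu E_{z ~ N(mu, sigma^2)}[z^2] via z = mu + sigma * eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on mu
        z = mu + sigma * eps       # differentiable transform of the noise
        total += 2.0 * z           # d(z^2)/dz times dz/dmu, where dz/dmu = 1
    return total / n_samples


grad_estimate = reparam_grad_mu(mu=1.5, sigma=0.7)
print(grad_estimate)  # analytically, E[z^2] = mu^2 + sigma^2, so the gradient is 2 * mu
```

In practice the hand-written derivative is replaced by automatic differentiation through the sampled \(z\), but the structure is the same.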


See variational autoencoders?


Abbasnejad, Ehsan, Anthony Dick, and Anton van den Hengel. 2016. “Infinite Variational Autoencoder for Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 29.
Archer, Evan, Il Memming Park, Lars Buesing, John Cunningham, and Liam Paninski. 2015. “Black Box Variational Inference for State Space Models.” arXiv:1511.07367 [Stat], November.
Attias, Hagai. 1999. “Inferring Parameters and Structure of Latent Variable Models by Variational Bayes.” In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 21–30. UAI’99. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Bamler, Robert, and Stephan Mandt. 2017. “Structured Black Box Variational Inference for Latent Time Series Models.” arXiv:1707.01069 [Cs, Stat], July.
Berg, Rianne van den, Leonard Hasenclever, Jakub M. Tomczak, and Max Welling. 2018. “Sylvester Normalizing Flows for Variational Inference.” In UAI18.
Bishop, Christopher. 1994. “Mixture Density Networks.” Microsoft Research, January.
Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association 112 (518): 859–77.
Burt, David R., Carl Edward Rasmussen, and Mark van der Wilk. 2020. “Convergence of Sparse Variational Inference in Gaussian Processes Regression.” Journal of Machine Learning Research 21 (131): 1–63.
Caterini, Anthony L., Arnaud Doucet, and Dino Sejdinovic. 2018. “Hamiltonian Variational Auto-Encoder.” In Advances in Neural Information Processing Systems.
Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc.
Chung, Junyoung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C Courville, and Yoshua Bengio. 2015. “A Recurrent Latent Variable Model for Sequential Data.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2980–88. Curran Associates, Inc.
Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In PMLR.
Dhaka, Akash Kumar, and Alejandro Catalina. 2020. “Robust, Accurate Stochastic Optimization for Variational Inference,” 13.
Dhaka, Akash Kumar, Alejandro Catalina, Manushi Welandawe, Michael Riis Andersen, Jonathan Huggins, and Aki Vehtari. 2021. “Challenges and Opportunities in High-Dimensional Variational Inference.” arXiv:2103.01085 [Cs, Stat], March.
Doerr, Andreas, Christian Daniel, Martin Schiegg, Duy Nguyen-Tuong, Stefan Schaal, Marc Toussaint, and Sebastian Trimpe. 2018. “Probabilistic Recurrent State-Space Models.” arXiv:1801.10395 [Stat], January.
Fabius, Otto, and Joost R. van Amersfoort. 2014. “Variational Recurrent Auto-Encoders.” In Proceedings of ICLR.
Flunkert, Valentin, David Salinas, and Jan Gasthaus. 2017. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” arXiv:1704.04110 [Cs, Stat], April.
Fortunato, Meire, Charles Blundell, and Oriol Vinyals. 2017. “Bayesian Recurrent Neural Networks.” arXiv:1704.02798 [Cs, Stat], April.
Fraccaro, Marco, Søren Kaae Sønderby, Ulrich Paquet, and Ole Winther. 2016. “Sequential Neural Models with Stochastic Layers.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2199–2207. Curran Associates, Inc.
Frey, B.J., and Nebojsa Jojic. 2005. “A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (9): 1392–1416.
Futami, Futoshi, Issei Sato, and Masashi Sugiyama. 2017. “Variational Inference Based on Robust Divergences.” arXiv:1710.06595 [Stat], October.
Gagen, Michael J, and Kae Nemoto. 2006. “Variational Optimization of Probability Measure Spaces Resolves the Chain Store Paradox.”
Gal, Yarin, and Mark van der Wilk. 2014. “Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models - a Gentle Tutorial.” arXiv:1402.1412 [Stat], February.
Galy-Fajou, Théo, Valerio Perrone, and Manfred Opper. 2021. “Flexible and Efficient Inference with Particles for the Variational Gaussian Approximation.” Entropy 23 (8): 990.
Giordano, Ryan, Tamara Broderick, and Michael I. Jordan. 2017. “Covariances, Robustness, and Variational Bayes.” arXiv:1709.02536 [Stat], September.
Grathwohl, Will, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. 2018. “FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models.” arXiv:1810.01367 [Cs, Stat], October.
Graves, Alex. 2011. “Practical Variational Inference for Neural Networks.” In Proceedings of the 24th International Conference on Neural Information Processing Systems, 2348–56. NIPS’11. USA: Curran Associates Inc.
Gu, Shixiang, Zoubin Ghahramani, and Richard E Turner. 2015. “Neural Adaptive Sequential Monte Carlo.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2629–37. Curran Associates, Inc.
Gulrajani, Ishaan, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. 2017. “Improved Training of Wasserstein GANs.” arXiv:1704.00028 [Cs, Stat], March.
He, Junxian, Daniel Spokoyny, Graham Neubig, and Taylor Berg-Kirkpatrick. 2019. “Lagging Inference Networks and Posterior Collapse in Variational Autoencoders.” In Proceedings of ICLR.
Hinton, G. E. 1995. “The Wake-Sleep Algorithm for Unsupervised Neural Networks.” Science 268 (5214): 1158–61.
Hoffman, Matt, David M. Blei, Chong Wang, and John Paisley. 2013. “Stochastic Variational Inference.” arXiv:1206.7051 [Cs, Stat] 14 (1).
Hoffman, Matthew, and David Blei. 2015. “Stochastic Structured Variational Inference.” In PMLR, 361–69.
Huang, Chin-Wei, David Krueger, Alexandre Lacoste, and Aaron Courville. 2018. “Neural Autoregressive Flows.” arXiv:1804.00779 [Cs, Stat], April.
Huggins, Jonathan H., Mikołaj Kasprzak, Trevor Campbell, and Tamara Broderick. 2019. “Practical Posterior Error Bounds from Variational Objectives.” arXiv:1910.04102 [Cs, Math, Stat], October.
Jaakkola, Tommi S., and Michael I. Jordan. 1998. “Improving the Mean Field Approximation Via the Use of Mixture Distributions.” In Learning in Graphical Models, 163–73. NATO ASI Series. Springer, Dordrecht.
Jordan, Michael I., Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. 1999. “An Introduction to Variational Methods for Graphical Models.” Machine Learning 37 (2): 183–233.
Karl, Maximilian, Maximilian Soelch, Justin Bayer, and Patrick van der Smagt. 2016. “Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data.” In Proceedings of ICLR.
Khan, Mohammad, and Wu Lin. 2017. “Conjugate-Computation Variational Inference : Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models.” In Artificial Intelligence and Statistics, 878–87. PMLR.
Kingma, Diederik P. 2017. “Variational Inference & Deep Learning: A New Synthesis.”
Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
Kingma, Diederik P., Tim Salimans, and Max Welling. 2015. “Variational Dropout and the Local Reparameterization Trick.” In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, 2575–83. NIPS’15. Cambridge, MA, USA: MIT Press.
Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In ICLR 2014 Conference.
Kingma, Durk P, and Prafulla Dhariwal. 2018. “Glow: Generative Flow with Invertible 1x1 Convolutions.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 10236–45. Curran Associates, Inc.
Knoblauch, Jeremias, Jack Jewson, and Theodoros Damoulas. 2022. “An Optimization-Centric View on Bayes’ Rule: Reviewing and Generalizing Variational Inference.” Journal of Machine Learning Research 23 (132): 1–109.
Larsen, Anders Boesen Lindbo, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. 2015. “Autoencoding Beyond Pixels Using a Learned Similarity Metric.” arXiv:1512.09300 [Cs, Stat], December.
Leibfried, Felix, Vincent Dutordoir, S. T. John, and Nicolas Durrande. 2022. “A Tutorial on Sparse Gaussian Processes and Variational Inference.” arXiv.
Li, Yingzhen, and Richard E Turner. 2016. “Rényi Divergence Variational Inference.” In Advances in Neural Information Processing Systems, 29:1081–89. Red Hook, NY, USA: Curran Associates, Inc.
Liu, Huidong, Xianfeng Gu, and Dimitris Samaras. 2018. “A Two-Step Computation of the Exact GAN Wasserstein Distance.” In International Conference on Machine Learning, 3159–68.
Liu, Qiang, and Dilin Wang. 2019. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” In Advances In Neural Information Processing Systems.
Louizos, Christos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. “Causal Effect Inference with Deep Latent-Variable Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 6446–56. Curran Associates, Inc.
Louizos, Christos, and Max Welling. 2016. “Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors.” In arXiv Preprint arXiv:1603.04733, 1708–16.
Louizos, Christos, and Max Welling. 2017. “Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In PMLR, 2218–27.
Luts, Jan. 2015. “Real-Time Semiparametric Regression for Distributed Data Sets.” IEEE Transactions on Knowledge and Data Engineering 27 (2): 545–57.
Luts, J., T. Broderick, and M. P. Wand. 2014. “Real-Time Semiparametric Regression.” Journal of Computational and Graphical Statistics 23 (3): 589–615.
MacKay, David J C. 2002a. “Gaussian Processes.” In Information Theory, Inference & Learning Algorithms, Chapter 45. Cambridge University Press.
MacKay, David J C. 2002b. Information Theory, Inference & Learning Algorithms. Cambridge University Press.
Maddison, Chris J., Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, and Yee Whye Teh. 2017. “Filtering Variational Objectives.” arXiv Preprint arXiv:1705.09279.
Mahdian, Saied, Jose Blanchet, and Peter Glynn. 2019. “Optimal Transport Relaxations with Application to Wasserstein GANs.” arXiv:1906.03317 [Cs, Math, Stat], June.
Marzouk, Youssef, Tarek Moselhy, Matthew Parno, and Alessio Spantini. 2016. “Sampling via Measure Transport: An Introduction.” In Handbook of Uncertainty Quantification, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1:1–41. Cham: Springer Heidelberg.
Matthews, Alexander Graeme de Garis. 2017. “Scalable Gaussian Process Inference Using Variational Methods.” Thesis, University of Cambridge.
Meent, Jan-Willem van de, Brooks Paige, Hongseok Yang, and Frank Wood. 2021. “An Introduction to Probabilistic Programming.” arXiv:1809.10756 [Cs, Stat], October.
Minka, Thomas P. 2001. “Expectation Propagation for Approximate Bayesian Inference.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 362–69. UAI’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. 2017. “Variational Dropout Sparsifies Deep Neural Networks.” In Proceedings of ICML.
Ng, Ignavier, Shengyu Zhu, Zhitang Chen, and Zhuangyan Fang. 2019. “A Graph Autoencoder Approach to Causal Structure Learning.” In Advances In Neural Information Processing Systems.
Nolan, Tui H., Marianne Menictas, and Matt P. Wand. 2020. “Streamlined Variational Inference with Higher Level Random Effects.” Journal of Machine Learning Research 21 (157): 1–62.
Ormerod, J. T., and M. P. Wand. 2010. “Explaining Variational Approximations.” The American Statistician 64 (2): 140–53.
Papamakarios, George, Iain Murray, and Theo Pavlakou. 2017. “Masked Autoregressive Flow for Density Estimation.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 2338–47. Curran Associates, Inc.
Pereyra, M., P. Schniter, É. Chouzenoux, J. C. Pesquet, J. Y. Tourneret, A. O. Hero, and S. McLaughlin. 2016. “A Survey of Stochastic Simulation and Optimization Methods in Signal Processing.” IEEE Journal of Selected Topics in Signal Processing 10 (2): 224–41.
Plötz, Tobias, Anne S. Wannenwetsch, and Stefan Roth. 2018. “Stochastic Variational Inference with Gradient Linearization.” In CVPR.
Ranganath, Rajesh, Dustin Tran, Jaan Altosaar, and David Blei. 2016. “Operator Variational Inference.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 496–504. Curran Associates, Inc.
Ranganath, Rajesh, Dustin Tran, and David Blei. 2016. “Hierarchical Variational Models.” In PMLR, 324–33.
Rezende, Danilo Jimenez, and Shakir Mohamed. 2015. “Variational Inference with Normalizing Flows.” In International Conference on Machine Learning, 1530–38. ICML’15. Lille, France:
Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. 2015. “Stochastic Backpropagation and Approximate Inference in Deep Generative Models.” In Proceedings of ICML.
Roychowdhury, Anirban, and Brian Kulis. 2015. “Gamma Processes, Stick-Breaking, and Variational Inference.” In Artificial Intelligence and Statistics, 800–808. PMLR.
Ruiz, Francisco J. R., Michalis K. Titsias, and David M. Blei. 2016. “The Generalized Reparameterization Gradient.” In Advances In Neural Information Processing Systems.
Ryder, Thomas, Andrew Golightly, A. Stephen McGough, and Dennis Prangle. 2018. “Black-Box Variational Inference for Stochastic Differential Equations.” arXiv:1802.03335 [Stat], February.
Salimans, Tim, Diederik Kingma, and Max Welling. 2015. “Markov Chain Monte Carlo and Variational Inference: Bridging the Gap.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 1218–26. ICML’15. Lille, France:
Schervish, Mark J. 2012. Theory of Statistics. Springer Series in Statistics. New York, NY: Springer Science & Business Media.
Spantini, Alessio, Daniele Bigoni, and Youssef Marzouk. 2017. “Inference via Low-Dimensional Couplings.” Journal of Machine Learning Research 19 (66): 2639–709.
Staines, Joe, and David Barber. 2012. “Variational Optimization.” arXiv:1212.4507 [Cs, Stat], December.
Titsias, Michalis K., and Miguel Lázaro-Gredilla. 2014. “Doubly Stochastic Variational Bayes for Non-Conjugate Inference.” In Proceedings of the 31st International Conference on Machine Learning - Volume 32, II-1971–II-1980. ICML’14. Beijing, China:
Ullrich, K. 2020. “A Coding Perspective on Deep Latent Variable Models.”
Wainwright, Martin J., and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Vol. 1. Foundations and Trends® in Machine Learning. Now Publishers.
Wainwright, Martin, and Michael I Jordan. 2005. “A Variational Principle for Graphical Models.” In New Directions in Statistical Signal Processing. Vol. 155. MIT Press.
Wand, M. P. 2017. “Fast Approximate Inference for Arbitrarily Large Semiparametric Regression Models via Message Passing.” Journal of the American Statistical Association 112 (517): 137–68.
Wang, Yixin, and David M. Blei. 2017. “Frequentist Consistency of Variational Bayes.” arXiv:1705.03439 [Cs, Math, Stat], May.
Wiegerinck, Wim. 2000. “Variational Approximations Between Mean Field Theory and the Junction Tree Algorithm.” In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, 626–33. UAI ’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Wingate, David, and Theophane Weber. 2013. “Automated Variational Inference in Probabilistic Programming.” arXiv:1301.1299 [Cs, Stat], January.
Winn, John M., and Christopher M. Bishop. 2005. “Variational Message Passing.” In Journal of Machine Learning Research, 661–94.
Xing, Eric P., Michael I. Jordan, and Stuart Russell. 2003. “A Generalized Mean Field Algorithm for Variational Inference in Exponential Families.” In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 583–91. UAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. n.d. “Yes, but Did It Work?: Evaluating Variational Inference,” 18.
Yoshida, Ryo, and Mike West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research 11 (May): 1771–98.
Zahm, Olivier, Paul Constantine, Clémentine Prieur, and Youssef Marzouk. 2018. “Gradient-Based Dimension Reduction of Multivariate Vector-Valued Functions.” arXiv:1801.07922 [Math], January.
Zhang, Yufeng, Wanwei Liu, Zhenbang Chen, Ji Wang, and Kenli Li. 2022. “On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions.” arXiv.
