Inferring densities and distributions in a massively parameterised deep learning setting.

This is not intrinsically a Bayesian thing to do, but in practice much of the demand for probabilistic nets comes from the demand for Bayesian posterior inference for neural nets, and accordingly most of the action is over there. Bayesian inference is, of course, not the only way to do uncertainty quantification.

## Mixture density networks
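
A minimal sketch of the idea, assuming the usual construction: a network maps an input `x` to the parameters of a Gaussian mixture over a scalar target `y` (mixing logits, means, and log standard deviations), and is trained by minimising the negative log-likelihood of the observed targets. The function names, the single linear layer `W` standing in for the network, and the toy data are all illustrative assumptions, not a reference implementation.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def mdn_log_density(y, logits, mu, log_sigma):
    """Log density log p(y | x) of a 1-D Gaussian mixture.

    y has shape (N,); logits, mu and log_sigma have shape (N, K),
    one row of mixture parameters per input.
    """
    # Log-softmax turns unconstrained logits into log mixing weights.
    log_pi = logits - logsumexp(logits, axis=1)[:, None]
    # Per-component Gaussian log densities, shape (N, K).
    z = (y[:, None] - mu) / np.exp(log_sigma)
    log_normal = -0.5 * z**2 - log_sigma - 0.5 * np.log(2.0 * np.pi)
    # Mix in log space: log sum_k pi_k N(y; mu_k, sigma_k^2).
    return logsumexp(log_pi + log_normal, axis=1)


# Toy "network": one linear layer producing 3*K outputs per input (an assumption).
K = 2
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = rng.normal(size=5)
W = 0.1 * rng.normal(size=(3, 3 * K))

# Split the raw outputs into the three parameter groups, each of shape (N, K).
logits, mu, log_sigma = np.split(x @ W, 3, axis=1)

# The training loss would be the mean negative log-likelihood.
nll = -np.mean(mdn_log_density(y, logits, mu, log_sigma))
print(nll)
```

Parameterising the scales as `log_sigma` and the weights as logits keeps every network output unconstrained, so any gradient-based optimiser can be applied to `nll` without projection steps.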
