**NB:** this is not current; I am doing too much research in the area to summarise it well, and it is a large area.

I would summarize this as creating neural networks which infer whole probability densities rather than point predictions. Sometimes the term seems to be used to mean finding some manner of Bayesian justification for a neural network.
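To make "densities rather than point predictions" concrete for myself, here is a minimal sketch in which the network emits a mean and a log-variance for each input and is scored by Gaussian negative log-likelihood. The two-headed architecture, the parameter names and the toy data are all my own assumptions for illustration, not any particular paper's method.

```python
import numpy as np

def predict_density(x, params):
    """Hypothetical one-hidden-layer net with two heads: mean and log-variance."""
    h = np.tanh(x @ params["W1"] + params["b1"])           # shared hidden features
    mean = h @ params["w_mean"] + params["b_mean"]          # location of p(y | x)
    log_var = h @ params["w_logvar"] + params["b_logvar"]   # log scale keeps the variance positive
    return mean, log_var

def gaussian_nll(y, mean, log_var):
    """Average negative log-likelihood of y under N(mean, exp(log_var))."""
    sq_err = (y - mean) ** 2 / np.exp(log_var)
    return float(np.mean(0.5 * (np.log(2 * np.pi) + log_var + sq_err)))

# Toy usage with random (untrained) parameters; all values are illustrative.
rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(size=(1, 8)), "b1": np.zeros(8),
    "w_mean": rng.normal(size=8), "b_mean": 0.0,
    "w_logvar": rng.normal(size=8), "b_logvar": 0.0,
}
x = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
y = np.sin(x[:, 0])
mean, log_var = predict_density(x, params)
loss = gaussian_nll(y, mean, log_var)  # the quantity one would minimise in training
```

A point-prediction network would return only `mean` and be trained on squared error; the extra `log_var` head is what turns the output into a density.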

AFAICT this usually boils down to doing variational inference, in which case the neural network is a big approximate probabilistic directed graphical model. Apparently you can also do simulation-based inference here, somehow using gradients? Must look into that. Also, Gaussian Processes can be made to fit into this framing.
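For reference, the standard identity behind the variational-inference reading, in notation I am assuming here (inputs $x$, targets $y$, weights $w$, prior $p(w)$, approximate posterior $q_\phi(w)$):

$$
\log p(y \mid x)
= \underbrace{\mathbb{E}_{q_\phi(w)}\!\left[\log p(y \mid x, w)\right]
- \mathrm{KL}\!\left(q_\phi(w) \,\|\, p(w)\right)}_{\mathrm{ELBO}(\phi)}
+ \mathrm{KL}\!\left(q_\phi(w) \,\|\, p(w \mid x, y)\right).
$$

Since the last term is non-negative, maximising the ELBO over $\phi$ both tightens a lower bound on the log evidence and pulls $q_\phi$ toward the true posterior in KL divergence.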

To learn:

- how does this work outside of KL-divergence?
- marginal likelihood in model selection: how does it work with many optima?

## Backgrounders

Radford Neal’s thesis (Neal 1996) is a foundational asymptotically-Bayesian use of neural networks. Yarin Gal’s PhD thesis (Gal 2016) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout). Diederik P. Kingma’s thesis (Kingma 2017) is the latest blockbuster in this tradition.
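As I understand the dropout interpretation, the practical recipe is to leave dropout switched on at prediction time and treat repeated stochastic forward passes as approximate posterior samples. A minimal NumPy sketch, assuming a hypothetical one-hidden-layer ReLU network with made-up weights (the function name and signature are mine, not from Gal's thesis):

```python
import numpy as np

def mc_dropout_predict(x, W1, b1, W2, b2, drop_prob=0.5, n_samples=100, rng=None):
    """Mean and spread of predictions over random dropout masks."""
    rng = np.random.default_rng(0) if rng is None else rng
    keep = 1.0 - drop_prob
    draws = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ W1 + b1)    # ReLU hidden layer
        mask = rng.random(h.shape) < keep   # fresh Bernoulli mask on every pass
        h = h * mask / keep                 # inverted-dropout rescaling
        draws.append(h @ W2 + b2)
    draws = np.stack(draws)
    return draws.mean(axis=0), draws.std(axis=0)

# Toy usage with random (untrained) weights; all values are illustrative.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(1, 32)), np.zeros(32)
W2, b2 = rng.normal(size=32) / np.sqrt(32), 0.0
x = np.linspace(-2.0, 2.0, 7).reshape(-1, 1)
pred_mean, pred_std = mc_dropout_predict(x, W1, b1, W2, b2)
```

The spread here only reflects variation across dropout masks; Gal's derivation also adds an observation-noise term to the predictive variance, which this sketch omits.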

Alex Graves did a poster of his paper (Graves 2011) on perhaps the simplest weight-uncertainty scheme for recurrent nets (diagonal Gaussian weight uncertainty). There is a third-party quick-and-dirty implementation.
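A rough sketch of what diagonal Gaussian weight uncertainty amounts to, assuming a zero-mean Gaussian prior and using a linear model as a stand-in for the network. This is my own illustration, not Graves's code; `noise_sigma`, `prior_sigma` and the function names are hypothetical.

```python
import numpy as np

def sample_weights(mu, log_sigma, rng):
    """Draw w ~ N(mu, diag(sigma^2)), one independent Gaussian per weight."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)

def kl_diag_gaussian(mu, log_sigma, prior_sigma=1.0):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, prior_sigma^2 I) )."""
    sigma2 = np.exp(2.0 * log_sigma)
    per_weight = (np.log(prior_sigma) - log_sigma
                  + (sigma2 + mu ** 2) / (2.0 * prior_sigma ** 2) - 0.5)
    return float(np.sum(per_weight))

def negative_elbo_estimate(x, y, mu, log_sigma, rng, noise_sigma=0.1):
    """Single-sample estimate of the variational loss for the model y = x @ w + noise."""
    w = sample_weights(mu, log_sigma, rng)
    resid = y - x @ w
    nll = (0.5 * np.sum(resid ** 2) / noise_sigma ** 2
           + len(y) * np.log(noise_sigma * np.sqrt(2.0 * np.pi)))
    return float(nll + kl_diag_gaussian(mu, log_sigma))

# Toy usage on synthetic data; all values are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = x @ np.array([1.0, -2.0, 0.5])
mu, log_sigma = np.zeros(3), np.full(3, -1.0)
loss = negative_elbo_estimate(x, y, mu, log_sigma, rng)
```

The appeal is that the KL term is available in closed form and costs only two parameters per weight; the price is that all posterior correlation between weights is ignored.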

One could refer to the 2019 NeurIPS Bayesian deep learning workshop site, which will have some more modern positioning.

One very popular method here is the variational autoencoder and the affiliated reparameterization trick, which I have recently summarized for my own interest.
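The reparameterization trick in miniature: write the random variable as a deterministic function of the parameters and parameter-free noise, z = mu + sigma * eps, so that gradients of an expectation can be estimated by differentiating through the samples. A sketch with a toy integrand f(z) = z**2 that I picked because its answer is known analytically; the function name and defaults are my own assumptions.

```python
import numpy as np

def reparam_gradients(mu, sigma, n_samples=200_000, seed=0):
    """Pathwise Monte Carlo gradients of E[f(z)], z ~ N(mu, sigma^2), f(z) = z**2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)  # noise that does not depend on (mu, sigma)
    z = mu + sigma * eps                  # reparameterised sample
    df_dz = 2.0 * z                       # derivative of f at each sample
    grad_mu = np.mean(df_dz * 1.0)        # chain rule with dz/dmu = 1
    grad_sigma = np.mean(df_dz * eps)     # chain rule with dz/dsigma = eps
    return grad_mu, grad_sigma

grad_mu, grad_sigma = reparam_gradients(mu=1.5, sigma=0.5)
# Analytically E[z**2] = mu**2 + sigma**2, so the exact gradients are
# 2 * mu = 3.0 and 2 * sigma = 1.0; the estimates should land close to these.
```

In a real model an autodiff framework supplies `df_dz` and the chain rule; the hand-written derivative here just keeps the example dependency-free.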

## Reparameterisation

## Autoencoders

See autoencoders.

## Practicalities

TensorFlow Probability defines an ecosystem of probabilistic NNs united by their terrible documentation (although they have many tutorials online). Blei Lab’s software tool, Edward (source), seems to be included in that TensorFlow suite, and it has good documentation.

There is better-documented but possibly less comprehensive probabilistic deep learning support for PyTorch in the Pyro library.

Thomas Wiecki, Bayesian Deep Learning, shows how to do some variants with PyMC3.

## References

Abbasnejad, Ehsan, Anthony Dick, and Anton van den Hengel. 2016. “Infinite Variational Autoencoder for Semi-Supervised Learning.” In *Advances in Neural Information Processing Systems 29*. http://arxiv.org/abs/1611.07800.

Archer, Evan, Il Memming Park, Lars Buesing, John Cunningham, and Liam Paninski. 2015. “Black Box Variational Inference for State Space Models,” November. http://arxiv.org/abs/1511.07367.

Baydin, Atılım Güneş, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, et al. 2019. “Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale,” August. http://arxiv.org/abs/1907.03382.

Bishop, Christopher. 1994. “Mixture Density Networks.” *Microsoft Research*, January. https://www.microsoft.com/en-us/research/publication/mixture-density-networks/.

Bora, Ashish, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. 2017. “Compressed Sensing Using Generative Models.” In *International Conference on Machine Learning*, 537–46. http://arxiv.org/abs/1703.03208.

Bui, Thang D., Sujith Ravi, and Vivek Ramavajjala. 2017. “Neural Graph Machines: Learning Neural Networks Using Graphs,” March. http://arxiv.org/abs/1703.04818.

Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. “Neural Ordinary Differential Equations.” In *Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc. http://papers.nips.cc/paper/7892-neural-ordinary-differential-equations.pdf.

Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In *PMLR*. http://proceedings.mlr.press/v70/cutajar17a.html.

Damianou, Andreas, and Neil Lawrence. 2013. “Deep Gaussian Processes.” In *Artificial Intelligence and Statistics*, 207–15. http://proceedings.mlr.press/v31/damianou13a.html.

Doerr, Andreas, Christian Daniel, Martin Schiegg, Duy Nguyen-Tuong, Stefan Schaal, Marc Toussaint, and Sebastian Trimpe. 2018. “Probabilistic Recurrent State-Space Models,” January. http://arxiv.org/abs/1801.10395.

Dunlop, Matthew M., Mark A. Girolami, Andrew M. Stuart, and Aretha L. Teckentrup. 2018. “How Deep Are Deep Gaussian Processes?” *Journal of Machine Learning Research* 19 (1): 2100–2145. http://jmlr.org/papers/v19/18-015.html.

Dupont, Emilien, Arnaud Doucet, and Yee Whye Teh. 2019. “Augmented Neural ODEs,” April. http://arxiv.org/abs/1904.01681.

Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. “Identification of Gaussian Process State Space Models.” In *Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc. http://papers.nips.cc/paper/7115-identification-of-gaussian-process-state-space-models.pdf.

Fabius, Otto, and Joost R. van Amersfoort. 2014. “Variational Recurrent Auto-Encoders.” In *Proceedings of ICLR*. http://arxiv.org/abs/1412.6581.

Flunkert, Valentin, David Salinas, and Jan Gasthaus. 2017. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks,” April. http://arxiv.org/abs/1704.04110.

Gal, Yarin. 2015. “Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference.” In *Advances in Approximate Bayesian Inference Workshop, NIPS*.

———. 2016. “Uncertainty in Deep Learning.” University of Cambridge.

Gal, Yarin, and Zoubin Ghahramani. 2015a. “On Modern Deep Learning and Variational Inference.” In *Advances in Approximate Bayesian Inference Workshop, NIPS*.

———. 2015b. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In *Proceedings of the 33rd International Conference on Machine Learning (ICML-16)*. http://arxiv.org/abs/1506.02142.

———. 2016a. “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In *Advances in Neural Information Processing Systems 29*. http://arxiv.org/abs/1512.05287.

———. 2016b. “Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference.” In *4th International Conference on Learning Representations (ICLR) Workshop Track*. http://arxiv.org/abs/1506.02158.

Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. “Conditional Neural Processes,” July. https://arxiv.org/abs/1807.01613v1.

Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018. “Neural Processes,” July. https://arxiv.org/abs/1807.01622v1.

Gholami, Amir, Kurt Keutzer, and George Biros. 2019. “ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs,” February. http://arxiv.org/abs/1902.10298.

Graves, Alex. 2011. “Practical Variational Inference for Neural Networks.” In *Proceedings of the 24th International Conference on Neural Information Processing Systems*, 2348–56. NIPS’11. USA: Curran Associates Inc. https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks.pdf.

Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. “Speech Recognition with Deep Recurrent Neural Networks.” In *2013 IEEE International Conference on Acoustics, Speech and Signal Processing*. https://doi.org/10.1109/ICASSP.2013.6638947.

Gregor, Karol, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. “DRAW: A Recurrent Neural Network for Image Generation,” February. http://arxiv.org/abs/1502.04623.

Gu, Shixiang, Zoubin Ghahramani, and Richard E Turner. 2015. “Neural Adaptive Sequential Monte Carlo.” In *Advances in Neural Information Processing Systems 28*, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2629–37. Curran Associates, Inc. http://papers.nips.cc/paper/5961-neural-adaptive-sequential-monte-carlo.pdf.

Gu, Shixiang, Sergey Levine, Ilya Sutskever, and Andriy Mnih. 2016. “MuProp: Unbiased Backpropagation for Stochastic Neural Networks.” In *Proceedings of ICLR*. https://arxiv.org/abs/1511.05176v3.

Hoffman, Matthew, and David Blei. 2015. “Stochastic Structured Variational Inference.” In *PMLR*, 361–69. http://proceedings.mlr.press/v38/hoffman15.html.

Johnson, Matthew J., David Duvenaud, Alexander B. Wiltschko, Sandeep R. Datta, and Ryan P. Adams. 2016. “Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference,” March. http://arxiv.org/abs/1603.06277.

Karl, Maximilian, Maximilian Soelch, Justin Bayer, and Patrick van der Smagt. 2016. “Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data.” In *Proceedings of ICLR*. http://arxiv.org/abs/1605.06432.

Kingma, Diederik P. 2017. “Variational Inference & Deep Learning: A New Synthesis.” https://www.dropbox.com/s/v6ua3d9yt44vgb3/cover_and_thesis.pdf?dl=0.

Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In *Advances in Neural Information Processing Systems 29*. Curran Associates, Inc. http://arxiv.org/abs/1606.04934.

Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In *ICLR 2014 Conference*. http://arxiv.org/abs/1312.6114.

Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016. “AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In *UAI17*. http://arxiv.org/abs/1610.05392.

Krishnan, Rahul G., Uri Shalit, and David Sontag. 2015. “Deep Kalman Filters.” *arXiv Preprint arXiv:1511.05121*. https://arxiv.org/abs/1511.05121.

Larsen, Anders Boesen Lindbo, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. 2015. “Autoencoding Beyond Pixels Using a Learned Similarity Metric,” December. http://arxiv.org/abs/1512.09300.

Le, Tuan Anh, Atılım Güneş Baydin, and Frank Wood. 2017. “Inference Compilation and Universal Probabilistic Programming.” In *Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS)*, 54:1338–48. Proceedings of Machine Learning Research. Fort Lauderdale, FL, USA: PMLR. http://arxiv.org/abs/1610.09900.

Le, Tuan Anh, Maximilian Igl, Tom Jin, Tom Rainforth, and Frank Wood. 2017. “Auto-Encoding Sequential Monte Carlo.” *arXiv Preprint arXiv:1705.10306*. https://arxiv.org/abs/1705.10306.

Lobacheva, Ekaterina, Nadezhda Chirkova, and Dmitry Vetrov. 2017. “Bayesian Sparsification of Recurrent Neural Networks.” In *Workshop on Learning to Generate Natural Language*. http://arxiv.org/abs/1708.00077.

Louizos, Christos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. “Causal Effect Inference with Deep Latent-Variable Models.” In *Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 6446–56. Curran Associates, Inc. http://papers.nips.cc/paper/7223-causal-effect-inference-with-deep-latent-variable-models.pdf.

Louizos, Christos, and Max Welling. 2016. “Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors.” In *arXiv Preprint arXiv:1603.04733*, 1708–16. http://arxiv.org/abs/1603.04733.

———. 2017. “Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In *PMLR*, 2218–27. http://proceedings.mlr.press/v70/louizos17a.html.

MacKay, David J C. 2002. *Information Theory, Inference & Learning Algorithms*. Cambridge University Press.

Maddison, Chris J., Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, and Yee Whye Teh. 2017. “Filtering Variational Objectives.” *arXiv Preprint arXiv:1705.09279*. https://arxiv.org/abs/1705.09279.

Matthews, Alexander G. de G., Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2016. “GPflow: A Gaussian Process Library Using TensorFlow,” October. http://arxiv.org/abs/1610.08733.

Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. 2017. “Variational Dropout Sparsifies Deep Neural Networks.” In *Proceedings of ICML*. http://arxiv.org/abs/1701.05369.

Neal, Radford M. 1996. *Bayesian Learning for Neural Networks*. Vol. 118. Secaucus, NJ, USA: Springer-Verlag New York, Inc. http://www.csri.utoronto.ca/~radford/ftp/thesis.pdf.

Ngiam, Jiquan, Zhenghao Chen, Pang W. Koh, and Andrew Y. Ng. 2011. “Learning Deep Energy Models.” In *Proceedings of the 28th International Conference on Machine Learning (ICML-11)*, 1105–12. http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Ngiam_557.pdf.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. *Gaussian Processes for Machine Learning*. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press. http://www.gaussianprocess.org/gpml/.

Ryder, Thomas, Andrew Golightly, A. Stephen McGough, and Dennis Prangle. 2018. “Black-Box Variational Inference for Stochastic Differential Equations,” February. http://arxiv.org/abs/1802.03335.

Tran, Dustin, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, and David M. Blei. 2017. “Deep Probabilistic Programming.” In *ICLR*. http://arxiv.org/abs/1701.03757.

Tran, Dustin, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, and David M. Blei. 2016. “Edward: A Library for Probabilistic Modeling, Inference, and Criticism,” October. http://arxiv.org/abs/1610.09787.

Wainwright, M., and M. Jordan. 2005. “A Variational Principle for Graphical Models.” In *New Directions in Statistical Signal Processing*. Vol. 155. MIT Press. http://metro-natshar-31-71.brain.net.pk/articles/new-directions-in-statistical-signal-processing-from-systems-to-brains-neural-information-processing.9780262083485.28286.pdf#page=166.