Probabilistic neural nets

Machinelearnese for “Bayesian”


NB: this is not current; I am doing too much research in the area to summarise it well, and it is a large area.

Probably approximately a horse

Probabilistic neural networks are, loosely speaking, recipes for creating neural networks that plausibly solve some approximation to inferring a whole predictive probability density, rather than a mere best point prediction. Sometimes this term seems to be used to mean the related problem of finding some manner of Bayesian justification for a neural network. The mathematics comes out similar either way.

An alternative emphasis from where I am sitting is this:

Learning problems involve composing the differentiation and integration of various terms that measure how well you have approximated the state of the world. Probabilistic neural networks find combinations of integrals that we can solve by Monte Carlo and derivatives that we can solve by automatic differentiation, both of which are fast on modern hardware, and then use those cunning combinations to find approximate solutions to problems we would previously have phrased in terms of specific integrals that are in practice completely intractable. The result is machine learning in strange and wonderful places where we could not have solved those integrals and derivatives in the traditional manner.

Although… there is something odd about that setup. From this perspective the generative models (such as GANs and autoencoders) are solving an intractable integral by simulating samples from it, in lieu of processing the continuous, unknowable, intractable integral that we actually wish to solve. But that continuous intractable integral was in any case a contrivance, a thought experiment imagining a world populated with such weird Platonic objects as integrals-over-possible-states-of-the-world, which only mathematicians would consider reasonable. The world we live in has, as far as I know, no such thing. We do not live in a world where the things we observe are stochastic samples from an ineffable probability density; rather, the observations themselves are the phenomena, and the probability density over them is an improbable abstraction. It must look deeply weird from the outside when we talk about how we are solving integrals by looking at data, instead of solving data by looking at integrals.
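
To make the first half of that concrete, here is a minimal sketch (in PyTorch, my choice of framework for illustration) of the recipe: the intractable integral is an expectation estimated by Monte Carlo over reparameterised samples, and automatic differentiation then yields gradient estimates with respect to the distribution's parameters. The function f is an arbitrary stand-in for a loss or log-likelihood term.

```python
# Sketch: "integrals by Monte Carlo, derivatives by autodiff".
# We estimate d/d(mu, sigma) E_{z ~ N(mu, sigma^2)}[ f(z) ]
# by reparameterising z = mu + sigma * eps with eps ~ N(0, 1).
import torch

def f(z):
    # Stand-in for a term we cannot integrate in closed form.
    return torch.sin(z) + 0.1 * z ** 2

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(-1.0, requires_grad=True)

n_samples = 1024
eps = torch.randn(n_samples)                 # noise carries no learnable parameters
z = mu + torch.exp(log_sigma) * eps          # reparameterised samples
mc_estimate = f(z).mean()                    # Monte Carlo estimate of the integral

mc_estimate.backward()                       # autodiff gives unbiased gradient estimates
print(float(mc_estimate), float(mu.grad), float(log_sigma.grad))
```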

To learn:

  • marginal likelihood in model selection: how does it work with many optima?

Backgrounders

Radford Neal’s thesis (Neal 1996) is foundational for the asymptotically Bayesian use of neural networks. Yarin Gal’s PhD thesis (Gal 2016) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout). Diederik P. Kingma’s thesis (Kingma 2017) is the latest blockbuster in this tradition.

Alex Graves presented a poster of his paper (Graves 2011) on one of the simplest prior-uncertainty approaches for recurrent nets (diagonal Gaussian weight uncertainty). There is a third-party quick-and-dirty implementation.
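
A rough sketch of that diagonal-Gaussian weight-uncertainty idea, again in PyTorch. The layer structure, the initialisation, and the standard-normal prior are my assumptions for illustration, not Graves' exact recipe.

```python
# Each weight gets a mean and a log-standard-deviation; weights are sampled
# by reparameterisation, and a KL penalty against a N(0, 1) prior is added
# to the data-fit loss to give an ELBO-like objective.
import torch
import torch.nn as nn

class DiagGaussianLinear(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.w_logsigma = nn.Parameter(torch.full((n_out, n_in), -3.0))
        self.b = nn.Parameter(torch.zeros(n_out))

    def forward(self, x):
        sigma = torch.exp(self.w_logsigma)
        w = self.w_mu + sigma * torch.randn_like(sigma)  # one weight sample per call
        return x @ w.t() + self.b

    def kl(self):
        # KL( N(mu, sigma^2) || N(0, 1) ), summed over weights.
        sigma2 = torch.exp(2 * self.w_logsigma)
        return 0.5 * (sigma2 + self.w_mu ** 2 - 1 - 2 * self.w_logsigma).sum()

layer = DiagGaussianLinear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = ((layer(x) - y) ** 2).mean() + 1e-3 * layer.kl()
loss.backward()
```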

One could refer to the 2019 NeurIPS Bayesian deep learning workshop site, which has some more modern positioning.

Among the popular methods here are the variational autoencoder and the affiliated reparameterization trick. Likelihood-free methods seem to be in the air too.
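
For orientation, a minimal variational-autoencoder sketch in PyTorch: the encoder outputs a diagonal Gaussian over latents, a reparameterised sample goes through the decoder, and the loss is the negative ELBO (reconstruction term plus KL). Layer sizes here are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=8, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterisation
        x_logits = self.dec(z)
        recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
        kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()
        return (recon + kl) / x.shape[0]                         # negative ELBO per datum

vae = TinyVAE()
x = torch.rand(16, 784)   # stand-in for binarised image data
loss = vae(x)
loss.backward()
```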

Reparameterisation

See reparametrization.

Autoencoders

See autoencoders.

Practicalities

The computational toolsets for “neural” probabilistic programming and vanilla probabilistic programming are converging. See the tool listing under probabilistic programming.
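
As a hedged illustration of that convergence, here is a toy model in Pyro (a PyTorch-based probabilistic programming library, used here purely as an example). The model is deliberately trivial rather than neural, but the same stochastic-variational machinery and autodiff would train a network-parameterised likelihood just as well.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

def model(x):
    # A one-parameter "network": observations are Gaussian around a latent loc.
    loc = pyro.sample("loc", dist.Normal(0.0, 10.0))
    with pyro.plate("data", x.shape[0]):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=x)

guide = AutoNormal(model)                       # mean-field variational posterior
svi = SVI(model, guide, Adam({"lr": 1e-2}), loss=Trace_ELBO())

x = torch.randn(100) + 3.0                      # toy data centred near 3
for _ in range(200):
    svi.step(x)                                 # one stochastic ELBO gradient step
```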

References

Abbasnejad, Ehsan, Anthony Dick, and Anton van den Hengel. 2016. “Infinite Variational Autoencoder for Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 29. http://arxiv.org/abs/1611.07800.

Archer, Evan, Il Memming Park, Lars Buesing, John Cunningham, and Liam Paninski. 2015. “Black Box Variational Inference for State Space Models.” November 23, 2015. http://arxiv.org/abs/1511.07367.

Baydin, Atılım Güneş, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, et al. 2019. “Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale.” In. http://arxiv.org/abs/1907.03382.

Bishop, Christopher. 1994. “Mixture Density Networks.” Microsoft Research, January. https://www.microsoft.com/en-us/research/publication/mixture-density-networks/.

Bora, Ashish, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. 2017. “Compressed Sensing Using Generative Models.” In International Conference on Machine Learning, 537–46. http://arxiv.org/abs/1703.03208.

Bui, Thang D., Sujith Ravi, and Vivek Ramavajjala. 2017. “Neural Graph Machines: Learning Neural Networks Using Graphs.” March 14, 2017. http://arxiv.org/abs/1703.04818.

Castro, Pablo de, and Tommaso Dorigo. 2019. “INFERNO: Inference-Aware Neural Optimisation.” Computer Physics Communications 244 (November): 170–79. https://doi.org/10.1016/j.cpc.2019.06.007.

Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc. http://papers.nips.cc/paper/7892-neural-ordinary-differential-equations.pdf.

Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In PMLR. http://proceedings.mlr.press/v70/cutajar17a.html.

Damianou, Andreas, and Neil Lawrence. 2013. “Deep Gaussian Processes.” In Artificial Intelligence and Statistics, 207–15. http://proceedings.mlr.press/v31/damianou13a.html.

Doerr, Andreas, Christian Daniel, Martin Schiegg, Duy Nguyen-Tuong, Stefan Schaal, Marc Toussaint, and Sebastian Trimpe. 2018. “Probabilistic Recurrent State-Space Models.” January 31, 2018. http://arxiv.org/abs/1801.10395.

Dunlop, Matthew M., Mark A. Girolami, Andrew M. Stuart, and Aretha L. Teckentrup. 2018. “How Deep Are Deep Gaussian Processes?” Journal of Machine Learning Research 19 (1): 2100–2145. http://jmlr.org/papers/v19/18-015.html.

Dupont, Emilien, Arnaud Doucet, and Yee Whye Teh. 2019. “Augmented Neural ODEs.” April 2, 2019. http://arxiv.org/abs/1904.01681.

Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. “Identification of Gaussian Process State Space Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc. http://papers.nips.cc/paper/7115-identification-of-gaussian-process-state-space-models.pdf.

Fabius, Otto, and Joost R. van Amersfoort. 2014. “Variational Recurrent Auto-Encoders.” In Proceedings of ICLR. http://arxiv.org/abs/1412.6581.

Flunkert, Valentin, David Salinas, and Jan Gasthaus. 2017. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” April 13, 2017. http://arxiv.org/abs/1704.04110.

Gal, Yarin. 2015. “Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference.” In Advances in Approximate Bayesian Inference Workshop, NIPS.

———. 2016. “Uncertainty in Deep Learning.” University of Cambridge.

Gal, Yarin, and Zoubin Ghahramani. 2015a. “On Modern Deep Learning and Variational Inference.” In Advances in Approximate Bayesian Inference Workshop, NIPS.

———. 2016a. “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In. http://arxiv.org/abs/1512.05287.

———. 2016b. “Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference.” In 4th International Conference on Learning Representations (ICLR) Workshop Track. http://arxiv.org/abs/1506.02158.

———. 2015b. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Machine Learning (ICML-16). http://arxiv.org/abs/1506.02142.

Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. “Conditional Neural Processes.” July 4, 2018. https://arxiv.org/abs/1807.01613v1.

Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018. “Neural Processes,” July. https://arxiv.org/abs/1807.01622v1.

Gholami, Amir, Kurt Keutzer, and George Biros. 2019. “ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs.” February 26, 2019. http://arxiv.org/abs/1902.10298.

Graves, Alex. 2011. “Practical Variational Inference for Neural Networks.” In Proceedings of the 24th International Conference on Neural Information Processing Systems, 2348–56. NIPS’11. USA: Curran Associates Inc. https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks.pdf.

Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. “Speech Recognition with Deep Recurrent Neural Networks.” In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. https://doi.org/10.1109/ICASSP.2013.6638947.

Gregor, Karol, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. “DRAW: A Recurrent Neural Network for Image Generation.” February 16, 2015. http://arxiv.org/abs/1502.04623.

Gu, Shixiang, Zoubin Ghahramani, and Richard E Turner. 2015. “Neural Adaptive Sequential Monte Carlo.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2629–37. Curran Associates, Inc. http://papers.nips.cc/paper/5961-neural-adaptive-sequential-monte-carlo.pdf.

Gu, Shixiang, Sergey Levine, Ilya Sutskever, and Andriy Mnih. 2016. “MuProp: Unbiased Backpropagation for Stochastic Neural Networks.” In Proceedings of ICLR. https://arxiv.org/abs/1511.05176v3.

Hoffman, Matthew, and David Blei. 2015. “Stochastic Structured Variational Inference.” In PMLR, 361–69. http://proceedings.mlr.press/v38/hoffman15.html.

Johnson, Matthew J., David Duvenaud, Alexander B. Wiltschko, Sandeep R. Datta, and Ryan P. Adams. 2016. “Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference.” March 20, 2016. http://arxiv.org/abs/1603.06277.

Karl, Maximilian, Maximilian Soelch, Justin Bayer, and Patrick van der Smagt. 2016. “Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data.” In Proceedings of ICLR. http://arxiv.org/abs/1605.06432.

Kingma, Diederik P. 2017. “Variational Inference & Deep Learning: A New Synthesis.” https://www.dropbox.com/s/v6ua3d9yt44vgb3/cover_and_thesis.pdf?dl=0.

Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc. http://arxiv.org/abs/1606.04934.

Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” In ICLR 2014 Conference. http://arxiv.org/abs/1312.6114.

Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016. “AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In UAI17. http://arxiv.org/abs/1610.05392.

Krishnan, Rahul G., Uri Shalit, and David Sontag. 2015. “Deep Kalman Filters.” 2015. https://arxiv.org/abs/1511.05121.

Larsen, Anders Boesen Lindbo, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. 2015. “Autoencoding Beyond Pixels Using a Learned Similarity Metric.” December 31, 2015. http://arxiv.org/abs/1512.09300.

Le, Tuan Anh, Atılım Güneş Baydin, and Frank Wood. 2017. “Inference Compilation and Universal Probabilistic Programming.” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 54:1338–48. Proceedings of Machine Learning Research. Fort Lauderdale, FL, USA: PMLR. http://arxiv.org/abs/1610.09900.

Le, Tuan Anh, Maximilian Igl, Tom Jin, Tom Rainforth, and Frank Wood. 2017. “Auto-Encoding Sequential Monte Carlo.” 2017. https://arxiv.org/abs/1705.10306.

Lobacheva, Ekaterina, Nadezhda Chirkova, and Dmitry Vetrov. 2017. “Bayesian Sparsification of Recurrent Neural Networks.” In Workshop on Learning to Generate Natural Language. http://arxiv.org/abs/1708.00077.

Louizos, Christos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. “Causal Effect Inference with Deep Latent-Variable Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 6446–56. Curran Associates, Inc. http://papers.nips.cc/paper/7223-causal-effect-inference-with-deep-latent-variable-models.pdf.

Louizos, Christos, and Max Welling. 2016. “Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors.” In, 1708–16. http://arxiv.org/abs/1603.04733.

———. 2017. “Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In PMLR, 2218–27. http://proceedings.mlr.press/v70/louizos17a.html.

MacKay, David J C. 2002. Information Theory, Inference & Learning Algorithms. Cambridge University Press.

Maddison, Chris J., Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, and Yee Whye Teh. 2017. “Filtering Variational Objectives.” 2017. https://arxiv.org/abs/1705.09279.

Matthews, Alexander G. de G., Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2016. “GPflow: A Gaussian Process Library Using TensorFlow.” October 27, 2016. http://arxiv.org/abs/1610.08733.

Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. 2017. “Variational Dropout Sparsifies Deep Neural Networks.” In Proceedings of ICML. http://arxiv.org/abs/1701.05369.

Neal, Radford M. 1996. Bayesian Learning for Neural Networks. Vol. 118. Secaucus, NJ, USA: Springer-Verlag New York, Inc. http://www.csri.utoronto.ca/~radford/ftp/thesis.pdf.

Ngiam, Jiquan, Zhenghao Chen, Pang W. Koh, and Andrew Y. Ng. 2011. “Learning Deep Energy Models.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 1105–12. http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Ngiam_557.pdf.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press. http://www.gaussianprocess.org/gpml/.

Ryder, Thomas, Andrew Golightly, A. Stephen McGough, and Dennis Prangle. 2018. “Black-Box Variational Inference for Stochastic Differential Equations.” February 9, 2018. http://arxiv.org/abs/1802.03335.

Tran, Dustin, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, and David M. Blei. 2017. “Deep Probabilistic Programming.” In ICLR. http://arxiv.org/abs/1701.03757.

Tran, Dustin, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, and David M. Blei. 2016. “Edward: A Library for Probabilistic Modeling, Inference, and Criticism.” October 31, 2016. http://arxiv.org/abs/1610.09787.

Wainwright, Martin, and Michael I Jordan. 2005. “A Variational Principle for Graphical Models.” In New Directions in Statistical Signal Processing. Vol. 155. MIT Press.