# Probabilistic neural nets

Bayesian and other probabilistic inference in overparameterized ML

January 12, 2017 — April 27, 2023

Inferring densities and distributions in a massively parameterised deep learning setting, in a Bayesian manner. Probabilistic networks are more general than Bayesian ones.

Jospin et al. (2022) is a modern high-speed intro and summary of many approaches.

Radford Neal’s thesis is a foundational Bayesian use of neural networks in the wide NN and MCMC sampling settings. Diederik P. Kingma’s thesis is a blockbuster in the more recent variational tradition.

I found Alex Graves’ poster for his paper on perhaps the simplest weight-uncertainty scheme for recurrent nets (diagonal Gaussian weight uncertainty) elucidating. (There is a third-party quick-and-dirty implementation.)
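A minimal numpy sketch of that diagonal-Gaussian idea (all names and shapes are illustrative, not Graves’ actual parameterisation): each weight gets a variational mean and log-standard-deviation, weights are sampled by reparameterisation, and the KL term against a standard normal prior comes in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: each weight has a mean and a log-std (diagonal Gaussian posterior).
mu = rng.normal(size=(3, 2))        # variational means
log_sigma = np.full((3, 2), -1.0)   # variational log standard deviations

def sample_weights(mu, log_sigma, rng):
    """Reparameterised draw: W = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over all weights."""
    sigma2 = np.exp(2 * log_sigma)
    return 0.5 * np.sum(sigma2 + mu**2 - 1.0 - 2 * log_sigma)

W = sample_weights(mu, log_sigma, rng)
print(W.shape, kl_to_standard_normal(mu, log_sigma))
```

Training then minimises negative expected log-likelihood plus this KL, i.e. the usual variational free energy.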

One could refer to the 2019 NeurIPS Bayesian deep learning workshop site, which stakes out some more modern positioning.

Generative methods are useful, e.g. the variational autoencoder and the affiliated reparameterization trick. Likelihood-free methods seem to be in the air too.
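The reparameterization trick in a few lines of numpy (a toy illustration, not the VAE itself): writing $z = \mu + \sigma\epsilon$ moves the randomness into $\epsilon$, so gradients flow through the sampling step. Here the derivative is taken by hand for a target whose true gradient we know.

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate d/dmu E_{z ~ N(mu, sigma^2)}[z^2] via the reparameterisation
# z = mu + sigma * eps, eps ~ N(0, 1).
mu, sigma, n = 0.7, 0.5, 100_000
eps = rng.normal(size=n)
z = mu + sigma * eps

# d(z^2)/dmu = 2 z * dz/dmu = 2 z, since dz/dmu = 1.
grad_est = np.mean(2 * z)
print(grad_est)  # analytic answer is 2 * mu = 1.4
```

In an autodiff framework the same trick means the sampling node is differentiable, which is what makes VAE-style training work.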

We are free to consider classic neural network inference as sort-of a special case of Bayes inference. Specifically, we interpret the loss function $$\mathcal{L}$$ of a net $$f:\mathbb{R}^n\times\mathbb{R}^d\to\mathbb{R}^k$$ in the likelihood setting

$$\begin{aligned}
\mathcal{L}(\theta)
&:=-\sum_{i=1}^{m} \log p\left(y_{i} \mid f\left(x_{i} ; \theta\right)\right)-\log p(\theta) \\
&=-\log p(\theta \mid \mathcal{D})+\text{const}.
\end{aligned}$$

Obviously a few things are different from the point-estimate case: the parameter vector $$\theta$$ is not interpretable, so what do posterior distributions over it even mean? What are sensible priors? Choosing priors over by-design-uninterpretable parameters such as NN weights is a fraught business, in ways we will mostly ignore for now. Usually the default prior is something like $p(\theta)=\mathcal{N}\left(0, \lambda^{-1} I\right)$, for want of a better idea. This ends up being equivalent to “weight decay” regularisation, in the sense that Bayesian priors and regularisation penalties often are.
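To see that equivalence concretely (a toy sketch; `lam` and `theta` are made-up numbers): the negative log-density of $\mathcal{N}(0,\lambda^{-1}I)$ is, up to an additive constant, exactly the L2 penalty, and its gradient is the weight-decay term.

```python
import numpy as np

lam = 0.1
theta = np.array([1.0, -2.0, 0.5])

# -log N(theta; 0, lam^{-1} I) = (lam/2) * ||theta||^2 + const
neg_log_prior = 0.5 * lam * np.sum(theta**2)

# Its gradient is the familiar weight-decay term added to the loss gradient.
weight_decay_grad = lam * theta

print(neg_log_prior, weight_decay_grad)
```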

With that basis we could do the usual stuff of Bayes inference, such as computing the posterior predictive $p(y \mid x, \mathcal{D})=\int p(y \mid f(x ; \theta)) p(\theta \mid \mathcal{D})\, d \theta.$ Usually this posterior turns out to be intractable in the very high-dimensional parameter spaces of NNs, so we choose something simpler. We could summarise our posterior update by a simple maximum a posteriori estimate, $\theta_{\mathrm{MAP}}:=\operatorname{arg\,min}_{\theta} \mathcal{L}(\theta).$ In that case we have recovered the classic training of non-Bayes nets, with some ad hoc regularisation which we claim was secretly a prior all along. But if we stop there, we have no notion of predictive uncertainty.
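A sketch of the Monte Carlo version of that predictive integral, assuming we somehow already have approximate posterior samples of $\theta$ (from VI, MCMC, an ensemble, …). Everything here is synthetic: a linear toy model stands in for the network.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x, theta):
    """Hypothetical tiny 'network': a linear model standing in for f(x; theta)."""
    return x @ theta

# Pretend these are S = 200 approximate posterior samples of theta.
theta_samples = rng.normal(loc=[1.0, -0.5], scale=0.1, size=(200, 2))

x = np.array([2.0, 1.0])
preds = np.array([f(x, th) for th in theta_samples])

# Monte Carlo approximation of the posterior predictive mean and its spread.
pred_mean, pred_std = preds.mean(), preds.std()
print(pred_mean, pred_std)
```

The spread of `preds` is the predictive uncertainty that the bare MAP estimate throws away.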

Usually the model will possess many optima, which should lead us to suspect that we have not found a good global one. And how do we maximise model evidence here, in any case?

Somewhere between the full belt-and-braces Bayes approach and the MAP point estimate there are various approximations to Bayes inference we might try. What follows is a non-exhaustive smörgåsbord of options to do probabilistic inference in neural nets with different trade-offs.

🏗 To discuss: so many options for predictive uncertainty, but fewer for inverse uncertainty.

## 1 Natural Posterior Network

borchero/natural-posterior-network : some kind of reparameterization uncertainty?

## 2 MC sampling of weights by low-rank Matheron updates

This uses GP Matheron updates. It needs a shorter name, but looks cool. The idea is that we keep the weights random, but then create a sparse representation of the weights.

• Mean-field variational inference (MFVI): variational inference with fully factorised Gaussian (FFG) approximation.
• Variational inference with full-covariance Gaussian approximation (for each layer).
• Variational inference with inducing weights: each layer is augmented with a small matrix of inducing weights, then MFVI is performed in the inducing-weight space.
• Ensemble in inducing weight space: same augmentation as above, but with ensembles in the inducing weight space.
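A very loose sketch of the inducing-weight construction (the projection maps `P` and `Q` and all shapes are my own illustrative choices, not the parameterisation of the paper): the uncertainty lives in a small inducing matrix, and the full layer weights are a deterministic low-rank function of it.

```python
import numpy as np

rng = np.random.default_rng(3)

# Full layer is 64x32; the stochastic "inducing" matrix U is only 4x4.
d_out, d_in, m = 64, 32, 4
P = rng.normal(size=(d_out, m)) / np.sqrt(m)  # fixed up-projection (illustrative)
Q = rng.normal(size=(m, d_in)) / np.sqrt(m)   # fixed up-projection (illustrative)

def sample_layer_weights(u_mean, u_logstd, rng):
    """Sample inducing weights U (mean-field Gaussian), map up to full W."""
    U = u_mean + np.exp(u_logstd) * rng.normal(size=u_mean.shape)
    return P @ U @ Q  # low-rank: only m*m random variables

W = sample_layer_weights(np.zeros((m, m)), np.full((m, m), -2.0), rng)
print(W.shape)  # (64, 32), driven by only 16 stochastic parameters
```

The point of the construction is that VI (or an ensemble) over the small inducing space is vastly cheaper than over the full weight matrix.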

## 5 Sampling via Monte Carlo

TBD. For now, if the number of parameters is smallish, see Hamiltonian Monte Carlo.
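For orientation, here is a minimal HMC sampler on a one-dimensional toy target; the leapfrog-plus-Metropolis structure is the standard one, but nothing in this sketch addresses the scaling problems of real NN posteriors.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy target: standard normal, so -log p(q) = q^2 / 2.
def neg_log_p(q): return 0.5 * q**2
def grad(q): return q

def hmc_step(q, eps=0.2, L=10):
    p = rng.normal()                     # resample momentum
    q_new, p_new = q, p
    p_new -= 0.5 * eps * grad(q_new)     # leapfrog: initial half step
    for _ in range(L - 1):
        q_new += eps * p_new
        p_new -= eps * grad(q_new)
    q_new += eps * p_new
    p_new -= 0.5 * eps * grad(q_new)     # final half step
    # Metropolis accept/reject on the change in total energy.
    dH = (neg_log_p(q_new) + 0.5 * p_new**2) - (neg_log_p(q) + 0.5 * p**2)
    return q_new if np.log(rng.uniform()) < -dH else q

q, samples = 0.0, []
for _ in range(5000):
    q = hmc_step(q)
    samples.append(q)
print(np.mean(samples), np.std(samples))  # should be near 0 and 1
```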

## 7 Via random projections

I do not have a single paper about this, but I have seen random projection pop up as a piece of the puzzle in other methods. TBC.

## 8 In Gaussian process regression

See kernel learning.

See wide NN.

## 11 Via NTK

How does this work? He, Lakshminarayanan, and Teh (2020).

## 12 Ensemble methods

Deep learning has its own variants of model averaging and bagging: neural ensembles. Yarin Gal’s PhD thesis (Gal 2016) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout), although dropout as he frames it has since been contested as a means of inference.
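A toy version of the ensemble recipe (closed-form least-squares “members” standing in for trained networks; the bootstrap resampling is one of several ways to diversify members): fit $M$ models independently, then use the spread of their predictions as a crude predictive uncertainty.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic regression data.
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=50)

def fit(X, y):
    """One 'ensemble member': here just ordinary least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

M = 10
members = []
for _ in range(M):
    idx = rng.integers(0, len(y), size=len(y))  # bootstrap resample
    members.append(fit(X[idx], y[idx]))

x_star = np.array([1.0, 1.0, 1.0])
preds = np.array([x_star @ w for w in members])
print(preds.mean(), preds.std())  # ensemble mean and disagreement
```

In the deep-learning version, member diversity usually comes from random initialisation and SGD noise rather than explicit bootstrapping.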

## 13 Neural GLM

I think this has a sparse Bayes flavour. D. Tran et al. (2019); it seems to randomise over input parameters?

## 14 Practicalities

The computational toolsets for “neural” probabilistic programming and vanilla probabilistic programming are converging. See the tool listing under probabilistic programming.

See MCMC by SGD.

## 16 Khan and Rue’s Bayes Learning Rule

> We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton’s method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.

## 17 Incoming

Dustin Tran’s uncertainty layers [1812.03973] Bayesian Layers: A Module for Neural Network Uncertainty:

> In our work, we extend layers to capture “distributions over functions”, which we describe as a layer with uncertainty about some state in its computation — be it uncertainty in the weights, pre-activation units, activations, or the entire function. Each sample from the distribution instantiates a different function, e.g., a layer with a different weight configuration.…
>
> While the framework we laid out so far tightly integrates deep Bayesian modelling into existing ecosystems, we have deliberately limited our scope. In particular, our layers tie the model specification to the inference algorithm (typically, variational inference). Bayesian Layers’ core assumption is the modularization of inference per layer. This makes inference procedures which depend on the full parameter space, such as Markov chain Monte Carlo, difficult to fit within the framework.

## 18 References

Abbasnejad, Dick, and Hengel. 2016. In Advances in Neural Information Processing Systems 29.
Alexanderian. 2021. arXiv:2005.12998 [Math].
Alexanderian, Petra, Stadler, et al. 2016. SIAM Journal on Scientific Computing.
Alexos, Boyd, and Mandt. 2022. In Proceedings of the 39th International Conference on Machine Learning.
Alquier. 2021. arXiv:2110.11216 [Cs, Math, Stat].
———. 2023.
Archer, Park, Buesing, et al. 2015. arXiv:1511.07367 [Stat].
Bao, Ye, Zang, et al. 2020. Inverse Problems.
Baydin, Shao, Bhimji, et al. 2019. In arXiv:1907.03382 [Cs, Stat].
Bazzani, Torresani, and Larochelle. 2017. “Recurrent Mixture Density Network for Spatiotemporal Visual Attention.”
Bishop, Christopher. 1994. Microsoft Research.
Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics.
Blundell, Cornebise, Kavukcuoglu, et al. 2015. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. ICML’15.
Bora, Jalal, Price, et al. 2017. In International Conference on Machine Learning.
Breslow, and Clayton. 1993. Journal of the American Statistical Association.
Bui, Ravi, and Ramavajjala. 2017. arXiv:1703.04818 [Cs].
Chada, and Tong. 2022. Mathematics of Computation.
Charpentier, Borchert, Zügner, et al. 2022. arXiv:2105.04471 [Cs, Stat].
Chen, Wilson Ye, Mackey, Gorham, et al. 2018. “Stein Points.” In Proceedings of the 35th International Conference on Machine Learning.
Chen, Tian Qi, Rubanova, Bettencourt, et al. 2018. In Advances in Neural Information Processing Systems 31.
Chu, Jin, Zhu, et al. 2022. In Proceedings of the 39th International Conference on Machine Learning.
Cutajar, Bonilla, Michiardi, et al. 2017. In PMLR.
Damianou, and Lawrence. 2013. In Artificial Intelligence and Statistics.
Dandekar, Chung, Dixit, et al. 2021. arXiv:2012.07244 [Cs].
Daxberger, Kristiadi, Immer, et al. 2021. In arXiv:2106.14806 [Cs, Stat].
de Castro, and Dorigo. 2019. Computer Physics Communications.
Dezfouli, and Bonilla. 2015. In Advances in Neural Information Processing Systems 28. NIPS’15.
Doerr, Daniel, Schiegg, et al. 2018. arXiv:1801.10395 [Stat].
Domingos. 2020. arXiv:2012.00152 [Cs, Stat].
Dunlop, Girolami, Stuart, et al. 2018. Journal of Machine Learning Research.
Dupont, Doucet, and Teh. 2019. arXiv:1904.01681 [Cs, Stat].
Dusenberry, Jerfel, Wen, et al. 2020. In Proceedings of the 37th International Conference on Machine Learning.
Dutordoir, Hensman, van der Wilk, et al. 2021. In arXiv:2105.04504 [Cs, Stat].
Dziugaite, and Roy. 2017. arXiv:1703.11008 [Cs].
Eleftheriadis, Nicholson, Deisenroth, et al. 2017. In Advances in Neural Information Processing Systems 30.
Fabius, and van Amersfoort. 2014. In Proceedings of ICLR.
Figurnov, Mohamed, and Mnih. 2018. In Advances in Neural Information Processing Systems 31.
Flaxman, Wilson, Neill, et al. 2015. “Fast Kronecker Inference in Gaussian Processes with Non-Gaussian Likelihoods.” In.
Flunkert, Salinas, and Gasthaus. 2017. arXiv:1704.04110 [Cs, Stat].
Foong, Li, Hernández-Lobato, et al. 2019. arXiv:1906.11537 [Cs, Stat].
Fortuin. 2022. International Statistical Review.
Gal. 2015. “Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference.” In Advances in Approximate Bayesian Inference Workshop, NIPS.
———. 2016. “Uncertainty in Deep Learning.”
Gal, and Ghahramani. 2015a. “On Modern Deep Learning and Variational Inference.” In Advances in Approximate Bayesian Inference Workshop, NIPS.
———. 2015b. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
———. 2016a. In arXiv:1512.05287 [Stat].
———. 2016b. In 4th International Conference on Learning Representations (ICLR) Workshop Track.
———. 2016c. arXiv:1506.02157 [Stat].
Gal, Hron, and Kendall. 2017. arXiv:1705.07832 [Stat].
Garnelo, Rosenbaum, Maddison, et al. 2018. arXiv:1807.01613 [Cs, Stat].
Garnelo, Schwarz, Rosenbaum, et al. 2018.
Gholami, Keutzer, and Biros. 2019. arXiv:1902.10298 [Cs].
Giryes, Sapiro, and Bronstein. 2016. IEEE Transactions on Signal Processing.
Gorad, Zhao, and Särkkä. 2020. “Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.” In.
Gourieroux, Monfort, and Renault. 1993. Journal of Applied Econometrics.
Graves. 2011. In Proceedings of the 24th International Conference on Neural Information Processing Systems. NIPS’11.
———. 2013. arXiv:1308.0850 [Cs].
Graves, Mohamed, and Hinton. 2013. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
Gregor, Danihelka, Graves, et al. 2015. arXiv:1502.04623 [Cs].
Gu, Ghahramani, and Turner. 2015. In Advances in Neural Information Processing Systems 28.
Gu, Levine, Sutskever, et al. 2016. In Proceedings of ICLR.
Guo, Pleiss, Sun, et al. 2017.
Gurevich, and Stuke. 2019.
Guth, Mojahed, and Sapsis. 2023. SSRN Scholarly Paper.
Haber, Lucka, and Ruthotto. 2018. arXiv:1805.08034 [Cs, Math].
Haykin, ed. 2001. Kalman Filtering and Neural Networks. Adaptive and Learning Systems for Signal Processing, Communications, and Control.
He, Lakshminarayanan, and Teh. 2020. In Advances in Neural Information Processing Systems.
Hoffman, and Blei. 2015. In PMLR.
Huggins, Campbell, Kasprzak, et al. 2018. arXiv:1809.09505 [Cs, Math, Stat].
Hu, Yang, Salakhutdinov, et al. 2018. In arXiv:1706.00550 [Cs, Stat].
Immer, Bauer, Fortuin, et al. 2021. In Proceedings of the 38th International Conference on Machine Learning.
Immer, Korzepa, and Bauer. 2021. In International Conference on Artificial Intelligence and Statistics.
Ingebrigtsen, Lindgren, and Steinsland. 2014. Spatial Statistics.
Izmailov, Maddox, Kirichenko, et al. 2020. In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference.
Izmailov, Podoprikhin, Garipov, et al. 2018.
Jospin, Buntine, Boussaid, et al. 2022. arXiv:2007.06823 [Cs, Stat].
Karl, Soelch, Bayer, et al. 2016. In Proceedings of ICLR.
Khan, Immer, Abedi, et al. 2020. arXiv:1906.01930 [Cs, Stat].
Khan, and Lin. 2017. In Artificial Intelligence and Statistics.
Khan, and Rue. 2023.
Kingma, Diederik P. 2017.
Kingma, Diederik P., Salimans, Jozefowicz, et al. 2016. In Advances in Neural Information Processing Systems 29.
Kingma, Diederik P., and Welling. 2014. In ICLR 2014 Conference.
Kovachki, and Stuart. 2019. Inverse Problems.
Krauth, Bonilla, Cutajar, et al. 2016. In UAI17.
Krishnan, Shalit, and Sontag. 2015. arXiv Preprint arXiv:1511.05121.
Kristiadi, Hein, and Hennig. 2020. In ICML 2020.
———. 2021a. Advances in Neural Information Processing Systems.
———. 2021b. In Uncertainty in Artificial Intelligence.
———. 2022. In CoRR.
Larsen, Sønderby, Larochelle, et al. 2015. arXiv:1512.09300 [Cs, Stat].
Le, Baydin, and Wood. 2017. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). Proceedings of Machine Learning Research.
Lee, Jaehoon, Bahri, Novak, et al. 2018. In ICLR.
Lee, Herbert K. H., Higdon, Bi, et al. 2002. Technometrics.
Le, Igl, Jin, et al. 2017. arXiv Preprint arXiv:1705.10306.
Lindgren, and Rue. 2015. Journal of Statistical Software.
Liu, Qiang, and Wang. 2019. In Advances In Neural Information Processing Systems.
Liu, Xiao, Yeo, and Lu. 2020. Journal of the American Statistical Association.
Lobacheva, Chirkova, and Vetrov. 2017. In Workshop on Learning to Generate Natural Language.
Long, Scavino, Tempone, et al. 2013. Computer Methods in Applied Mechanics and Engineering.
Lorsung. 2021.
Louizos, Shalit, Mooij, et al. 2017. In Advances in Neural Information Processing Systems 30.
Louizos, and Welling. 2016. In arXiv Preprint arXiv:1603.04733.
———. 2017. In PMLR.
Mackay. 1992. Neural Computation.
MacKay. 2002. Information Theory, Inference & Learning Algorithms.
Maddison, Lawson, Tucker, et al. 2017. arXiv Preprint arXiv:1705.09279.
Maddox, Garipov, Izmailov, et al. 2019.
Mandt, Hoffman, and Blei. 2017. JMLR.
Margossian, Vehtari, Simpson, et al. 2020. arXiv:2004.12550 [Stat].
Martens, and Grosse. 2015. In Proceedings of the 32nd International Conference on Machine Learning.
Matthews, Rowland, Hron, et al. 2018. In arXiv:1804.11271 [Cs, Stat].
Matthews, van der Wilk, Nickson, et al. 2016. arXiv:1610.08733 [Stat].
Molchanov, Ashukha, and Vetrov. 2017. In Proceedings of ICML.
Murphy. 2023. Probabilistic Machine Learning: Advanced Topics.
Neal. 1996.
Ngiam, Chen, Koh, et al. 2011. In Proceedings of the 28th International Conference on Machine Learning (ICML-11).
Ngufor, van Houten, Caffo, et al. 2019. Journal of Biomedical Informatics.
Oakley, and Youngman. 2017. Technometrics.
Ober, and Rasmussen. 2019. In.
Opitz, Huser, Bakka, et al. 2018. Extremes.
Ovadia, Fertig, Ren, et al. 2019. In Proceedings of the 33rd International Conference on Neural Information Processing Systems.
Pan, Kuo, Rilee, et al. 2021. arXiv:2111.08239 [Cs, Stat].
Papadopoulos, Edwards, and Murray. 2001. IEEE Transactions on Neural Networks.
Papamakarios, Murray, and Pavlakou. 2017. In Advances in Neural Information Processing Systems 30.
Papamarkou, Skoularidou, Palla, et al. 2024.
Partee, Ringenburg, Robbins, et al. 2019. “Model Parameter Optimization: ML-Guided Trans-Resolution Tuning of Physical Models.” In.
Peluchetti, and Favaro. 2020. In International Conference on Artificial Intelligence and Statistics.
Petersen, and Pedersen. 2012.
Piterbarg, and Fatalov. 1995. Russian Mathematical Surveys.
Psaros, Meng, Zou, et al. 2023. Journal of Computational Physics.
Raissi, Perdikaris, and Karniadakis. 2019. Journal of Computational Physics.
Rasmussen, and Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning.
Rezende, Danilo Jimenez, and Mohamed. 2015. In International Conference on Machine Learning. ICML’15.
Rezende, Danilo J, Racanière, Higgins, et al. 2019. “Equivariant Hamiltonian Flows.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).
Ritter, Botev, and Barber. 2018. In.
Ritter, and Karaletsos. 2022. Proceedings of Machine Learning and Systems.
Ritter, Kukla, Zhang, et al. 2021. arXiv:2105.14594 [Cs, Stat].
Rue, Riebler, Sørbye, et al. 2016. arXiv:1604.00860 [Stat].
Ruiz, Titsias, and Blei. 2016. In Advances In Neural Information Processing Systems.
Ryder, Golightly, McGough, et al. 2018. arXiv:1802.03335 [Stat].
Sanchez-Gonzalez, Bapst, Battaglia, et al. 2019. “Hamiltonian Graph Networks with ODE Integrators.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).
Saumard, and Wellner. 2014. arXiv:1404.5886 [Math, Stat].
Shi, Sun, and Zhu. 2018. In.
Sigrist, Künsch, and Stahel. 2015. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Simchoni, and Rosset. 2023.
Snoek, Rippel, Swersky, et al. 2015. In Proceedings of the 32nd International Conference on Machine Learning.
Solin, and Särkkä. 2020. Statistics and Computing.
Sun, Zhang, Shi, et al. 2019. In.
Tang, and Reid. 2021. arXiv:2107.10885 [Math, Stat].
Thakur, Lorsung, Yacoby, et al. 2021. arXiv:2006.11695 [Cs, Stat].
Tran, Dustin, Dusenberry, van der Wilk, et al. 2019. “Bayesian Layers: A Module for Neural Network Uncertainty.” Advances in Neural Information Processing Systems.
Tran, Dustin, Hoffman, Saurous, et al. 2017. In ICLR.
Tran, Dustin, Kucukelbir, Dieng, et al. 2016. arXiv:1610.09787 [Cs, Stat].
Tran, Ba-Hien, Rossi, Milios, et al. 2021. In Advances in Neural Information Processing Systems.
Tran, Ba-Hien, Rossi, Milios, et al. 2022. Journal of Machine Learning Research.
van den Berg, Hasenclever, Tomczak, et al. 2018. In UAI18.
Wacker. 2017. arXiv:1701.07989 [Math].
Wainwright, and Jordan. 2005. “A Variational Principle for Graphical Models.” In New Directions in Statistical Signal Processing.
Watson, Lin, Klink, et al. 2020. “Neural Linear Models with Functional Gaussian Process Priors.” In.
Weber, Starc, Mittal, et al. 2018. In NeurIPS Workshop on Bayesian Deep Learning.
Wen, Tran, and Ba. 2020. In ICLR.
Wenzel, Roth, Veeling, et al. 2020. In Proceedings of the 37th International Conference on Machine Learning.
Wilson, and Izmailov. 2020.
Xu, and Darve. 2020. In arXiv:2011.11955 [Cs, Math].
Yang, Li, and Wang. 2021. arXiv:2101.12353 [Cs, Math, Stat].
Zeevi, and Meir. 1997. Neural Networks: The Official Journal of the International Neural Network Society.
Zellner. 1988. The American Statistician.