Probabilistic neural nets

Bayesian and other probabilistic inference in overparameterized ML

January 11, 2017 — April 27, 2023

Bayes
convolution
density
likelihood free
machine learning
neural nets
nonparametric
sparser than thou
uncertainty

Inferring densities and distributions in massively parameterized deep-learning models, in a Bayesian manner or otherwise; probabilistic networks are more general than Bayesian ones.

Jospin et al. (2022) is a modern high-speed introduction and summary of many approaches.

Radford Neal’s thesis (Neal 1996) is a foundational Bayesian use of neural networks, in the wide-NN and MCMC-sampling settings. Diederik P. Kingma’s thesis (Kingma 2017) is a blockbuster in the more recent variational tradition.


Alex Graves’ poster for his paper (Graves 2011) presents a simple weight-uncertainty method for recurrent nets (a diagonal Gaussian posterior over the weights) that I found elucidating. (There is a third-party quick-and-dirty implementation.)

The 2019 NeurIPS Bayesian deep learning workshop site gives some more modern positioning of the field.

Generative methods are useful here, e.g. the variational autoencoder and the affiliated reparameterization trick. Likelihood-free methods seem to be in the air too.

We are free to consider classic neural network inference as a special case of Bayes inference. Specifically, we interpret the loss function \(\mathcal{L}\) of a net \(f:\mathbb{R}^n\times\mathbb{R}^d\to\mathbb{R}^k\), mapping an input \(x\in\mathbb{R}^n\) and a parameter vector \(\theta\in\mathbb{R}^d\) to a prediction, in likelihood terms:

\[ \begin{aligned} \mathcal{L}(\theta) &:=-\sum_{i=1}^{m} \log p\left(y_{i} \mid f\left(x_{i} ; \theta\right)\right)-\log p(\theta) \\ &=-\log p(\theta \mid \mathcal{D})+\mathrm{const}, \end{aligned} \]

where \(\mathcal{D}=\{(x_i,y_i)\}_{i=1}^m\) and the constant is the (negative log) model evidence term, which does not depend on \(\theta\).

Obviously, a few things differ from the point-estimate case: the parameter vector \(\theta\) is not interpretable, so what do posterior distributions over it even mean? What are sensible priors? Choosing priors over by-design-uninterpretable parameters such as NN weights is a fraught issue (Fortuin 2022) that we will mostly ignore for now. Usually, the default prior is something like

\[ p(\theta)=\mathcal{N}\left(0, \lambda^{-1} I\right) \]

for want of a better idea. This turns out to be equivalent to “weight decay” regularization, in the sense that Bayesian priors and penalty terms often correspond.
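To spell out the equivalence: for this isotropic Gaussian prior,

\[ -\log p(\theta)=\frac{\lambda}{2}\|\theta\|_{2}^{2}+\mathrm{const}, \]

so adding \(-\log p(\theta)\) to the negative log-likelihood is exactly training with an \(\ell_2\) penalty of strength \(\lambda\), i.e. weight decay with coefficient \(\lambda\).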

With that in place, we can do the usual Bayesian things, such as computing the predictive posterior:

\[ p(y \mid x, \mathcal{D})=\int p(y \mid f(x ; \theta)) p(\theta \mid \mathcal{D}) d \theta \]
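If we can draw (approximate) posterior samples at all, that integral is approximated by Monte Carlo. A minimal PyTorch sketch for a classifier, assuming `model`, `x`, and a stack of flattened parameter draws `theta_samples` are already in hand (however obtained: MCMC, a variational posterior, an ensemble, …):

```python
import torch
from torch.nn.utils import vector_to_parameters

@torch.no_grad()
def predictive(model, x, theta_samples):
    """Monte Carlo posterior predictive:
    p(y | x, D) ≈ (1/S) Σ_s p(y | f(x; θ_s)) for draws θ_s ~ p(θ | D)."""
    probs = []
    for theta in theta_samples:                          # each θ_s is a flat parameter vector
        vector_to_parameters(theta, model.parameters())  # load the draw into the net
        probs.append(torch.softmax(model(x), dim=-1))    # likelihood p(y | f(x; θ_s))
    return torch.stack(probs).mean(0)
```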

Usually, though, the posterior itself is intractable to compute in the very high-dimensional parameter spaces of NNs, so we settle for something simpler. At the crudest, we can summarize the posterior by the maximum a posteriori (MAP) estimate:

\[ \theta_{\mathrm{MAP}}:=\operatorname{arg min}_{\theta} \mathcal{L}(\theta). \]

In this case, we have recovered the classic training of non-Bayes nets with some ad hoc regularization which we claim was secretly a prior. But we have no notion of predictive uncertainty if we stop there.
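Concretely, with the Gaussian prior above, MAP estimation is just the familiar training loop with weight decay set to the prior precision. A minimal PyTorch sketch (`model` and `train_loader` are stand-ins):

```python
import torch
import torch.nn.functional as F

lam = 1e-4  # prior precision λ in p(θ) = N(0, I/λ); doubles as the weight-decay coefficient
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=lam)

for x, y in train_loader:
    optimizer.zero_grad()
    nll = F.cross_entropy(model(x), y, reduction="sum")  # -Σ_i log p(y_i | f(x_i; θ))
    nll.backward()       # weight_decay adds λθ to the gradient, i.e. ∇(-log p(θ))
    optimizer.step()     # so we descend L(θ) = -log p(θ | D) + const: MAP, not full Bayes
```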

Usually the loss surface possesses many optima, which leaves us suspecting that we have not found a good global one. And how would we maximize model evidence in this setting, in any case?

Somewhere between the full belt-and-braces Bayes approach and the MAP point estimate are various approximations to Bayes inference we might try. What follows is a non-exhaustive smörgåsbord of options to do probabilistic inference in neural nets with different trade-offs.

🏗 To discuss: so many options for predictive uncertainty, but fewer for inverse uncertainty.

1 Natural Posterior Network

borchero/natural-posterior-network (Charpentier et al. 2022): some kind of reparameterization uncertainty?

2 MC sampling of weights by low-rank Matheron updates

This uses GP Matheron updates. It needs a shorter name but looks cool (Ritter et al. 2021). The idea is to keep the weights random but give their uncertainty a sparse representation via a small set of inducing weights per layer.

The accompanying code is microsoft/bayesianize, a Bayesian neural network wrapper in PyTorch, which supports:

  • Mean-field variational inference (MFVI): variational inference with a fully factorized Gaussian (FFG) approximation (see the sketch after this list).
  • Variational inference with full-covariance Gaussian approximation (for each layer).
  • Variational inference with inducing weights: each layer is augmented with a small matrix of inducing weights, then MFVI is performed in the inducing weight space.
  • Ensemble in inducing weight space: same augmentation as above, but with ensembles in the inducing weight space.
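To make the first option concrete, here is what one mean-field (fully factorized Gaussian) variational layer looks like written out by hand. This is a generic MFVI sketch using the standard reparameterization trick, not the bayesianize API:

```python
import torch
import torch.nn as nn

class FFGLinear(nn.Module):
    """Linear layer with a fully factorized Gaussian posterior q(W) = N(mu, diag(sigma^2))."""
    def __init__(self, d_in, d_out, prior_std=1.0):
        super().__init__()
        self.mu = nn.Parameter(0.01 * torch.randn(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -5.0))
        self.prior_std = prior_std

    def forward(self, x):
        sigma = self.log_sigma.exp()
        w = self.mu + sigma * torch.randn_like(sigma)  # reparameterized sample W ~ q
        return x @ w.t()                               # bias omitted for brevity

    def kl(self):
        # KL( N(mu, sigma^2) || N(0, prior_std^2) ), summed over all weights
        sigma2, p2 = self.log_sigma.exp() ** 2, self.prior_std ** 2
        return 0.5 * ((sigma2 + self.mu ** 2) / p2 - 1 - torch.log(sigma2 / p2)).sum()

# Training minimizes the negative ELBO: E_q[-log p(y | f(x; W))] + Σ_layers KL(q || prior).
```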

3 Bayes by backprop

See Bayes by backprop.

4 Variational autoencoders

See variational autoencoders.

5 Sampling via Monte Carlo

TBD. For now, if the number of parameters is smallish, see Hamiltonian Monte Carlo.
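For concreteness, here is one way that can look with NUTS in Pyro, sketched for a tiny one-hidden-layer regression net on toy data (the sizes and priors are arbitrary choices, and this gets painfully slow as the parameter count grows):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def bnn(x, y=None, hidden=8):
    """One-hidden-layer Bayesian regression net with Gaussian priors on all weights."""
    d_in = x.shape[1]
    w1 = pyro.sample("w1", dist.Normal(torch.zeros(d_in, hidden), 1.0).to_event(2))
    b1 = pyro.sample("b1", dist.Normal(torch.zeros(hidden), 1.0).to_event(1))
    w2 = pyro.sample("w2", dist.Normal(torch.zeros(hidden, 1), 1.0).to_event(2))
    b2 = pyro.sample("b2", dist.Normal(torch.zeros(1), 1.0).to_event(1))
    sigma = pyro.sample("sigma", dist.HalfNormal(1.0))
    mean = (torch.tanh(x @ w1 + b1) @ w2 + b2).squeeze(-1)
    with pyro.plate("data", x.shape[0]):
        pyro.sample("obs", dist.Normal(mean, sigma), obs=y)

x, y = torch.randn(100, 3), torch.randn(100)               # toy data
mcmc = MCMC(NUTS(bnn), num_samples=500, warmup_steps=500)
mcmc.run(x, y)
weight_draws = mcmc.get_samples()                          # dict of posterior samples per site
```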

6 Laplace approximation

See Laplace approximations. For deep nets specifically, AlexImmer/Laplace packages up Laplace approximations for deep learning (Daxberger et al. 2021).
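Roughly, the workflow (as I understand that package’s API; check its docs, since details may shift) is to train a MAP net as usual and bolt the Laplace approximation on afterwards. `model`, `train_loader`, and `x_test` are stand-ins:

```python
from laplace import Laplace

# `model` is an ordinary trained (MAP) torch.nn.Module, `train_loader` its DataLoader.
la = Laplace(model, likelihood="regression",
             subset_of_weights="last_layer",   # cheap option: Gaussian over the last layer only
             hessian_structure="kron")         # Kronecker-factored curvature approximation
la.fit(train_loader)
la.optimize_prior_precision(method="marglik")  # tune prior precision by marginal likelihood
f_mean, f_var = la(x_test)                     # Gaussian predictive mean and variance
```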

7 Via random projections

I do not have a single canonical paper about this, but I have seen random projections pop up as a piece of the puzzle in other methods, e.g. the subspace inference of Izmailov, Maddox, Kirichenko, et al. (2020). TBC.

8 In Gaussian process regression

See kernel learning.

9 Via measure transport

See reparameterization.

10 Via infinite-width random nets

See wide NN.

11 Via NTK

How does this work? He, Lakshminarayanan, and Teh (2020).

12 Ensemble methods


Deep learning has its own variants of model averaging and bagging: Neural ensembles. Yarin Gal’s PhD thesis (Gal 2016) summarizes some implicit approximate approaches (e.g. the Bayesian interpretation of dropout), although dropout as a means of Bayesian inference has since been contested.
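The most popular explicit option is probably the plain deep ensemble: train a handful of copies of the net from different random initializations and average their predictive distributions. A minimal sketch, where `make_model`, `train`, and `x_test` are stand-ins:

```python
import torch

def deep_ensemble_predict(make_model, train, x_test, n_members=5):
    """Train n_members independently initialized nets and average their
    predictive distributions (here, classification probabilities)."""
    probs = []
    for seed in range(n_members):
        torch.manual_seed(seed)           # different initialization per member
        model = make_model()
        train(model)                      # ordinary (MAP) training of this member
        with torch.no_grad():
            probs.append(torch.softmax(model(x_test), dim=-1))
    return torch.stack(probs).mean(0)     # equal-weight mixture over members
```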

13 Neural GLM

I think this has a sparse Bayes flavor. M.-N. Tran et al. (2019) seems to randomize over input params?

14 Practicalities

The computational toolsets for “neural” probabilistic programming and vanilla probabilistic programming are converging. See the tool listing under probabilistic programming.

15 Stochastic Gradient Descent as MC inference

See MCMC by SGD.

16 Khan and Rue’s Bayes Learning Rule

Bayes via natural gradient (Khan and Rue 2024; Zellner 1988).

We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton’s method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
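In symbols, as I read it: choose a candidate exponential-family posterior \(q_\lambda(\theta)\) with natural parameter \(\lambda\) and iterate a natural-gradient step on the variational objective,

\[ \lambda \leftarrow \lambda-\rho\, \widetilde{\nabla}_{\lambda}\left\{\mathbb{E}_{q_{\lambda}}[\ell(\theta)]-\mathcal{H}(q_{\lambda})\right\}, \]

where \(\ell\) is the loss (e.g. negative log-likelihood plus negative log-prior), \(\mathcal{H}\) is entropy, \(\rho\) a step size, and \(\widetilde{\nabla}_\lambda\) the natural gradient. Different choices of \(q_\lambda\), and different approximations to the expectation and to the natural gradient, recover the algorithms listed in the quote.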

17 Incoming

Seminars! Laplace’s Demon: A Seminar Series about Bayesian Machine Learning at Scale

Dustin Tran’s uncertainty layers (D. Tran et al. 2018):

In our work, we extend layers to capture “distributions over functions”, which we describe as a layer with uncertainty about some state in its computation — be it uncertainty in the weights, pre-activation units, activations, or the entire function. Each sample from the distribution instantiates a different function, e.g., a layer with a different weight configuration.…

While the framework we laid out so far tightly integrates deep Bayesian modelling into existing ecosystems, we have deliberately limited our scope. In particular, our layers tie the model specification to the inference algorithm (typically, variational inference). Bayesian Layers’ core assumption is the modularization of inference per layer. This makes inference procedures which depend on the full parameter space, such as Markov chain Monte Carlo, difficult to fit within the framework.

18 References

Abbasnejad, Dick, and Hengel. 2016. Infinite Variational Autoencoder for Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 29.
Alexanderian. 2021. Optimal Experimental Design for Infinite-Dimensional Bayesian Inverse Problems Governed by PDEs: A Review.” arXiv:2005.12998 [Math].
Alexanderian, Petra, Stadler, et al. 2016. A Fast and Scalable Method for A-Optimal Design of Experiments for Infinite-Dimensional Bayesian Nonlinear Inverse Problems.” SIAM Journal on Scientific Computing.
Alexos, Boyd, and Mandt. 2022. Structured Stochastic Gradient MCMC.” In Proceedings of the 39th International Conference on Machine Learning.
Alquier. 2021. User-Friendly Introduction to PAC-Bayes Bounds.” arXiv:2110.11216 [Cs, Math, Stat].
———. 2023. User-Friendly Introduction to PAC-Bayes Bounds.”
Archer, Park, Buesing, et al. 2015. Black Box Variational Inference for State Space Models.” arXiv:1511.07367 [Stat].
Bao, Ye, Zang, et al. 2020. Numerical Solution of Inverse Problems by Weak Adversarial Networks.” Inverse Problems.
Baydin, Shao, Bhimji, et al. 2019. Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale.” In arXiv:1907.03382 [Cs, Stat].
Bazzani, Torresani, and Larochelle. 2017. “Recurrent Mixture Density Network for Spatiotemporal Visual Attention.”
Bishop, Christopher. 1994. Mixture Density Networks.” Microsoft Research.
Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics.
Blundell, Cornebise, Kavukcuoglu, et al. 2015. Weight Uncertainty in Neural Networks.” In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. ICML’15.
Bora, Jalal, Price, et al. 2017. Compressed Sensing Using Generative Models.” In International Conference on Machine Learning.
Breslow, and Clayton. 1993. Approximate Inference in Generalized Linear Mixed Models.” Journal of the American Statistical Association.
Bui, Ravi, and Ramavajjala. 2017. Neural Graph Machines: Learning Neural Networks Using Graphs.” arXiv:1703.04818 [Cs].
Bunker, Girolami, Lambley, et al. 2024. Autoencoders in Function Space.”
Chada, and Tong. 2022. Convergence Acceleration of Ensemble Kalman Inversion in Nonlinear Settings.” Mathematics of Computation.
Charpentier, Borchert, Zügner, et al. 2022. Natural Posterior Network: Deep Bayesian Uncertainty for Exponential Family Distributions.” arXiv:2105.04471 [Cs, Stat].
Chen, Wilson Ye, Mackey, Gorham, et al. 2018. Stein Points.” In Proceedings of the 35th International Conference on Machine Learning.
Chen, Tian Qi, Rubanova, Bettencourt, et al. 2018. Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems 31.
Chu, Jin, Zhu, et al. 2022. DNA: Domain Generalization with Diversified Neural Averaging.” In Proceedings of the 39th International Conference on Machine Learning.
Cutajar, Bonilla, Michiardi, et al. 2017. Random Feature Expansions for Deep Gaussian Processes.” In PMLR.
Damianou, and Lawrence. 2013. Deep Gaussian Processes.” In Artificial Intelligence and Statistics.
Dandekar, Chung, Dixit, et al. 2021. Bayesian Neural Ordinary Differential Equations.” arXiv:2012.07244 [Cs].
Daxberger, Kristiadi, Immer, et al. 2021. Laplace Redux — Effortless Bayesian Deep Learning.” In arXiv:2106.14806 [Cs, Stat].
de Castro, and Dorigo. 2019. INFERNO: Inference-Aware Neural Optimisation.” Computer Physics Communications.
Dezfouli, and Bonilla. 2015. Scalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In Advances in Neural Information Processing Systems 28. NIPS’15.
Doerr, Daniel, Schiegg, et al. 2018. Probabilistic Recurrent State-Space Models.” arXiv:1801.10395 [Stat].
Domingos. 2020. Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” arXiv:2012.00152 [Cs, Stat].
Dunlop, Girolami, Stuart, et al. 2018. How Deep Are Deep Gaussian Processes? Journal of Machine Learning Research.
Dupont, Doucet, and Teh. 2019. Augmented Neural ODEs.” arXiv:1904.01681 [Cs, Stat].
Dusenberry, Jerfel, Wen, et al. 2020. Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors.” In Proceedings of the 37th International Conference on Machine Learning.
Dutordoir, Hensman, van der Wilk, et al. 2021. Deep Neural Networks as Point Estimates for Deep Gaussian Processes.” In arXiv:2105.04504 [Cs, Stat].
Dziugaite, and Roy. 2017. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters Than Training Data.” arXiv:1703.11008 [Cs].
Eleftheriadis, Nicholson, Deisenroth, et al. 2017. Identification of Gaussian Process State Space Models.” In Advances in Neural Information Processing Systems 30.
Fabius, and van Amersfoort. 2014. Variational Recurrent Auto-Encoders.” In Proceedings of ICLR.
Figurnov, Mohamed, and Mnih. 2018. Implicit Reparameterization Gradients.” In Advances in Neural Information Processing Systems 31.
Flaxman, Wilson, Neill, et al. 2015. “Fast Kronecker Inference in Gaussian Processes with Non-Gaussian Likelihoods.” In.
Flunkert, Salinas, and Gasthaus. 2017. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” arXiv:1704.04110 [Cs, Stat].
Foong, Li, Hernández-Lobato, et al. 2019. ‘In-Between’ Uncertainty in Bayesian Neural Networks.” arXiv:1906.11537 [Cs, Stat].
Fortuin. 2022. Priors in Bayesian Deep Learning: A Review.” International Statistical Review.
Gal. 2015. “Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference.” In Advances in Approximate Bayesian Inference Workshop, NIPS.
———. 2016. “Uncertainty in Deep Learning.”
Gal, and Ghahramani. 2015a. “On Modern Deep Learning and Variational Inference.” In Advances in Approximate Bayesian Inference Workshop, NIPS.
———. 2015b. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).
———. 2016a. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In arXiv:1512.05287 [Stat].
———. 2016b. Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference.” In 4th International Conference on Learning Representations (ICLR) Workshop Track.
———. 2016c. Dropout as a Bayesian Approximation: Appendix.” arXiv:1506.02157 [Stat].
Gal, Hron, and Kendall. 2017. Concrete Dropout.” arXiv:1705.07832 [Stat].
Garnelo, Rosenbaum, Maddison, et al. 2018. Conditional Neural Processes.” arXiv:1807.01613 [Cs, Stat].
Garnelo, Schwarz, Rosenbaum, et al. 2018. Neural Processes.”
Gholami, Keutzer, and Biros. 2019. ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs.” arXiv:1902.10298 [Cs].
Giryes, Sapiro, and Bronstein. 2016. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? IEEE Transactions on Signal Processing.
Gorad, Zhao, and Särkkä. 2020. “Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.” In.
Gourieroux, Monfort, and Renault. 1993. Indirect Inference.” Journal of Applied Econometrics.
Graves. 2011. Practical Variational Inference for Neural Networks.” In Proceedings of the 24th International Conference on Neural Information Processing Systems. NIPS’11.
———. 2013. Generating Sequences With Recurrent Neural Networks.” arXiv:1308.0850 [Cs].
Graves, Mohamed, and Hinton. 2013. Speech Recognition with Deep Recurrent Neural Networks.” In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
Gregor, Danihelka, Graves, et al. 2015. DRAW: A Recurrent Neural Network For Image Generation.” arXiv:1502.04623 [Cs].
Gu, Ghahramani, and Turner. 2015. Neural Adaptive Sequential Monte Carlo.” In Advances in Neural Information Processing Systems 28.
Gu, Levine, Sutskever, et al. 2016. MuProp: Unbiased Backpropagation for Stochastic Neural Networks.” In Proceedings of ICLR.
Guo, Pleiss, Sun, et al. 2017. On Calibration of Modern Neural Networks.”
Gurevich, and Stuke. 2019. Gradient Conjugate Priors and Multi-Layer Neural Networks.”
Guth, Mojahed, and Sapsis. 2023. Evaluation of Machine Learning Architectures on the Quantification Of Epistemic and Aleatoric Uncertainties In Complex Dynamical Systems.” SSRN Scholarly Paper.
Haber, Lucka, and Ruthotto. 2018. Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.” arXiv:1805.08034 [Cs, Math].
Haykin, ed. 2001. Kalman Filtering and Neural Networks. Adaptive and Learning Systems for Signal Processing, Communications, and Control.
He, Lakshminarayanan, and Teh. 2020. Bayesian Deep Ensembles via the Neural Tangent Kernel.” In Advances in Neural Information Processing Systems.
Hoffman, and Blei. 2015. Stochastic Structured Variational Inference.” In PMLR.
Huggins, Campbell, Kasprzak, et al. 2018. Practical Bounds on the Error of Bayesian Posterior Approximations: A Nonasymptotic Approach.” arXiv:1809.09505 [Cs, Math, Stat].
Hu, Yang, Salakhutdinov, et al. 2018. On Unifying Deep Generative Models.” In arXiv:1706.00550 [Cs, Stat].
Immer, Bauer, Fortuin, et al. 2021. Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning.” In Proceedings of the 38th International Conference on Machine Learning.
Immer, Korzepa, and Bauer. 2021. Improving Predictions of Bayesian Neural Nets via Local Linearization.” In International Conference on Artificial Intelligence and Statistics.
Ingebrigtsen, Lindgren, and Steinsland. 2014. Spatial Models with Explanatory Variables in the Dependence Structure.” Spatial Statistics, Spatial Statistics Miami,.
Izmailov, Maddox, Kirichenko, et al. 2020. Subspace Inference for Bayesian Deep Learning.” In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference.
Izmailov, Podoprikhin, Garipov, et al. 2018. Averaging Weights Leads to Wider Optima and Better Generalization.”
Jospin, Buntine, Boussaid, et al. 2022. Hands-on Bayesian Neural Networks — a Tutorial for Deep Learning Users.” arXiv:2007.06823 [Cs, Stat].
Karl, Soelch, Bayer, et al. 2016. Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data.” In Proceedings of ICLR.
Khan, Immer, Abedi, et al. 2020. Approximate Inference Turns Deep Networks into Gaussian Processes.” arXiv:1906.01930 [Cs, Stat].
Khan, and Lin. 2017. Conjugate-Computation Variational Inference : Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models.” In Artificial Intelligence and Statistics.
Khan, and Rue. 2024. The Bayesian Learning Rule.”
Kingma, Diederik P. 2017. Variational Inference & Deep Learning: A New Synthesis.”
Kingma, Diederik P., Salimans, Jozefowicz, et al. 2016. Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29.
Kingma, Diederik P., and Welling. 2014. Auto-Encoding Variational Bayes.” In ICLR 2014 Conference.
Kovachki, and Stuart. 2019. Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.” Inverse Problems.
Krauth, Bonilla, Cutajar, et al. 2016. AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In UAI17.
Krishnan, Shalit, and Sontag. 2015. Deep Kalman Filters.” arXiv Preprint arXiv:1511.05121.
Kristiadi, Hein, and Hennig. 2020. Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks.” In ICML 2020.
———. 2021a. An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence.” Advances in Neural Information Processing Systems.
———. 2021b. Learnable Uncertainty Under Laplace Approximations.” In Uncertainty in Artificial Intelligence.
———. 2022. Being a Bit Frequentist Improves Bayesian Neural Networks.” In CoRR.
Larsen, Sønderby, Larochelle, et al. 2015. Autoencoding Beyond Pixels Using a Learned Similarity Metric.” arXiv:1512.09300 [Cs, Stat].
Le, Baydin, and Wood. 2017. Inference Compilation and Universal Probabilistic Programming.” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). Proceedings of Machine Learning Research.
Lee, Jaehoon, Bahri, Novak, et al. 2018. Deep Neural Networks as Gaussian Processes.” In ICLR.
Lee, Herbert K. H., Higdon, Bi, et al. 2002. Markov Random Field Models for High-Dimensional Parameters in Simulations of Fluid Flow in Porous Media.” Technometrics.
Le, Igl, Jin, et al. 2017. Auto-Encoding Sequential Monte Carlo.” arXiv Preprint arXiv:1705.10306.
Lindgren, and Rue. 2015. Bayesian Spatial Modelling with R-INLA.” Journal of Statistical Software.
Liu, Qiang, and Wang. 2019. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” In Advances In Neural Information Processing Systems.
Liu, Xiao, Yeo, and Lu. 2020. Statistical Modeling for Spatio-Temporal Data From Stochastic Convection-Diffusion Processes.” Journal of the American Statistical Association.
Lobacheva, Chirkova, and Vetrov. 2017. Bayesian Sparsification of Recurrent Neural Networks.” In Workshop on Learning to Generate Natural Language.
Long, Scavino, Tempone, et al. 2013. Fast Estimation of Expected Information Gains for Bayesian Experimental Designs Based on Laplace Approximations.” Computer Methods in Applied Mechanics and Engineering.
Lorsung. 2021. Understanding Uncertainty in Bayesian Deep Learning.”
Louizos, Shalit, Mooij, et al. 2017. Causal Effect Inference with Deep Latent-Variable Models.” In Advances in Neural Information Processing Systems 30.
Louizos, and Welling. 2016. Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors.” In arXiv Preprint arXiv:1603.04733.
———. 2017. Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In PMLR.
Mackay. 1992. A Practical Bayesian Framework for Backpropagation Networks.” Neural Computation.
MacKay. 2002. Information Theory, Inference & Learning Algorithms.
Maddison, Lawson, Tucker, et al. 2017. Filtering Variational Objectives.” arXiv Preprint arXiv:1705.09279.
Maddox, Garipov, Izmailov, et al. 2019. A Simple Baseline for Bayesian Uncertainty in Deep Learning.”
Mandt, Hoffman, and Blei. 2017. Stochastic Gradient Descent as Approximate Bayesian Inference.” JMLR.
Margossian, Vehtari, Simpson, et al. 2020. Hamiltonian Monte Carlo Using an Adjoint-Differentiated Laplace Approximation: Bayesian Inference for Latent Gaussian Models and Beyond.” arXiv:2004.12550 [Stat].
Martens, and Grosse. 2015. Optimizing Neural Networks with Kronecker-Factored Approximate Curvature.” In Proceedings of the 32nd International Conference on Machine Learning.
Matthews, Rowland, Hron, et al. 2018. Gaussian Process Behaviour in Wide Deep Neural Networks.” In arXiv:1804.11271 [Cs, Stat].
Matthews, van der Wilk, Nickson, et al. 2016. GPflow: A Gaussian Process Library Using TensorFlow.” arXiv:1610.08733 [Stat].
Molchanov, Ashukha, and Vetrov. 2017. Variational Dropout Sparsifies Deep Neural Networks.” In Proceedings of ICML.
Murphy. 2023. Probabilistic Machine Learning: Advanced Topics.
Neal. 1996. Bayesian Learning for Neural Networks.”
Ngiam, Chen, Koh, et al. 2011. Learning Deep Energy Models.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11).
Ngufor, van Houten, Caffo, et al. 2019. Mixed Effect Machine Learning: A Framework for Predicting Longitudinal Change in Hemoglobin A1c.” Journal of Biomedical Informatics.
Oakley, and Youngman. 2017. Calibration of Stochastic Computer Simulators Using Likelihood Emulation.” Technometrics.
Ober, and Rasmussen. 2019. Benchmarking the Neural Linear Model for Regression.” In.
Opitz, Huser, Bakka, et al. 2018. INLA Goes Extreme: Bayesian Tail Regression for the Estimation of High Spatio-Temporal Quantiles.” Extremes.
Ovadia, Fertig, Ren, et al. 2019. Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift.” In Proceedings of the 33rd International Conference on Neural Information Processing Systems.
Pan, Kuo, Rilee, et al. 2021. Assessing Deep Neural Networks as Probability Estimators.” arXiv:2111.08239 [Cs, Stat].
Papadopoulos, Edwards, and Murray. 2001. Confidence Estimation Methods for Neural Networks: A Practical Comparison.” IEEE Transactions on Neural Networks.
Papamakarios, Murray, and Pavlakou. 2017. Masked Autoregressive Flow for Density Estimation.” In Advances in Neural Information Processing Systems 30.
Papamarkou, Skoularidou, Palla, et al. 2024. Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI.”
Partee, Ringenburg, Robbins, et al. 2019. “Model Parameter Optimization: ML-Guided Trans-Resolution Tuning of Physical Models.” In.
Peluchetti, and Favaro. 2020. Infinitely Deep Neural Networks as Diffusion Processes.” In International Conference on Artificial Intelligence and Statistics.
Petersen, and Pedersen. 2012. The Matrix Cookbook.”
Piterbarg, and Fatalov. 1995. The Laplace Method for Probability Measures in Banach Spaces.” Russian Mathematical Surveys.
Psaros, Meng, Zou, et al. 2023. Uncertainty Quantification in Scientific Machine Learning: Methods, Metrics, and Comparisons.” Journal of Computational Physics.
Raissi, Perdikaris, and Karniadakis. 2019. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations.” Journal of Computational Physics.
Rasmussen, and Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning.
Rezende, Danilo Jimenez, and Mohamed. 2015. Variational Inference with Normalizing Flows.” In International Conference on Machine Learning. ICML’15.
Rezende, Danilo J, Racanière, Higgins, et al. 2019. “Equivariant Hamiltonian Flows.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).
Ritter, Botev, and Barber. 2018. A Scalable Laplace Approximation for Neural Networks.” In.
Ritter, and Karaletsos. 2022. TyXe: Pyro-Based Bayesian Neural Nets for Pytorch.” Proceedings of Machine Learning and Systems.
Ritter, Kukla, Zhang, et al. 2021. Sparse Uncertainty Representation in Deep Learning with Inducing Weights.” arXiv:2105.14594 [Cs, Stat].
Rue, Riebler, Sørbye, et al. 2016. Bayesian Computing with INLA: A Review.” arXiv:1604.00860 [Stat].
Ruiz, Titsias, and Blei. 2016. The Generalized Reparameterization Gradient.” In Advances In Neural Information Processing Systems.
Ryder, Golightly, McGough, et al. 2018. Black-Box Variational Inference for Stochastic Differential Equations.” arXiv:1802.03335 [Stat].
Sanchez-Gonzalez, Bapst, Battaglia, et al. 2019. “Hamiltonian Graph Networks with ODE Integrators.” In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS).
Saumard, and Wellner. 2014. Log-Concavity and Strong Log-Concavity: A Review.” arXiv:1404.5886 [Math, Stat].
Shi, Sun, and Zhu. 2018. A Spectral Approach to Gradient Estimation for Implicit Distributions.” In.
Sigrist, Künsch, and Stahel. 2015. Stochastic Partial Differential Equation Based Modelling of Large Space-Time Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Simchoni, and Rosset. 2023. Integrating Random Effects in Deep Neural Networks.”
Snoek, Rippel, Swersky, et al. 2015. Scalable Bayesian Optimization Using Deep Neural Networks.” In Proceedings of the 32nd International Conference on Machine Learning.
Solin, and Särkkä. 2020. Hilbert Space Methods for Reduced-Rank Gaussian Process Regression.” Statistics and Computing.
Sun, Zhang, Shi, et al. 2019. Functional Variational Bayesian Neural Networks.” In.
Tang, and Reid. 2021. Laplace and Saddlepoint Approximations in High Dimensions.” arXiv:2107.10885 [Math, Stat].
Thakur, Lorsung, Yacoby, et al. 2021. Uncertainty-Aware (UNA) Bases for Deep Bayesian Regression Using Multi-Headed Auxiliary Networks.” arXiv:2006.11695 [Cs, Stat].
Tran, Dustin, Dusenberry, van der Wilk, et al. 2018. Bayesian Layers: A Module for Neural Network Uncertainty.”
Tran, Dustin, Hoffman, Saurous, et al. 2017. Deep Probabilistic Programming.” In ICLR.
Tran, Dustin, Kucukelbir, Dieng, et al. 2016. Edward: A Library for Probabilistic Modeling, Inference, and Criticism.” arXiv:1610.09787 [Cs, Stat].
Tran, M.-N., Nguyen, Nott, et al. 2019. Bayesian Deep Net GLM and GLMM.” Journal of Computational and Graphical Statistics.
Tran, Ba-Hien, Rossi, Milios, et al. 2021. Model Selection for Bayesian Autoencoders.” In Advances in Neural Information Processing Systems.
Tran, Ba-Hien, Rossi, Milios, et al. 2022. All You Need Is a Good Functional Prior for Bayesian Deep Learning.” Journal of Machine Learning Research.
van den Berg, Hasenclever, Tomczak, et al. 2018. Sylvester Normalizing Flows for Variational Inference.” In UAI18.
Wacker. 2017. Laplace’s Method in Bayesian Inverse Problems.” arXiv:1701.07989 [Math].
Wainwright, and Jordan. 2005. “A Variational Principle for Graphical Models.” In New Directions in Statistical Signal Processing.
Watson, Lin, Klink, et al. 2020. “Neural Linear Models with Functional Gaussian Process Priors.” In.
Weber, Starc, Mittal, et al. 2018. Optimizing over a Bayesian Last Layer.” In NeurIPS Workshop on Bayesian Deep Learning.
Wei, and Lau. 2023. Variational Bayesian Neural Networks via Resolution of Singularities.” Journal of Computational and Graphical Statistics.
Wen, Tran, and Ba. 2020. BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning.” In ICLR.
Wenzel, Roth, Veeling, et al. 2020. How Good Is the Bayes Posterior in Deep Neural Networks Really? In Proceedings of the 37th International Conference on Machine Learning.
Wilson, and Izmailov. 2020. Bayesian Deep Learning and a Probabilistic Perspective of Generalization.”
Xu, and Darve. 2020. ADCME: Learning Spatially-Varying Physical Fields Using Deep Neural Networks.” In arXiv:2011.11955 [Cs, Math].
Yang, Li, and Wang. 2021. On the Capacity of Deep Generative Networks for Approximating Distributions.” arXiv:2101.12353 [Cs, Math, Stat].
Zeevi, and Meir. 1997. Density Estimation Through Convex Combinations of Densities: Approximation and Estimation Bounds.” Neural Networks: The Official Journal of the International Neural Network Society.
Zellner. 1988. Optimal Information Processing and Bayes’s Theorem.” The American Statistician.