Bayes for beginners

May 30, 2016 — July 23, 2022

Tags: Bayes, generative, how do science, Monte Carlo, statistics

Even for the most curmudgeonly frequentist, it is sometimes refreshing to move your effort from deriving frequentist estimators for intractable models to using the damn Bayesian ones, which fail in different and interesting ways than you are used to. If it works and you are feeling fancy, you might then justify your Bayesian method on frequentist grounds, which washes away the sin.

Here are some scattered tidbits about getting into it. No attempt is made to be comprehensive, novel, or even expert.

1 Prior choice

Is weird and important. Here are some argumentative and disputed rules of thumb.
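
One of the better-documented rules of thumb (Gelman 2006): put a weakly informative half-Cauchy prior on a hierarchical scale parameter rather than a "vague" inverse-gamma. A prior predictive simulation is the cheapest way to see what a prior actually commits you to. Here is a minimal numpy/scipy sketch; the model and every constant in it are my own illustrative choices, not from any of the references:

```python
# Prior predictive simulation: what do two common priors on a
# hierarchical scale parameter tau imply about a group-level effect mu?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 5000

# Half-Cauchy(0, 5): Gelman (2006)'s weakly informative suggestion.
tau_hc = np.abs(stats.cauchy(0, 5).rvs(n, random_state=rng))

# Inverse-gamma(0.01, 0.01) on tau^2: the "vague" style of prior that
# Gelman (2006) argues against. (Drawn as 1/Gamma on the precision.)
prec = stats.gamma(0.01, scale=1 / 0.01).rvs(n, random_state=rng)
prec = np.maximum(prec, np.finfo(float).tiny)  # guard against underflow
tau_ig = 1.0 / np.sqrt(prec)

for name, tau in [("half-Cauchy(0, 5)", tau_hc),
                  ("inv-gamma(0.01, 0.01)", tau_ig)]:
    mu = rng.normal(0.0, tau)  # one group-level effect per draw of tau
    qs = np.quantile(np.abs(mu), [0.5, 0.99])
    print(f"{name:24s} median |mu| = {qs[0]:10.3g}   99% |mu| = {qs[1]:10.3g}")
```

The inverse-gamma run puts appreciable prior mass on absurdly large effects, which is exactly the pathology Gelman (2006) complains about.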

2 Teaching

2.1 Course material

So many! Too many. Actually, I kinda like McElreath’s stuff (McElreath 2020) to teach from; you get practical quite quickly.

2.2 Worked examples

3 Linear regression

This workhorse pops up everywhere.

Deisenroth and Zafeiriou (2017), Mathematics for Inference and Machine Learning, give an ML perspective.
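
In the Gaussian-conjugate case the posterior over the weights is available in closed form, which is worth seeing once with no sampler in the way. A minimal numpy sketch, with prior precision and noise level chosen purely for illustration:

```python
# Conjugate Bayesian linear regression: Gaussian prior on weights,
# Gaussian likelihood => Gaussian posterior, no MCMC required.
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a known line, for illustration.
n = 50
X = np.column_stack([np.ones(n), rng.uniform(-3, 3, n)])  # intercept + slope
w_true = np.array([1.0, 2.0])
sigma = 0.5                                # known observation noise (assumed)
y = X @ w_true + rng.normal(0, sigma, n)

alpha = 1.0                                # prior precision: w ~ N(0, I/alpha)

# Posterior over weights: N(m_N, S_N) with
#   S_N = (alpha*I + X'X / sigma^2)^{-1},  m_N = S_N X'y / sigma^2
S_N = np.linalg.inv(alpha * np.eye(2) + X.T @ X / sigma**2)
m_N = S_N @ X.T @ y / sigma**2

print("posterior mean:", m_N)
print("posterior sd:  ", np.sqrt(np.diag(S_N)))

# Posterior predictive at a new point x*: N(x*' m_N, x*' S_N x* + sigma^2)
x_star = np.array([1.0, 0.5])
pred_mean = x_star @ m_N
pred_sd = np.sqrt(x_star @ S_N @ x_star + sigma**2)
print(f"predictive at x=0.5: {pred_mean:.3f} +/- {pred_sd:.3f}")
```

The posterior mean here is also the ridge-regression estimate with penalty alpha * sigma^2, which is a useful sanity check on the algebra.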

4 Workflow

If we want to use Bayesian tools to do science, there is a principled workflow that we need to be thinking about. For a fun rant, read Shalizi on Praxis and Ideology in Bayesian Data Analysis, about Gelman and Shalizi (2013).

The visualization how-to from (basically) the Stan team is deeper than it sounds and highly recommended (Gabry et al. 2019).
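
The move behind many of those visual checks is mechanical: draw parameters from the posterior, simulate replicate datasets, and compare a test statistic on the replicates with its observed value; the resulting tail probability is the posterior predictive p-value of Gelman et al. (2013). A minimal sketch for a normal-mean model, with every number made up for illustration:

```python
# Posterior predictive check for a normal-mean model y_i ~ N(mu, 1)
# with prior mu ~ N(0, 10^2): simulate replicates, compare a statistic.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=30)          # pretend this is observed data
n = len(y)

# Conjugate posterior for mu (known unit variance, N(0, 100) prior):
post_var = 1.0 / (1.0 / 100 + n)
post_mean = post_var * y.sum()

# Draw mu from the posterior, then replicate datasets from the model.
n_rep = 4000
mu_draws = rng.normal(post_mean, np.sqrt(post_var), n_rep)
y_rep = rng.normal(mu_draws[:, None], 1.0, size=(n_rep, n))

# Test statistic: the sample maximum (pick one that matters to you).
T_obs = y.max()
T_rep = y_rep.max(axis=1)
p = (T_rep >= T_obs).mean()
print(f"posterior predictive p-value for max(y): {p:.3f}")
# Values near 0 or 1 flag an aspect of the data the model cannot mimic.
```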

Michael Betancourt’s examples, for example his workflow tips, are a good start for practical work, incorporating the inevitable collision of statistical and computational difficulties.

See also BAT, the Bayesian Analysis Toolkit, which does sophisticated Bayes modelling, although AFAICT it uses a fairly basic sampler.

There are good notes around on Rao-Blackwellization for faster MCMC inference, and even on handling discrete parameters in Stan, which cannot sample them directly.
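
The Stan trick deserves spelling out, because it trips people up: Stan cannot sample discrete parameters, so you sum the indicator out of the likelihood in log-space, which is simultaneously a Rao-Blackwellization and so tends to reduce estimator variance. A numpy/scipy sketch for a two-component mixture (an illustrative model of my own, not from the notes in question):

```python
# Marginalizing a discrete indicator z out of a two-component Gaussian
# mixture likelihood -- the same trick Stan requires, done in log-space.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(-2, 1, 70), rng.normal(3, 1, 30)])

def log_lik(y, lam, mu1, mu2, sigma=1.0):
    """log p(y | params) with the per-point indicator z summed out:
    p(y_i) = lam * N(y_i | mu1, sigma) + (1 - lam) * N(y_i | mu2, sigma)."""
    comp = np.stack([
        np.log(lam) + norm.logpdf(y, mu1, sigma),
        np.log1p(-lam) + norm.logpdf(y, mu2, sigma),
    ])
    return logsumexp(comp, axis=0).sum()

# The marginalized likelihood is a smooth function of the continuous
# parameters, so gradient-based samplers (HMC/NUTS) can work with it.
print(log_lik(y, lam=0.7, mu1=-2.0, mu2=3.0))
print(log_lik(y, lam=0.5, mu1=0.0, mu2=0.0))  # worse fit, lower value
```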

5 Nonparametrics

Dirichlet processes, Gaussian Process regression etc. 🏗
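
Until this gets built out, here is the one computation worth memorizing: the Gaussian process regression posterior under an RBF kernel, via the standard Cholesky recipe. A minimal numpy sketch with arbitrary hyperparameters:

```python
# Bare-bones Gaussian process regression with an RBF kernel:
# posterior mean and variance at test inputs, no libraries needed.
import numpy as np

def rbf(a, b, ell=1.0, amp=1.0):
    """Squared-exponential kernel k(a, b) = amp^2 exp(-|a-b|^2 / (2 ell^2))."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return amp**2 * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(7)
X = rng.uniform(-4, 4, 12)
y = np.sin(X) + rng.normal(0, 0.1, X.size)   # noisy observations
X_star = np.linspace(-5, 5, 9)
sigma2 = 0.1**2                               # (assumed known) noise variance

K = rbf(X, X) + sigma2 * np.eye(X.size)
K_star = rbf(X, X_star)

# Posterior: mean = K*' K^{-1} y, cov = K** - K*' K^{-1} K*
# (solve via Cholesky rather than inverting K explicitly).
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
v = np.linalg.solve(L, K_star)
mean = K_star.T @ alpha
var = np.diag(rbf(X_star, X_star)) - (v**2).sum(axis=0)

for xs, m, s in zip(X_star, mean, np.sqrt(var)):
    print(f"f({xs:+.2f}) ≈ {m:+.3f} ± {2*s:.3f}")
```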

6 Tools

See probabilistic programming.

7 Applied

How to measure anything (Hubbard 2014).

8 As a methodology of science

Not quite.

9 Incoming

10 References

Alquier. 2021. “User-Friendly Introduction to PAC-Bayes Bounds.” arXiv:2110.11216 [Cs, Math, Stat].
Bacchus, Kyburg, and Thalos. 1990. “Against Conditionalization.” Synthese.
Barbier, and Macris. 2017. “The Stochastic Interpolation Method: A Simple Scheme to Prove Replica Formulas in Bayesian Inference.” arXiv:1705.02780 [Cond-Mat].
Bernardo, and Smith. 2000. Bayesian Theory.
Carpenter, Hoffman, Brubaker, et al. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv Preprint arXiv:1509.07164.
Caruana. 1998. “Multitask Learning.” In Learning to Learn.
Deisenroth, and Zafeiriou. 2017. “Mathematics for Inference and Machine Learning.” Dept. Comput., Imperial College London, London, UK, Tech. Rep.
Diaconis, and Ylvisaker. 1979. “Conjugate Priors for Exponential Families.” The Annals of Statistics.
Domingos. 2020. “Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” arXiv:2012.00152 [Cs, Stat].
Fink. 1997. “A Compendium of Conjugate Priors.”
Gabry, Simpson, Vehtari, et al. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society: Series A (Statistics in Society).
Gelman. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis.
Gelman, Carlin, Stern, et al. 2013. Bayesian Data Analysis. Chapman & Hall/CRC Texts in Statistical Science.
Gelman, Hill, and Vehtari. 2021. Regression and Other Stories.
Gelman, and Nolan. 2017. Teaching Statistics: A Bag of Tricks.
Gelman, and Rubin. 1995. “Avoiding Model Selection in Bayesian Social Research.” Sociological Methodology.
Gelman, and Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology.
Gelman, and Yao. 2021. “Holes in Bayesian Statistics.” Journal of Physics G: Nuclear and Particle Physics.
Goodman, Mansinghka, Roy, et al. 2012. “Church: A Language for Generative Models.” arXiv:1206.3255.
Goodrich, Gelman, Hoffman, et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software.
Howard. 1970. “Decision Analysis: Perspectives on Inference, Decision, and Experimentation.” Proceedings of the IEEE.
Hubbard. 2014. How to Measure Anything: Finding the Value of Intangibles in Business.
Khan, and Rue. 2024. “The Bayesian Learning Rule.”
Li, and Dunson. 2016. “A Framework for Probabilistic Inferences from Imperfect Models.” arXiv:1611.01241 [Stat].
MacKay. 1995. “Probable Networks and Plausible Predictions — a Review of Practical Bayesian Methods for Supervised Neural Networks.” Network: Computation in Neural Systems.
MacKay. 1999. “Comparison of Approximate Methods for Handling Hyperparameters.” Neural Computation.
Ma, Kording, and Goldreich. 2022. Bayesian Models of Perception and Action.
Mandt, Hoffman, and Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” JMLR.
Martin, Frazier, and Robert. 2020. “Computing Bayes: Bayesian Computation from 1763 to the 21st Century.” arXiv:2004.06425 [Stat].
McElreath. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan.
Nguyen, Low, and Jaillet. 2020. “Variational Bayesian Unlearning.” In Advances in Neural Information Processing Systems.
O’Hagan. 2010. Kendall’s Advanced Theory of Statistics: Bayesian Inference. Volume 2B.
Raftery. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology.
Robert. 2007. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer Texts in Statistics.
Schervish. 2012. Theory of Statistics. Springer Series in Statistics.
Stuart. 2010. “Inverse Problems: A Bayesian Perspective.” Acta Numerica.
van de Schoot, Depaoli, King, et al. 2021. “Bayesian Statistics and Modelling.” Nature Reviews Methods Primers.
van der Linden, and Chryst. 2017. “No Need for Bayes Factors: A Fully Bayesian Evidence Synthesis.” Frontiers in Applied Mathematics and Statistics.
Zellner. 1988. “Optimal Information Processing and Bayes’s Theorem.” The American Statistician.
———. 2002. “Information Processing and Bayesian Analysis.” Journal of Econometrics, Information and Entropy Econometrics.