Bayes for beginners

2016-05-30 — 2022-07-23


Even for the most curmudgeonly frequentist, it’s sometimes refreshing to stop deriving frequentist estimators for intractable models and just use the damn Bayesian ones, which fail in interesting ways quite different from the ones you’re used to. If it works and you’re feeling fancy, justify your Bayesian method on frequentist grounds, which washes away the sin.

Here are some scattered tidbits about getting into it. No attempt is made to be comprehensive, novel, or even expert.

1 Prior choice

Prior choice is weird and important. Here are some argumentative and disputed rules of thumb.
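A small illustration of why prior choice matters, using the conjugate Beta-Binomial model (Diaconis and Ylvisaker 1979 and Fink 1997 cover conjugacy more generally). The data (3 successes in 10 trials) and the three priors are arbitrary choices for illustration; the Beta(0.5, 0.5) prior is the Jeffreys prior for a binomial proportion.

```python
def beta_binomial_posterior(a, b, k, n):
    """Conjugate update: Beta(a, b) prior plus k successes in n trials gives Beta(a + k, b + n - k)."""
    return a + k, b + n - k

# Illustrative priors, from weak to strongly informative
priors = {
    "uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "strong Beta(20, 20)": (20.0, 20.0),
}
k, n = 3, 10  # illustrative data

for name, (a, b) in priors.items():
    a_post, b_post = beta_binomial_posterior(a, b, k, n)
    # The posterior mean of a Beta(a, b) distribution is a / (a + b)
    print(f"{name}: posterior mean {a_post / (a_post + b_post):.3f}")
```

With only ten observations, the strong prior drags the posterior mean to about 0.46, while the two weak priors stay near the raw proportion of 0.3.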

2 Teaching

2.1 Course material

So many! Too many. Actually, I kinda like McElreath’s stuff to teach from; you get practical quite quickly.

- Bayesian Data Analysis course - GSU 2022
- How to measure anything (Hubbard 2014)
- The milieu around Andrew Gelman (Gelman, Hill, and Vehtari 2021; Gelman and Nolan 2017; Gelman et al. 2013). These are very good introductions to the kind of statistics most people need, including people who think they need different statistics. Bayesian Data Analysis is available online.
- McElreath (2020) is a cult textbook; it’s remarkable how far it gets with some very simple computational tools. Various people have reimplemented it in other languages:
  - StatisticalRethinkingJulia
  - Statistical Rethinking in Numpyro
- Cameron Davidson-Pilon’s Probabilistic Programming & Bayesian Methods for Hackers (source) does what it says on the tin. IMO McElreath is a bit better, even for hackers, but this is cheaper and still a good start.
- Chris Fonnesbeck’s workshop in R.
- Intro to Stan for econometrics.
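The “very simple computational tools” in McElreath (2020) start with grid approximation. Here is a minimal Python sketch in that spirit, using the book’s globe-tossing setup (6 “water” results in 9 tosses) with a flat prior; the grid size and function name are my own choices.

```python
import numpy as np

def grid_posterior(k, n, n_grid=1001):
    """Grid approximation to the posterior of a binomial proportion p under a flat prior."""
    p_grid = np.linspace(0.0, 1.0, n_grid)
    prior = np.ones_like(p_grid)                         # flat prior on [0, 1]
    likelihood = p_grid**k * (1.0 - p_grid) ** (n - k)   # binomial kernel; constant factor dropped
    unnormalized = likelihood * prior
    return p_grid, unnormalized / unnormalized.sum()     # normalize so grid weights sum to 1

p_grid, post = grid_posterior(k=6, n=9)
print("posterior mean:", (p_grid * post).sum())   # close to the exact Beta(7, 4) mean, 7/11
print("posterior mode:", p_grid[np.argmax(post)])  # close to 6/9
```

Swapping the flat prior for any other vector of prior weights is a one-line change, which is much of the appeal for teaching.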

2.2 Worked examples

3 Linear regression

This workhorse pops up everywhere.

Deisenroth and Zafeiriou (2017), Mathematics for Inference and Machine Learning, give an ML perspective.
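In the standard conjugate case, with a Gaussian prior on the weights and known noise variance, the posterior is Gaussian in closed form. A minimal NumPy sketch, assuming prior w ~ N(0, α⁻¹I) and noise variance σ² known, so that the posterior covariance is S = (αI + XᵀX/σ²)⁻¹ and the mean is m = S Xᵀy/σ². The function names, α, σ², and the simulated data are illustrative.

```python
import numpy as np

def blr_posterior(X, y, alpha, sigma2):
    """Posterior N(mean, cov) over w for y = X w + noise, noise ~ N(0, sigma2 I), w ~ N(0, I / alpha)."""
    d = X.shape[1]
    precision = alpha * np.eye(d) + X.T @ X / sigma2  # posterior precision matrix
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / sigma2
    return mean, cov

def blr_predict(x_new, mean, cov, sigma2):
    """Posterior predictive mean and variance at each row of x_new."""
    pred_mean = x_new @ mean
    # noise variance plus parameter uncertainty x^T S x for each row
    pred_var = sigma2 + np.einsum("ij,jk,ik->i", x_new, cov, x_new)
    return pred_mean, pred_var

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.3, size=50)
mean, cov = blr_posterior(X, y, alpha=1.0, sigma2=0.09)
print(mean, blr_predict(X[:3], mean, cov, sigma2=0.09))
```

The posterior mean coincides with the ridge-regression estimate with penalty λ = ασ², which is one tidy bridge back to the frequentist version.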

4 Workflow

If we want to use Bayesian tools to do science, there’s a principled workflow we need to be thinking about. For a fun rant, read Shalizi on Praxis and Ideology in Bayesian Data Analysis, about Gelman and Shalizi (2013).

The visualization how-to from, basically, the Stan team is deeper than it sounds and highly recommended (Gabry et al. 2019).
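One idea from Gabry et al. (2019) that is easy to try yourself is the prior predictive check: simulate fake data from the prior alone and ask whether it looks remotely plausible. A minimal NumPy sketch, assuming a simple linear model y = a + b·x + ε on a standardized covariate, comparing an illustrative “vague” N(0, 100²) prior on a and b with a weakly informative N(0, 1) one; the model, priors, and function name are my own choices.

```python
import numpy as np

def prior_predictive(x, prior_sd, noise_sd, n_draws, rng):
    """Simulate datasets y = a + b*x + eps with a, b ~ N(0, prior_sd^2), eps ~ N(0, noise_sd^2)."""
    a = rng.normal(0.0, prior_sd, size=(n_draws, 1))
    b = rng.normal(0.0, prior_sd, size=(n_draws, 1))
    eps = rng.normal(0.0, noise_sd, size=(n_draws, x.size))
    return a + b * x[None, :] + eps  # shape (n_draws, len(x))

rng = np.random.default_rng(1)
x = np.linspace(-2.0, 2.0, 20)  # hypothetical standardized covariate
vague = prior_predictive(x, prior_sd=100.0, noise_sd=1.0, n_draws=1000, rng=rng)
weak = prior_predictive(x, prior_sd=1.0, noise_sd=1.0, n_draws=1000, rng=rng)

for name, sims in [("vague", vague), ("weakly informative", weak)]:
    lo, hi = np.percentile(sims, [5, 95])
    print(f"{name}: 90% of simulated y in [{lo:.1f}, {hi:.1f}]")
```

If y is measured on a scale where values in the hundreds are absurd, the “vague” prior is quietly asserting absurd things, which is easy to miss until you simulate from it.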

Michael Betancourt’s examples, such as his workflow tips, are a good start for practical work, since they confront the inevitable collision of statistical and computational difficulties.
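On the computational side, one of the first checks in any MCMC workflow is whether several chains agree with each other. A minimal NumPy sketch of the classic split-R̂ from Gelman et al. (2013), which compares between-chain and within-chain variance; current Stan and ArviZ use a rank-normalized refinement that this sketch does not implement. The simulated “chains” are just independent normal draws, for illustration.

```python
import numpy as np

def split_rhat(draws):
    """Basic split-R-hat for draws of shape (n_chains, n_iter); values near 1 suggest agreement."""
    n_chains, n_iter = draws.shape
    half = n_iter // 2
    # Split each chain in half, so drift within a chain also shows up as between-chain disagreement
    split = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = split.shape[1]
    W = split.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * split.mean(axis=1).var(ddof=1)     # between-chain variance of the chain means
    var_plus = (n - 1) / n * W + B / n         # pooled estimate of the marginal posterior variance
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(2)
good = rng.normal(size=(4, 1000))                       # four well-mixed "chains"
bad = good + np.array([[0.0], [0.0], [0.0], [3.0]])     # one chain stuck somewhere else
print(split_rhat(good), split_rhat(bad))
```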

See also BAT, the Bayesian Analysis Toolkit, which does sophisticated Bayesian modelling, although AFAICT it uses a fairly basic sampler.

Notes on Rao-Blackwellization for doing faster MCMC inference, and even handling discrete parameters in Stan.
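The trick behind handling discrete parameters in Stan is to sum them out of the likelihood, since Stan’s samplers only handle continuous parameters. A minimal NumPy sketch for a Gaussian mixture, where each observation’s component label zᵢ is marginalized with a log-sum-exp; the function names and the two-component numbers are my own illustrations. The second function returns p(zᵢ = k | yᵢ, θ); averaging these over posterior draws of θ, rather than averaging sampled labels, is the Rao-Blackwellized estimate of the membership probabilities and typically has lower variance.

```python
import numpy as np

def component_logp(y, weights, mus, sigmas):
    """Matrix of log w_k + log N(y_i | mu_k, sigma_k), shape (n_obs, n_components)."""
    y = y[:, None]
    return (np.log(weights)[None, :]
            - 0.5 * np.log(2.0 * np.pi * sigmas[None, :] ** 2)
            - 0.5 * ((y - mus[None, :]) / sigmas[None, :]) ** 2)

def mixture_loglik(y, weights, mus, sigmas):
    """Total log-likelihood with every discrete label z_i summed out via a stable log-sum-exp."""
    lp = component_logp(y, weights, mus, sigmas)
    return np.logaddexp.reduce(lp, axis=1).sum()

def membership_probs(y, weights, mus, sigmas):
    """p(z_i = k | y_i, parameters): the conditional expectation of the one-hot label."""
    lp = component_logp(y, weights, mus, sigmas)
    return np.exp(lp - np.logaddexp.reduce(lp, axis=1, keepdims=True))

y = np.array([-2.1, -1.7, 0.2, 2.4, 3.0])
weights = np.array([0.4, 0.6])
mus = np.array([-2.0, 2.5])
sigmas = np.array([0.5, 0.7])
print(mixture_loglik(y, weights, mus, sigmas))
print(membership_probs(y, weights, mus, sigmas).round(3))
```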

5 Nonparametrics

Dirichlet processes, Gaussian Process regression etc. 🏗
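For Dirichlet processes, the stick-breaking construction is the gentlest way in: draw βₖ ~ Beta(1, α) and set πₖ = βₖ ∏ⱼ₍ⱼ<ₖ₎ (1 − βⱼ). A minimal truncated NumPy sketch, assuming concentration α = 2, a truncation at 200 atoms, and a standard normal base measure; all three are illustrative choices.

```python
import numpy as np

def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    betas = rng.beta(1.0, alpha, size=n_atoms)  # fraction of the remaining stick broken off each time
    # length of stick remaining before break k: prod_{j<k} (1 - beta_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(3)
weights = stick_breaking_weights(alpha=2.0, n_atoms=200, rng=rng)
atoms = rng.normal(size=200)  # atom locations from a hypothetical N(0, 1) base measure
print(weights[:5].round(3), weights.sum())
```

Smaller α puts most of the mass on the first few atoms; larger α spreads it over many.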

6 Tools

See probabilistic programming.

7 Applied

How to measure anything (Hubbard 2014).

8 As a methodology of science

Not quite.

9 Incoming

Everything You Always Wanted to Know About the Jeffreys-Lindley Paradox But Were Afraid to Ask

10 References

Alquier. 2021. “User-Friendly Introduction to PAC-Bayes Bounds.” arXiv:2110.11216 [Cs, Math, Stat].

Bacchus, Kyburg, and Thalos. 1990. “Against Conditionalization.” Synthese.

Barbier, and Macris. 2017. “The Stochastic Interpolation Method: A Simple Scheme to Prove Replica Formulas in Bayesian Inference.” arXiv:1705.02780 [Cond-Mat].

Bernardo, and Smith. 2000. Bayesian Theory.

Carpenter, Hoffman, Brubaker, et al. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv Preprint arXiv:1509.07164.

Caruana. 1998. “Multitask Learning.” In Learning to Learn.

Deisenroth, and Zafeiriou. 2017. “Mathematics for Inference and Machine Learning.” Dept. Comput., Imperial College London, London, UK, Tech. Rep., Accessed on Jul.

Diaconis, and Ylvisaker. 1979. “Conjugate Priors for Exponential Families.” The Annals of Statistics.

Domingos. 2020. “Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” arXiv:2012.00152 [Cs, Stat].

Fink. 1997. “A Compendium of Conjugate Priors.”

Gabry, Simpson, Vehtari, et al. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society: Series A (Statistics in Society).

Gelman. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis.

Gelman, Carlin, Stern, et al. 2013. Bayesian Data Analysis. Chapman & Hall/CRC texts in statistical science.

Gelman, Hill, and Vehtari. 2021. Regression and other stories.

Gelman, and Nolan. 2017. Teaching Statistics: A Bag of Tricks.

Gelman, and Rubin. 1995. “Avoiding Model Selection in Bayesian Social Research.” Sociological Methodology.

Gelman, and Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology.

Gelman, and Yao. 2021. “Holes in Bayesian Statistics.” Journal of Physics G: Nuclear and Particle Physics.

Goodman, Mansinghka, Roy, et al. 2012. “Church: A Language for Generative Models.” arXiv:1206.3255.

Goodrich, Gelman, Hoffman, et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software.

Howard. 1970. “Decision Analysis: Perspectives on Inference, Decision, and Experimentation.” Proceedings of the IEEE.

Hubbard. 2014. How to Measure Anything: Finding the Value of Intangibles in Business.

Khan, and Rue. 2024. “The Bayesian Learning Rule.”

Li, and Dunson. 2016. “A Framework for Probabilistic Inferences from Imperfect Models.” arXiv:1611.01241 [Stat].

MacKay. 1995. “Probable Networks and Plausible Predictions — a Review of Practical Bayesian Methods for Supervised Neural Networks.” Network: Computation in Neural Systems.

MacKay. 1999. “Comparison of Approximate Methods for Handling Hyperparameters.” Neural Computation.

Ma, Kording, and Goldreich. 2022. Bayesian Models of Perception and Action.

Mandt, Hoffman, and Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” JMLR.

Martin, Frazier, and Robert. 2020. “Computing Bayes: Bayesian Computation from 1763 to the 21st Century.” arXiv:2004.06425 [Stat].

McElreath. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and STAN.

Nguyen, Low, and Jaillet. 2020. “Variational Bayesian Unlearning.” In Advances in Neural Information Processing Systems.

O’Hagan. 2010. Kendall’s Advanced Theory of Statistics: Bayesian Inference. Volume 2B.

Raftery. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology.

Robert. 2007. The Bayesian choice: from decision-theoretic foundations to computational implementation. Springer texts in statistics.

Schervish. 2012. Theory of Statistics. Springer Series in Statistics.

Stuart. 2010. “Inverse Problems: A Bayesian Perspective.” Acta Numerica.

van de Schoot, Depaoli, King, et al. 2021. “Bayesian Statistics and Modelling.” Nature Reviews Methods Primers.

van der Linden, and Chryst. 2017. “No Need for Bayes Factors: A Fully Bayesian Evidence Synthesis.” Frontiers in Applied Mathematics and Statistics.

Zellner. 1988. “Optimal Information Processing and Bayes’s Theorem.” The American Statistician.

———. 2002. “Information Processing and Bayesian Analysis.” Journal of Econometrics, Information and Entropy Econometrics.