Bayes for beginners

Even for the most curmudgeonly frequentist it is sometimes refreshing to move your effort from deriving frequentist estimators for intractable models to using the damn Bayesian ones, which fail in different, and interesting, ways from the ones you are used to. If it works and you are feeling fancy, you might then justify your Bayesian method on frequentist grounds, which washes away the sin.

Here are some scattered tidbits about getting into it. No attempt is made to be comprehensive, novel, or even expert.


Everyone references Bayesian Data Analysis (Gelman et al. 2013; free online, with copious learning notes) as a first stop. It is simple and readable.

The visualisation how-to from, basically, the Stan team (Gabry et al. 2019) is deeper than it sounds.
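
Among other things, Gabry et al. (2019) recommend prior predictive checks: draw parameters from the prior, simulate data from them, and ask whether the fake data is remotely plausible. The paper does this with plots; below is a minimal numerical sketch under assumptions made up for illustration: a hypothetical model of adult heights in centimetres, mu ~ Normal(prior_mean, prior_sd) and y ~ Normal(mu, noise_sd), with the function name and the 50 to 250 cm plausibility window invented for the example.

```python
import random

def prior_predictive_share_outside(prior_mean, prior_sd, noise_sd, lo, hi,
                                   n_draws=10_000, seed=1):
    """Share of prior predictive draws that fall outside the plausible range [lo, hi].

    Hypothetical model: mu ~ Normal(prior_mean, prior_sd), y ~ Normal(mu, noise_sd).
    """
    rng = random.Random(seed)
    outside = 0
    for _ in range(n_draws):
        mu = rng.gauss(prior_mean, prior_sd)  # parameter drawn from the prior
        y = rng.gauss(mu, noise_sd)           # fake observation given that parameter
        if not lo <= y <= hi:
            outside += 1
    return outside / n_draws

# Adult heights in cm; anything outside 50 to 250 cm is clearly absurd.
tight = prior_predictive_share_outside(170, 20, 10, lo=50, hi=250)
vague = prior_predictive_share_outside(170, 1000, 10, lo=50, hi=250)
print(f"tight prior: {tight:.3f} outside, vague prior: {vague:.3f} outside")
```

With the vague prior most simulated heights are absurd, which a histogram would show at a glance; the tighter prior keeps almost all of them in range.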

Michael Betancourt’s worked examples, such as his workflow tips, are a good start.

Chris Fonnesbeck’s workshop in R.

Intro to Stan for econometrics.

See also BAT, the Bayesian Analysis Toolkit, which does sophisticated Bayesian modelling, although AFAICT it uses a fairly basic sampler.
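
To make "fairly basic sampler" concrete: the textbook baseline is random-walk Metropolis, sketched below in plain Python. This is a generic illustration, not BAT's implementation; the function name, the Normal(3, 2) target and the step size are assumptions chosen for the example.

```python
import math
import random

def random_walk_metropolis(log_density, x0, n_steps, step_size, seed=0):
    """Random-walk Metropolis for a 1-D target known up to a constant."""
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)  # symmetric Gaussian proposal
        logp_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            x, logp = proposal, logp_prop
            accepted += 1
        samples.append(x)  # on rejection the current state is repeated
    return samples, accepted / n_steps

# Hypothetical target: Normal(mean=3, sd=2), log-density up to an additive constant.
def log_target(x):
    return -0.5 * ((x - 3.0) / 2.0) ** 2

draws, accept_rate = random_walk_metropolis(log_target, x0=0.0, n_steps=20_000, step_size=2.5)
kept = draws[2_000:]  # discard warm-up
mean = sum(kept) / len(kept)
var = sum((d - mean) ** 2 for d in kept) / len(kept)
print(f"acceptance {accept_rate:.2f}, mean {mean:.2f}, variance {var:.2f}")
```

The symmetric proposal is what lets the acceptance ratio skip the Hastings correction. The price is that step_size needs hand-tuning and mixing degrades badly in high dimensions, which is why gradient-based samplers such as Stan's HMC exist.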

Notes on Rao-Blackwellisation for faster MCMC inference, and even for handling discrete parameters in Stan.
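
Stan has no discrete parameters, so the usual move is to sum them out of the likelihood and, if you need them, recover their conditional probabilities afterwards; using those conditional probabilities in place of sampled labels is the Rao-Blackwell idea. Below is a minimal Python sketch for an assumed two-component Gaussian mixture. In Stan itself you would write the same sum with log_sum_exp or log_mix; the Python function names here are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def component_log_terms(y, weights, mus, sigmas):
    """log(w_k) + log N(y | mu_k, sigma_k) for each mixture component k."""
    return [math.log(w) + normal_logpdf(y, m, s) for w, m, s in zip(weights, mus, sigmas)]

def marginal_loglik(ys, weights, mus, sigmas):
    """log p(ys | theta) with each discrete label z_i summed out."""
    return sum(log_sum_exp(component_log_terms(y, weights, mus, sigmas)) for y in ys)

def membership_probs(y, weights, mus, sigmas):
    """p(z = k | y, theta): the Rao-Blackwellised replacement for a sampled label."""
    terms = component_log_terms(y, weights, mus, sigmas)
    lse = log_sum_exp(terms)
    return [math.exp(t - lse) for t in terms]

# Hypothetical mixture parameters, e.g. one posterior draw of the continuous parameters.
weights, mus, sigmas = [0.3, 0.7], [0.0, 3.0], [1.0, 1.0]
print(marginal_loglik([0.1, 2.9, 3.4], weights, mus, sigmas))
print(membership_probs(2.9, weights, mus, sigmas))
```

The sampler then only ever sees the continuous weights, means and scales, and the marginal likelihood is smooth in them, which is what HMC needs.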


Dirichlet processes, Gaussian process regression, etc. 🏗
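
As a placeholder for the Gaussian process half of that list: GP regression with Gaussian noise has a closed-form posterior, with mean k_*^T (K + s^2 I)^{-1} y and variance k(x_*, x_*) - k_*^T (K + s^2 I)^{-1} k_*. Below is a dependency-free Python sketch of those formulas via a Cholesky factorisation. The squared-exponential kernel, zero prior mean, hyperparameter values and function names are all assumptions for illustration; a real analysis would use a linear-algebra library and fit the hyperparameters.

```python
import math

def sq_exp_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between scalar inputs a and b."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_from_lower(L, b):
    """Solve L^T x = b by back substitution, reading L^T from L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_star, noise_var=1e-2, lengthscale=1.0, variance=1.0):
    """Posterior mean and variance of f(x_star) under a zero-mean GP prior."""
    n = len(xs)
    K = [[sq_exp_kernel(xs[i], xs[j], lengthscale, variance) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, ys))  # (K + noise I)^{-1} y
    k_star = [sq_exp_kernel(x, x_star, lengthscale, variance) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # v^T v = k_star^T (K + noise I)^{-1} k_star
    var = sq_exp_kernel(x_star, x_star, lengthscale, variance) - sum(vi * vi for vi in v)
    return mean, var

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]
print(gp_predict(xs, ys, 2.5))   # between training points: informed by the data
print(gp_predict(xs, ys, 20.0))  # far away: reverts to the prior, mean ~0, var ~1
```

The Cholesky route is the standard one because it avoids forming an explicit inverse; it is also O(n^3) in the number of training points, which is the main reason GP regression needs approximations at scale.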


Alquier, Pierre. 2021. “User-Friendly Introduction to PAC-Bayes Bounds.” arXiv:2110.11216 [Cs, Math, Stat], October.
Bacchus, F., H. E. Kyburg, and M. Thalos. 1990. “Against Conditionalization.” Synthese 85 (3): 475–506.
Barbier, Jean, and Nicolas Macris. 2017. “The Stochastic Interpolation Method: A Simple Scheme to Prove Replica Formulas in Bayesian Inference.” arXiv:1705.02780 [Cond-Mat], May.
Bernardo, José M., and Adrian F. M. Smith. 2000. Bayesian Theory. 1st ed. Chichester: Wiley.
Carpenter, Bob, Matthew D. Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv:1509.07164.
Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).
Caruana, Rich. 1998. “Multitask Learning.” In Learning to Learn, 95–133. Boston, MA: Springer.
Domingos, Pedro. 2020. “Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” arXiv:2012.00152 [Cs, Stat], November.
Fink, Daniel. 1997. “A Compendium of Conjugate Priors.” 46 pp.
Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 182 (2): 389–402.
Gelman, Andrew. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC Texts in Statistical Science. Boca Raton: Chapman and Hall/CRC.
Gelman, Andrew, and Donald B. Rubin. 1995. “Avoiding Model Selection in Bayesian Social Research.” Sociological Methodology 25: 165–73.
Goodman, Noah, Vikash Mansinghka, Daniel Roy, Keith Bonawitz, and Daniel Tarlow. 2012. “Church: A Language for Generative Models.” arXiv:1206.3255, June.
Howard, R. A. 1970. “Decision Analysis: Perspectives on Inference, Decision, and Experimentation.” Proceedings of the IEEE 58 (5): 632–43.
Hubbard, Douglas W. 2014. How to Measure Anything: Finding the Value of Intangibles in Business. 3rd ed. Hoboken, NJ: Wiley.
Li, Meng, and David B. Dunson. 2016. “A Framework for Probabilistic Inferences from Imperfect Models.” arXiv:1611.01241 [Stat], November.
Linden, Sander van der, and Breanne Chryst. 2017. “No Need for Bayes Factors: A Fully Bayesian Evidence Synthesis.” Frontiers in Applied Mathematics and Statistics 3.
Ma, Wei Ji, Konrad Paul Kording, and Daniel Goldreich. n.d. Bayesian Models of Perception and Action.
MacKay, David J. C. 1995. “Probable Networks and Plausible Predictions: A Review of Practical Bayesian Methods for Supervised Neural Networks.” Network: Computation in Neural Systems 6 (3): 469–505.
MacKay, David J. C. 1999. “Comparison of Approximate Methods for Handling Hyperparameters.” Neural Computation 11 (5): 1035–68.
Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” Journal of Machine Learning Research, April.
Martin, Gael M., David T. Frazier, and Christian P. Robert. 2020. “Computing Bayes: Bayesian Computation from 1763 to the 21st Century.” arXiv:2004.06425 [Stat], December.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: CRC Press.
Raftery, Adrian E. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology 25: 111–63.
Robert, Christian P. 2007. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. 2nd ed. Springer Texts in Statistics. New York: Springer.
Schervish, Mark J. 2012. Theory of Statistics. Springer Series in Statistics. New York: Springer Science & Business Media.
Schoot, Rens van de, Sarah Depaoli, Ruth King, Bianca Kramer, Kaspar Märtens, Mahlet G. Tadesse, Marina Vannucci, et al. 2021. “Bayesian Statistics and Modelling.” Nature Reviews Methods Primers 1 (1): 1–26.
Stuart, A. M. 2010. “Inverse Problems: A Bayesian Perspective.” Acta Numerica 19: 451–559.
Zellner, Arnold. 1988. “Optimal Information Processing and Bayes’s Theorem.” The American Statistician 42 (4): 278–80.
Zellner, Arnold. 2002. “Information Processing and Bayesian Analysis.” Journal of Econometrics, Information and Entropy Econometrics, 107 (1): 41–50.
