The flagship Bayesian workhorse

Stan magically reducing posterior distribution surfaces to a smooth traversable manifold

Stan is the inference toolbox for broad classes of Bayesian inference, daaaaahling. It is usually seen in concert with brms, which makes it easier to use for various standard regression models, but using it “bareback” is also pretty simple.

Baked-in documentation is extensive, but there is more. Andrew Gelman notes:

The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in the arXiv paper (by Bob Carpenter, Matt Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt). These are sort of background for what we’re trying to do.

If you haven’t read Maria Gorinova’s MS thesis and POPL paper (with Andrew Gordon and Charles Sutton), you should probably start there.

Radford Neal’s intro to HMC is nice, as is the one in David MacKay’s book. Michael Betancourt’s papers are the thing to read to understand HMC deeply; he just wrote another brain bender on geometric autodiff (all on arXiv). Starting with the one on hierarchical models would be good as it explains the necessity of reparameterizations.

Also I recommend our JEBS paper (with Daniel Lee and Jiqiang Guo), as it presents Stan from a user’s rather than a developer’s perspective.

And, for more general background on Bayesian data analysis, we recommend Statistical Rethinking by Richard McElreath and BDA3.

Those last two links have lots of worked examples using Stan.

Stan is a really good tool that I use all the time, and if it solves your problem then it is an unbeatable developer experience. It does so much automatically, and makes it easy to do the right thing with what is left. It has some sharp boundaries around what it can do, however. Firstly, it cannot handle parameter domains that cannot be continuously mapped onto a Cartesian product of unbounded scalar domains. Spaces such as positive-definite matrices, or discrete parameters, will only work if you can find a sweet hack to shoehorn them in. Secondly, it has no general foreign function interface, so you can’t plug external tools in. Thirdly, if you want to do unorthodox variational inference it is not ideal. Since Stan is its own programming language, it is hard to work around these limits.

That said, it is possible to implement models in C++. For example, Google’s differentiable physics simulator claims that its physics engine could be used to define a Stan model, and that HMC inference would then work.
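To make the first limitation concrete, here is a minimal Python sketch of the kind of shoehorning involved. It is an illustration under my own assumptions, not Stan's actual implementation: the function names are hypothetical, and the transforms (a log map for positive scalars, a Cholesky factor with an exponentiated diagonal for 2x2 positive-definite matrices, and log-sum-exp marginalisation for a discrete indicator) are standard tricks in the same spirit.

```python
import math


def unconstrain_positive(sigma):
    """Map sigma > 0 to an unbounded real via a log transform."""
    return math.log(sigma)


def constrain_positive(u):
    """Inverse map, returning sigma and the log-Jacobian log|d sigma / d u|."""
    sigma = math.exp(u)
    log_jacobian = u  # d/du exp(u) = exp(u), whose log is u
    return sigma, log_jacobian


def constrain_cov_2x2(u):
    """Map any u in R^3 to a 2x2 positive-definite matrix.

    Builds a lower-triangular Cholesky factor L whose diagonal is
    exponentiated (hence strictly positive) and returns L L^T.
    """
    a, b, c = u
    l11, l21, l22 = math.exp(a), b, math.exp(c)
    return [
        [l11 * l11, l11 * l21],
        [l11 * l21, l21 * l21 + l22 * l22],
    ]


def log_mixture(log_weights, log_liks):
    """Marginalise a discrete indicator z: log sum_z p(z) p(y | z).

    Uses the log-sum-exp trick for numerical stability.
    """
    terms = [lw + ll for lw, ll in zip(log_weights, log_liks)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

The first two maps turn a constrained parameter into coordinates that a gradient-based sampler can roam over freely, at the cost of a Jacobian correction to the log density. The last function sidesteps a discrete parameter by summing it out, so that only continuous parameters remain to be sampled.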

Betancourt, Michael. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” January 9, 2017.
———. 2018. “The Convergence of Markov Chain Monte Carlo Methods: From the Metropolis Method to Hamiltonian Monte Carlo.” Annalen der Physik, March.
Betancourt, Michael, Simon Byrne, Sam Livingstone, and Mark Girolami. 2017. “The Geometric Foundations of Hamiltonian Monte Carlo.” Bernoulli 23 (November): 2257–98.
Betancourt, Michael, Michael I. Jordan, and Ashia C. Wilson. 2018. “On Symplectic Optimization.” February 10, 2018.
Carpenter, Bob, Matthew D. Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” 2015.
Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 182 (2): 389–402.
Gelman, Andrew, Daniel Lee, and Jiqiang Guo. 2015. “Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization.” Journal of Educational and Behavioral Statistics 40 (5): 530–43.
Goodrich, Ben, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Bob Carpenter, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).
Gorinova, Maria I., Andrew D. Gordon, and Charles Sutton. 2019. “Probabilistic Programming with Densities in SlicStan: Efficient, Flexible and Deterministic.” Proceedings of the ACM on Programming Languages 3 (January): 1–30.