Stan

The flagship Bayesian workhorse

October 19, 2020

Bayes
how do science
Monte Carlo
statistics
Figure 1: Stan magically reducing posterior distribution surfaces to a smooth traversable manifold

Stan is the inference toolbox for broad classes of Bayesian inference, daaaaahling, implementing a probabilistic programming language with a powerful MCMC posterior sampler. It is frequently seen in concert with brms, which makes it easier to use for various standard regression models, but “bareback” is also pretty simple. It is reasonably flexible and powerful, but sometimes we want other hip stuff like flexible stochastic variational inference.

Baked-in documentation is extensive, but there is more. Andrew Gelman notes:

The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in the arXiv paper (by Bob Carpenter, Matt Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt). These are sort of background for what we’re trying to do.

If you haven’t read Maria Gorinova’s MS thesis and POPL paper (with Andrew Gordon and Charles Sutton), you should probably start there.

Radford Neal’s intro to HMC is nice, as is the one in David MacKay’s book. Michael Betancourt’s papers are the thing to read to understand HMC deeply; he just wrote another brain bender on geometric autodiff (all on arXiv). Starting with the one on hierarchical models would be good as it explains the necessity of reparameterizations.

Also I recommend our JEBS paper (with Daniel Lee and Jiqiang Guo) as it presents Stan from a user’s rather than a developer’s perspective.

And, for more general background on Bayesian data analysis, we recommend Statistical Rethinking by Richard McElreath and BDA3.

Those last two links have lots of worked examples using Stan.
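To make the HMC part of that reading list a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not how Stan does it: Stan uses an adaptive NUTS variant and gets gradients from autodiff, whereas this toy assumes a one-dimensional standard normal target, a hand-written gradient, and a fixed step size and trajectory length. All function names (`leapfrog`, `hmc_sample`, and friends) are made up for illustration.

```python
import math
import random


def neg_log_density(q):
    """Potential energy U(q) = -log p(q) for a standard normal, up to a constant."""
    return 0.5 * q * q


def grad_neg_log_density(q):
    """Gradient of U, written by hand here; Stan would obtain it via autodiff."""
    return q


def leapfrog(q, p, step_size, n_steps):
    """Approximately simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p - 0.5 * step_size * grad_neg_log_density(q)  # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p  # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_neg_log_density(q)  # full step for momentum
    p = p - 0.5 * step_size * grad_neg_log_density(q)  # final half step for momentum
    return q, p


def hmc_sample(n_samples, step_size=0.2, n_steps=20, seed=0):
    """Plain HMC: resample momentum, integrate, then accept or reject."""
    rng = random.Random(seed)
    q = 0.0
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # fresh momentum from a unit normal
        q_new, p_new = leapfrog(q, p0, step_size, n_steps)
        h_old = neg_log_density(q) + 0.5 * p0 * p0
        h_new = neg_log_density(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integrator's energy error
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


samples = hmc_sample(5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The sample mean and variance should land near 0 and 1. The interesting part is that `leapfrog` needs only the gradient of the log density, which is why autodiff matters so much to Stan.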

Stan is a good tool that I use now and again, and if it solves the problem then it is an unbeatable developer experience. It does so much automatically, and makes it easy to do the right thing on what is left. The boundaries around what it can do are sharp, however. Firstly, it cannot handle parameter domains that cannot be continuously mapped onto a cartesian product of scalar unbounded domains. The space of e.g. positive-definite matrices, or discrete parameters, will only work if you can find a sweet hack to shoehorn them in. Secondly, it has no generic foreign function interface, so you can’t plug external tools in, and if you want to do unorthodox variational inference it is not ideal. Since it is its own programming language, it is hard to work around this.

It is, however, possible to implement models in C++. For example, the authors of Google’s differentiable physics simulator claim that their physics engine could be used to define a Stan model, and that HMC inference would then work.
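The “unbounded scalar domains” restriction mentioned above is easier to see in a toy example. The sketch below, in plain Python rather than Stan, shows the two standard tricks as I understand them: a constrained parameter (here a positive scale `sigma`) is sampled as `u = log(sigma)` with a log-Jacobian correction, and a discrete parameter (here a two-component mixture label) is summed out of the likelihood so that only continuous parameters remain. The function names are hypothetical, and this is an illustration of the idea, not Stan’s actual implementation.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def normal_lpdf(y, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def exponential_lpdf(x, rate):
    """Log density of an exponential distribution on x > 0."""
    return math.log(rate) - rate * x


def log_density_unconstrained(u, prior_lpdf):
    """Log density of u = log(sigma) implied by a prior on sigma > 0."""
    sigma = math.exp(u)
    log_jacobian = u  # log |d sigma / d u| = log(exp(u)) = u
    return prior_lpdf(sigma) + log_jacobian


def mixture_log_lik(y, weight, mu0, mu1, sigma):
    """Two-component normal mixture with the discrete label summed out."""
    return log_sum_exp([
        math.log(1.0 - weight) + normal_lpdf(y, mu0, sigma),  # label z = 0
        math.log(weight) + normal_lpdf(y, mu1, sigma),        # label z = 1
    ])


# Sanity check: the density over u should still integrate to about one.
step = 0.01
total = sum(
    math.exp(log_density_unconstrained(-30.0 + i * step, lambda s: exponential_lpdf(s, 1.0))) * step
    for i in range(3501)
)
print(f"integral over unconstrained space: {total:.4f}")
```

When no such smooth transform or tractable sum exists for the parameter space, that is the point where the shoehorning starts to hurt.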

1 References

Betancourt. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv:1701.02434 [Stat].
Betancourt. 2018. “The Convergence of Markov Chain Monte Carlo Methods: From the Metropolis Method to Hamiltonian Monte Carlo.” Annalen Der Physik.
Betancourt, Byrne, Livingstone, et al. 2017. “The Geometric Foundations of Hamiltonian Monte Carlo.” Bernoulli.
Betancourt, Jordan, and Wilson. 2018. “On Symplectic Optimization.” arXiv:1802.03653 [Stat].
Carpenter, Gelman, Hoffman, et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software.
Carpenter, Hoffman, Brubaker, et al. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv Preprint arXiv:1509.07164.
Gabry, Simpson, Vehtari, et al. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society: Series A (Statistics in Society).
Gelman, Lee, and Guo. 2015. “Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization.” Journal of Educational and Behavioral Statistics.
Gorinova, Gordon, and Sutton. 2019. “Probabilistic Programming with Densities in SlicStan: Efficient, Flexible and Deterministic.” Proceedings of the ACM on Programming Languages.