Probabilistic programming

Programming with probability distributions for, e.g. Bayesian inference

Programming with distributions over code branches, where the goal is to estimate the probability distribution of a certain output of interest. More specifically, what I usually use probabilistic programming for is Bayesian inference, and I think this is a common enough use that it is generally assumed. The program represents our random generative model, and conditioning upon the observed data gives us updated distributions over parameters, or prediction, or whatever.
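To make "conditioning a generative program on data" concrete, here is a minimal sketch in plain Python. The model, its probabilities and the function names are all made up for illustration; it uses brute-force rejection sampling, which no serious framework would rely on, but it shows the shape of the problem: run the program forward many times, keep only the runs that match the observation, and read the posterior off the survivors.

```python
import random

def model(rng):
    """A toy generative program with a random branch (hypothetical example)."""
    coin_is_biased = rng.random() < 0.5           # latent branch choice
    p_heads = 0.9 if coin_is_biased else 0.5      # branch-dependent parameter
    n_heads = sum(rng.random() < p_heads for _ in range(5))
    return coin_is_biased, n_heads

def posterior_biased(observed_heads, n_samples=100_000, seed=0):
    """Estimate P(biased | n_heads == observed_heads) by rejection sampling."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        biased, n_heads = model(rng)
        if n_heads == observed_heads:             # condition on the data
            kept.append(biased)
    return sum(kept) / len(kept)

estimate = posterior_biased(observed_heads=5)
# Bayes' rule gives the exact answer for this tiny model.
exact = 0.9 ** 5 / (0.9 ** 5 + 0.5 ** 5)
```

Rejection sampling only works here because the observation is discrete and not too unlikely; everything below is, in some sense, about doing better than this.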

By convention, when we say Bayesian programming rather than merely inference, there is an implied focus; I am indicating hope that my technique might succeed in doing inference for very complicated models indeed, possibly ones without tractable likelihoods of any kind, maybe even Turing-complete models. Usually I would not say this for a garden-variety hierarchical model, because that is not ambitious enough. Hope in this context means something like “we have the programming primitives to, in principle, possibly approximately, express the awful crazy likelihood structure of a complicated problem, and to do something that looks like it might estimate the correct conditional density, but also for any given problem we are on our own in demonstrating that it actually does solve the desired problem to the desired accuracy in the desired time.”

Mostly these tools use Markov Chain Monte Carlo sampling, which turns out to be a startlingly general way to grind out estimates of the necessary conditional probability distributions, especially if we don’t think about convergence rates too hard. Some frameworks enable other methods, such as classic conjugate priors for easy (sub-)models, variational methods of all stripes, including reparameterisation flows, and many hybrids of all of the above.
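As a rough illustration of the two ends of that spectrum, the sketch below runs a random-walk Metropolis sampler on a coin-bias posterior and compares it with the closed-form answer that a conjugate Beta prior gives for free. The data (7 heads, 3 tails), the step size and the function names are my own assumptions, not taken from any framework.

```python
import math
import random

def log_posterior(theta, heads, tails):
    """Unnormalised log posterior for a coin bias under a uniform Beta(1, 1) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(heads, tails, n_steps=50_000, step=0.1, seed=1):
    """Random-walk Metropolis; returns the chain after discarding a burn-in."""
    rng = random.Random(seed)
    theta, chain = 0.5, []
    current = log_posterior(theta, heads, tails)
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        chain.append(theta)
    return chain[n_steps // 10:]

heads, tails = 7, 3
chain = metropolis(heads, tails)
mcmc_mean = sum(chain) / len(chain)
# Conjugacy gives the exact posterior Beta(1 + heads, 1 + tails) for comparison.
exact_mean = (1 + heads) / (2 + heads + tails)
```

The sampler needs nothing but an unnormalised log density, which is why it generalises so startlingly well; the conjugate shortcut needs the model to have exactly the right algebraic form.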

See George Ho of PyMC3/PyMC4 for an in-depth introduction to what a framework needs in order to solve these problems in practice.

A probabilistic programming framework needs to provide six things:

  1. A language or API for users to specify a model
  2. A library of probability distributions and transformations to build the posterior density
  3. At least one inference algorithm, which either draws samples from the posterior (in the case of Markov Chain Monte Carlo, MCMC) or computes some approximation of it (in the case of variational inference, VI)
  4. At least one optimizer, which can compute the mode of the posterior density
  5. An autodifferentiation library to compute gradients required by the inference algorithm and optimizer
  6. A suite of diagnostics to monitor and analyze the quality of inference
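Item 6 is the one that is easiest to forget, so here is a minimal sketch of one classic diagnostic, the Gelman-Rubin R-hat statistic, which compares between-chain and within-chain variance. This is the plain, non-split version written from the textbook formula; real frameworks use more refined variants, and the toy "chains" here are just independent Gaussian draws that I made up.

```python
import random
import statistics

def r_hat(chains):
    """Classic (non-split) Gelman-Rubin statistic for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(0)
# Four chains that all sample the same target: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around different values: R-hat should be well above 1.
stuck = [[rng.gauss(offset, 1.0) for _ in range(1000)] for offset in (0, 1, 2, 3)]
r_hat_mixed, r_hat_stuck = r_hat(mixed), r_hat(stuck)
```

A value near 1 is necessary but not sufficient evidence of convergence, which is exactly the "we are on our own" caveat from earlier.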

George Ho diagrams probabilistic programming frameworks

See also Col Carroll’s overview of several trendy frameworks. This includes some I did not include here due to exhaustion and choice paralysis. Check the dates on all these; as a hip research area, there is a constant flux of new frameworks into and out of use.

For general introductions to how we might think about this, Kevin Murphy’s books are popular. They are good at providing exhaustive details on particular applied models.

I found them a little confusing because the theoretical background takes a back seat; (Barber 2012; Bishop 2006) made it all much clearer to me.


I’ve seen funsors mentioned in this context. I gather they are some kind of graphical model-inference abstraction. Do they do anything useful? Obermeyer et al. (2020):

It is a significant challenge to design probabilistic programming systems that can accommodate a wide variety of inference strategies within a unified framework. Noting that the versatility of modern automatic differentiation frameworks is based in large part on the unifying concept of tensors, we describe a software abstraction for integration —functional tensors— that captures many of the benefits of tensors, while also being able to describe continuous probability distributions. Moreover, functional tensors are a natural candidate for generalized variable elimination and parallel-scan filtering algorithms that enable parallel exact inference for a large family of tractable modeling motifs.

…This property is extensively exploited by the Pyro probabilistic programming language (Pradhan et al. 2018) and its implementation of tensor variable elimination for exact inference in discrete latent variable models, in which each random variable in a model is associated with a distinct tensor dimension and broadcasting is used to compile a probabilistic program into a discrete factor graph. Functional tensors (hereafter “funsors”) both formalize and extend this seemingly idiosyncratic but highly successful approach to probabilistic program compilation by generalizing tensors and broadcasting to allow free variables of non-integer types that appear in probabilistic models, such as real number, real-valued vector, or real-valued matrix. Building on this, we describe a simple language of lazy funsor expressions that can serve as a unified intermediate representation for a wide variety of probabilistic programs and inference algorithms. While in general there is no finite representation of functions of real variables, we provide a funsor interface for restricted classes of functions, including lazy algebraic expressions, non-normalized Gaussian functions, and Dirac delta distributions.

Sounds like this lands not too far from message passing ideas?
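To check my own understanding of the "one named dimension per random variable" idea, here is a toy sketch of discrete variable elimination in plain Python. It is emphatically not the funsor or Pyro API: the `Factor` class, `multiply` and `sum_out` are hypothetical names, the variables are all binary, and the probabilities are invented. Funsors, as I read the abstract, generalise this sort of table to real-valued free variables.

```python
from itertools import product

class Factor:
    """A table of non-negative values indexed by named binary variables."""
    def __init__(self, names, table):
        self.names = tuple(names)
        self.table = dict(table)   # maps tuples of 0/1 values to floats

    def __call__(self, assignment):
        return self.table[tuple(assignment[n] for n in self.names)]

def multiply(f, g):
    """Pointwise product, aligning on shared names (the 'broadcasting' step)."""
    names = f.names + tuple(n for n in g.names if n not in f.names)
    table = {}
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        table[values] = f(assignment) * g(assignment)
    return Factor(names, table)

def sum_out(f, name):
    """Marginalise one named variable out of a factor."""
    keep = tuple(n for n in f.names if n != name)
    table = {}
    for values, weight in f.table.items():
        key = tuple(v for n, v in zip(f.names, values) if n != name)
        table[key] = table.get(key, 0.0) + weight
    return Factor(keep, table)

# A chain a -> b -> c with made-up probabilities.
p_a = Factor(["a"], {(0,): 0.6, (1,): 0.4})
p_b_given_a = Factor(["a", "b"], {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
p_c_given_b = Factor(["b", "c"], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})

# Eliminate a, then b, never building the full joint table.
p_b = sum_out(multiply(p_a, p_b_given_a), "a")
p_c = sum_out(multiply(p_b, p_c_given_b), "b")
```

Each `sum_out(multiply(...))` is one message in the message-passing sense, which is presumably why the two ideas feel so close.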

MCMC considerations

Maybe see MCMC for now.

Variational inference considerations

Maybe see variational inference for now.



Stan is the inference toolbox for broad classes of Bayesian model and the de facto reference point. If your problem CAN be handled by Stan, this is a highly recommended option. Often seen in concert with brms which makes it easier to use for various standard regression models.

Stan breaks down in certain circumstances. It does not naturally express neural-network models well, and indeed we have reason to be concerned that the posterior simulations will be nasty with very high dimensional parameter vectors.

Stan does support some variational inference, although last time I checked (2017) it was insufficiently flexible to do anything useful for me and not recommended.

See the Stan notebook.

Typical posterior density landscape


pytorch + bayes = pyro. (Pradhan et al. 2018) For rationale, see the pyro launch announcement:

We believe the critical ideas to solve AI will come from a joint effort among a worldwide community of people pursuing diverse approaches. By open sourcing Pyro, we hope to encourage the scientific world to collaborate on making AI tools more flexible, open, and easy-to-use. We expect the current (alpha!) version of Pyro will be of most interest to probabilistic modelers who want to leverage large data sets and deep networks, PyTorch users who want easy-to-use Bayesian computation, and data scientists ready to explore the ragged edge of new technology.

As a friendly, well-documented framework without the designed-during-interdepartmental-turf-war feel of the tensorflow frameworks, this is a good default option.


Numpyro is an alternative version of pyro which uses jax for autodiff. In line with the general jax aesthetic it is elegant, fast, badly documented and missing some conveniences.


Edward, from Blei’s lab, leverages trendy deep learning machinery (tensorflow) for variational Bayes and such.

This is now baked in to tensorflow as a probabilistic programming interface.

TensorFlow Probability

Another Tensorflow entrant. Low-level and messy. Used in Edward2, above, but presumably more basic. The precise relationship between these tensorflow things is complicated enough that it is a whole other research project to pick it apart.


pyprob: (Le, Baydin, and Wood 2017)

pyprob is a PyTorch-based library for probabilistic programming and inference compilation. The main focus of this library is on coupling existing simulation codebases with probabilistic inference with minimal intervention.

The main advantage of pyprob, compared against other probabilistic programming languages like Pyro, is a fully automatic amortized inference procedure based on importance sampling. pyprob only requires a generative model to be specified. Particularly, pyprob allows for efficient inference using inference compilation which trains a recurrent neural network as a proposal network.

In Pyro such an inference network requires the user to explicitly define the control flow of the network, which is due to Pyro running the inference network and generative model sequentially. However, in pyprob the generative model and inference network runs concurrently. Thus, the control flow of the model is directly used to train the inference network. This alleviates the need for manually defining its control flow.
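The importance-sampling core of that story is easy to sketch, even if the learned proposal network is not. Below, a hand-picked Gaussian stands in for the trained proposal; the model (z ~ N(0, 1), x | z ~ N(z, 1)), the observation and every name are my assumptions, and nothing here is pyprob's API. The point is only that a proposal closer to the posterior wastes fewer samples, which is what inference compilation is trying to buy.

```python
import math
import random

def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)

def importance_sample(x_obs, proposal_mean, proposal_sd, n=20_000, seed=0):
    """Self-normalised importance sampling for z ~ N(0, 1), x | z ~ N(z, 1)."""
    rng = random.Random(seed)
    zs, log_ws = [], []
    for _ in range(n):
        z = rng.gauss(proposal_mean, proposal_sd)
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x_obs, z, 1.0)
        log_ws.append(log_joint - log_normal(z, proposal_mean, proposal_sd))
        zs.append(z)
    top = max(log_ws)
    ws = [math.exp(lw - top) for lw in log_ws]     # stabilised weights
    total = sum(ws)
    mean = sum(w * z for w, z in zip(ws, zs)) / total
    ess = total ** 2 / sum(w * w for w in ws)      # effective sample size
    return mean, ess

# Proposing from the prior versus from a hand-tuned guess at the posterior.
prior_mean, prior_ess = importance_sample(2.0, proposal_mean=0.0, proposal_sd=1.0)
tuned_mean, tuned_ess = importance_sample(2.0, proposal_mean=1.0, proposal_sd=0.8)
```

For this conjugate toy the exact posterior is N(1, 1/2), so both estimates of the mean should land near 1, but the tuned proposal should report a noticeably larger effective sample size.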

The flagship application seems to be etalumis (Baydin et al. 2019), a probabilistic programming framework with emphasis, AFAICT, on Bayesian inverse problems.



Turing.jl is a Julia library for (universal) probabilistic programming. Current features include:

  • Universal probabilistic programming with an intuitive modelling interface
  • Hamiltonian Monte Carlo (HMC) sampling for differentiable posterior distributions
  • Particle MCMC sampling for complex posterior distributions involving discrete variables and stochastic control flows
  • Gibbs sampling that combines particle MCMC and HMC
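Since HMC shows up in nearly every framework on this page, here is a minimal one-dimensional sketch of it in Python (not Julia, and not Turing.jl's API). The target is a standard normal with a hand-written gradient, where a real framework would get the gradient from autodiff; the step size and trajectory length are arbitrary choices of mine rather than tuned values.

```python
import math
import random

def grad_potential(q):
    """Gradient of the potential U(q) = q^2 / 2, i.e. a standard normal target."""
    return q

def hmc_step(q, rng, step_size=0.2, n_leapfrog=10):
    """One HMC transition: leapfrog integration plus a Metropolis correction."""
    p = rng.gauss(0.0, 1.0)                              # resample momentum
    q_new, p_new = q, p
    p_new -= 0.5 * step_size * grad_potential(q_new)     # initial half step
    for i in range(n_leapfrog):
        q_new += step_size * p_new
        if i < n_leapfrog - 1:
            p_new -= step_size * grad_potential(q_new)
    p_new -= 0.5 * step_size * grad_potential(q_new)     # final half step
    h_old = 0.5 * q * q + 0.5 * p * p
    h_new = 0.5 * q_new * q_new + 0.5 * p_new * p_new
    # Accept or reject to correct for the discretisation error.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q

rng = random.Random(0)
q, samples = 3.0, []
for _ in range(5000):
    q = hmc_step(q, rng)
    samples.append(q)
samples = samples[500:]                                  # drop burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The need for `grad_potential` is why the bullet above restricts HMC to differentiable posteriors, and why discrete variables get handed to particle methods instead.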

It is one of many julia options, and includes the MCMC toolkit AdvancedHMC.jl.


The PyMC family creates many probabilistic programming ideas and blogposts and also code, and has been doing so since the mid 2000s. They seem an excellent destination to learn about probabilistic programming, although not the best place to find stable, finished products, even by the mercurial standards of this field.

PyMC3 is python+Theano, although they have ported theano to jax and renamed it Aesara. They claim this is fast and it might be an easy way to access jax-accelerated sampling if Numpyro feels too exhausting.1

See Chris Fonnesbeck’s example in python.

Thomas Wiecki, Bayesian Deep Learning demonstrates some variants with PyMC3.



Mamba is an open platform for the implementation and application of MCMC methods to perform Bayesian analysis in julia. The package provides a framework for (1) specification of hierarchical models through stated relationships between data, parameters, and statistical distributions; (2) block-updating of parameters with samplers provided, defined by the user, or available from other packages; (3) execution of sampling schemes; and (4) posterior inference. It is intended to give users access to all levels of the design and implementation of MCMC simulators to particularly aid in the development of new methods.

Several software options are available for MCMC sampling of Bayesian models. Individuals who are primarily interested in data analysis, unconcerned with the details of MCMC, and have models that can be fit in JAGS, Stan, or OpenBUGS are encouraged to use those programs. Mamba is intended for individuals who wish to have access to lower-level MCMC tools, are knowledgeable of MCMC methodologies, and have experience, or wish to gain experience, with their application. The package also provides stand-alone convergence diagnostics and posterior inference tools, which are essential for the analysis of MCMC output regardless of the software used to generate it.



Gen simplifies the use of probabilistic modeling and inference, by providing modeling languages in which users express models, and high-level programming constructs that automate aspects of inference.

Like some probabilistic programming research languages, Gen includes universal modeling languages that can represent any model, including models with stochastic structure, discrete and continuous random variables, and simulators. However, Gen is distinguished by the flexibility that it affords to users for customizing their inference algorithm.

Gen’s flexible modeling and inference programming capabilities unify symbolic, neural, probabilistic, and simulation-based approaches to modeling and inference, including causal modeling, symbolic programming, deep learning, hierarchical Bayesian modeling, graphics and physics engines, and planning and reinforcement learning.

It has an impressive talk demonstrating how you would interactively clean data using it.



greta models are written right in R, so there’s no need to learn another language like BUGS or Stan

greta uses Google TensorFlow

I wonder how it uses Google Tensorflow.



Soss is a library for probabilistic programming.

Let’s jump right in with a simple linear model:

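using Soss

# Linear regression: the design matrix X is the model's argument.
m = @model X begin
    β ~ Normal() |> iid(size(X,2))    # one standard-normal coefficient per column
    y ~ For(eachrow(X)) do x
        Normal(x' * β, 1)             # unit-variance Gaussian noise per row
    end
end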

In Soss, models are first-class and function-like, and “applying” a model to its arguments gives a joint distribution.

Just a few of the things we can do in Soss:

  • Sample from the (forward) model
  • Condition a joint distribution on a subset of parameters
  • Have arbitrary Julia values (yes, even other models) as inputs or outputs of a model
  • Build a new model for the predictive distribution, for assigning parameters to particular values

How does it do all these things exactly?

Miscellaneous julia options

DynamicHMC.jl does Hamiltonian/NUTS sampling in a raw likelihood setting.

Miletus is a financial product and term-structure modeling package that is available for quant stuff in Julia as part of the paid packages offerings in finance. Although it looks like it is also freely available?


InferPy seems to be a higher-level competitor to Edward2?



ZhuSuan is a python probabilistic programming library for Bayesian deep learning, which conjoins the complementary advantages of Bayesian methods and deep learning. ZhuSuan is built upon Tensorflow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan provides deep learning style primitives and algorithms for building probabilistic models and applying Bayesian inference. The supported inference algorithms include:

  • Variational inference with programmable variational posteriors, various objectives and advanced gradient estimators (SGVB, REINFORCE, VIMCO, etc.).
  • Importance sampling for learning and evaluating models, with programmable proposals.
  • Hamiltonian Monte Carlo (HMC) with parallel chains, and optional automatic parameter tuning.
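The SGVB-style gradient estimator in the first bullet is the reparameterisation trick, which can be sketched without any deep learning library at all. The following is my own toy version, not ZhuSuan's API: it fits a Gaussian q = N(mu, sigma^2) to a Gaussian target by stochastic gradient ascent on the ELBO, with gradients written out by hand where a framework would use autodiff. The target parameters, learning rate and function name are assumptions for illustration.

```python
import math
import random

def fit_gaussian_vi(target_mean, target_sd, n_steps=6000, lr=0.005, seed=0):
    """Fit q = N(mu, sigma^2) to a Gaussian target by stochastic ELBO ascent."""
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_steps):
        sigma = math.exp(log_sigma)
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps                        # reparameterised sample from q
        dlogp_dz = -(z - target_mean) / target_sd ** 2
        grad_mu = dlogp_dz                          # dz/dmu = 1
        # Chain rule through z, plus d(entropy of q)/d(log_sigma) = 1.
        grad_log_sigma = dlogp_dz * sigma * eps + 1.0
        mu += lr * grad_mu
        log_sigma += lr * grad_log_sigma
    return mu, math.exp(log_sigma)

mu_hat, sigma_hat = fit_gaussian_vi(target_mean=2.0, target_sd=0.5)
```

Because the variational family contains the target here, the fit should recover roughly mu = 2 and sigma = 0.5, up to the noise of single-sample gradients; in interesting problems it will not, and how badly it misses is the whole question.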


Church is a general-purpose Turing-complete Monte Carlo lisp-derivative, which is unbearably slow but does some reputedly cute tricks with modeling human problem-solving, and other likelihood-free methods, according to creators Noah Goodman and Joshua Tenenbaum.

See also Anglican, which is the same but different, being built in clojure, and hence also leveraging browser Clojurescript.


WebPPL is a successor to Church designed as a teaching language for probabilistic reasoning in the browser. If you like Javascript ML.


See also BAT the Bayesian Analysis Toolkit, which does sophisticated Bayes modelling although AFAICT uses a fairly basic Metropolis-Hastings Sampler?


Barber, David. 2012. Bayesian Reasoning and Machine Learning. Cambridge ; New York: Cambridge University Press.
Baydin, Atılım Güneş, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, et al. 2019. “Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale.” In arXiv:1907.03382 [cs, Stat].
Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer.
Carroll, Colin. n.d. “A Tour of Probabilistic Programming Language APIs.” Https:// (blog).
Cusumano-Towner, Marco F., and Vikash K. Mansinghka. 2017. “Encapsulating Models and Approximate Inference Programs in Probabilistic Modules.” arXiv:1612.04759 [cs, Stat], May.
———. 2018. “Using Probabilistic Programs as Proposals.” arXiv:1801.03612 [cs, Stat], January.
Cusumano-Towner, Marco F., Feras A. Saad, Alexander K. Lew, and Vikash K. Mansinghka. 2019. “Gen: A General-Purpose Probabilistic Programming System with Programmable Inference.” In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 221–36. PLDI 2019. New York, NY, USA: ACM.
Cusumano-Towner, Marco, Benjamin Bichsel, Timon Gehr, Martin Vechev, and Vikash K. Mansinghka. 2018. “Incremental Inference for Probabilistic Programs.” In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 571–85. PLDI 2018. New York, NY, USA: ACM.
Cusumano-Towner, Marco, and Vikash K. Mansinghka. 2018. “A Design Proposal for Gen: Probabilistic Programming with Fast Custom Inference via Code Generation.” In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 52–57. MAPL 2018. New York, NY, USA: ACM.
Gelman, Andrew, Daniel Lee, and Jiqiang Guo. 2015. “Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization.” Journal of Educational and Behavioral Statistics 40 (5): 530–43.
Goodrich, Ben, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Bob Carpenter, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).
Gorinova, Maria I., Andrew D. Gordon, and Charles Sutton. 2019. “Probabilistic Programming with Densities in SlicStan: Efficient, Flexible and Deterministic.” Proceedings of the ACM on Programming Languages 3 (POPL): 1–30.
Kochurov, Max, Colin Carroll, Thomas Wiecki, and Junpeng Lao. 2019. “PyMC4: Exploiting Coroutines for Implementing a Probabilistic Programming Framework,” September.
Lao, Junpeng. 2019. “A Hitchhiker’s Guide to Designing a Bayesian Library in Python.” Presentation Slides presented at the PyData Córdoba, Córdoba, Argentina, September 29.
Le, Tuan Anh, Atılım Güneş Baydin, and Frank Wood. 2017. “Inference Compilation and Universal Probabilistic Programming.” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 54:1338–48. Proceedings of Machine Learning Research. Fort Lauderdale, FL, USA: PMLR.
Moore, Dave, and Maria I. Gorinova. 2018. “Effect Handling for Composable Program Transformations in Edward2.” arXiv:1811.06150 [cs, Stat], November.
Murphy, Kevin P. 2012. Machine learning: a probabilistic perspective. 1 edition. Adaptive computation and machine learning series. Cambridge, MA: MIT Press.
Obermeyer, Fritz, Eli Bingham, Martin Jankowiak, Du Phan, and Jonathan P. Chen. 2020. “Functional Tensors for Probabilistic Programming.” arXiv:1910.10775 [cs, Stat], March.
Pearl, Judea. 2008. Probabilistic reasoning in intelligent systems: networks of plausible inference. Rev. 2. print., 12. [Dr.]. The Morgan Kaufmann series in representation and reasoning. San Francisco, Calif: Kaufmann.
Pradhan, Neeraj, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Eli Bingham, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D. Goodman. 2018. “Pyro: Deep Universal Probabilistic Programming.” arXiv:1810.09538 [cs, Stat], October.
PyMC Development Team. 2019. “PyMC3 Developer Guide.”
Rainforth, Tom. 2017. “Automating Inference, Learning, and Design Using Probabilistic Programming.” PhD Thesis, University of Oxford.
Salvatier, John, Thomas V. Wiecki, and Christopher Fonnesbeck. 2016. “Probabilistic Programming in Python Using PyMC3.” PeerJ Computer Science 2 (April): e55.
Tran, Dustin, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, and David M. Blei. 2017. “Deep Probabilistic Programming.” In ICLR.
Tran, Dustin, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, and David M. Blei. 2016. “Edward: A Library for Probabilistic Modeling, Inference, and Criticism.” arXiv:1610.09787 [cs, Stat], October.
Vasudevan, Srinivas, Ian Langmore, Dustin Tran, Eugene Brevdo, Joshua V. Dillon, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, and Rif A. Saurous. 2017. “TensorFlow Distributions.” arXiv:1711.10604 [cs, Stat], November.

  1. PyMC4, despite what you might think due to the jetsam of an earlier hype cycle, is discontinued in favour of PyMC3. AFAICT PyMC4 was intended to be a tensorflow-backed system, so this is some additional evidence that Tensorflow blights every probabilistic programming system it touches.
