Causal inference on DAGs

Confounding! This scientist performed miracle graph surgery during an intervention and you won’t believe what happened next

Inferring the optimal intervention requires accounting for which arrows are independent of which

Inferring cause and effect from nature, which is especially hard with observational (as opposed to ideal experimental) data. Graphical models and related techniques for doing it. Avoiding the danger of folk statistics. Observational studies, confounding, adjustment criteria, d-separation, identifiability, interventions, moral equivalence…

The gold standard, of course, is to work out if A causes B by doing an experiment where no input but A changes, then observing B. Statistically it is nearly as good to do the experiment where all other influences apart from A are at least uncorrelated with A.
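To make the contrast concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: a linear model with a hidden common cause `z`, a true effect of A on B of 1.0, and the hypothetical helper names `ols_slope` and `simulate`. When A depends on `z`, the regression slope of B on A lands near 2 rather than 1; when A is assigned independently of `z`, the slope lands near the true effect.

```python
import numpy as np

TRUE_EFFECT = 1.0  # assumed causal effect of A on B in this toy model


def ols_slope(a, b):
    """Slope of the least-squares line of b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


def simulate(randomised, n=100_000, seed=0):
    """Estimate the A -> B slope from simulated data.

    z is a hidden common cause of A and B (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    if randomised:
        # Experimental regime: A is assigned independently of z.
        a = rng.normal(size=n)
    else:
        # Observational regime: A is partly driven by z.
        a = z + rng.normal(size=n)
    b = TRUE_EFFECT * a + 2.0 * z + rng.normal(size=n)
    return ols_slope(a, b)


observational = simulate(randomised=False)  # biased upwards by z
experimental = simulate(randomised=True)    # close to TRUE_EFFECT
print(f"observational slope: {observational:.2f}")
print(f"experimental slope:  {experimental:.2f}")
```

The only difference between the two regimes is whether the other influence on B is correlated with A, which is exactly the condition in the paragraph above.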

In many circumstances, though (budget restrictions, ethical constraints, bad experimental design…), we cannot do these ideal experiments, and a mathematical crutch is needed to get us the next-best outcome.

The most well-trodden path in this circumstance is using directed graphical models with the additional assumption that \(A\rightarrow B\) may be read as “A causes a change in B”. Compare and contrast instrumental variables and propensity score matching. In the language of structural equation models, this boils down to, more or less, some extra interpretation imposed on hierarchical models. Avoidance of the ecological fallacy and Simpson’s paradox.

When can I use my crappy observational data, collected without a good experimental design for whatever reason, to do interventional inference? There is a lot of research in this area; I should summarise the salient bits for myself. In fact I did; I led a reading group.
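One standard answer, stated here as a reminder rather than as a summary of the reading group: if a set of observed variables \(Z\) satisfies Pearl’s backdoor criterion relative to \((X, Y)\) in the assumed causal DAG (no member of \(Z\) is a descendant of \(X\), and \(Z\) blocks every path from \(X\) to \(Y\) that starts with an arrow into \(X\)), then the interventional distribution is identified from observational quantities alone by the adjustment formula. The symbols \(X\), \(Y\), \(Z\) are introduced for this illustration, and the sum is for discrete \(Z\).

\[
P(Y = y \mid \operatorname{do}(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)
\]

Everything on the right-hand side can be estimated from observational data; the causal content sits entirely in the assumption that the graph is correct.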

See also quantum causal graphical models, and the use of classical causal graphical models to eliminate hidden quantum causes.

Spurious correlation induced by sampling bias.

Learning materials

Miguel Hernán and Jamie Robins’ new causal inference book has a free draft online. See Yanir Seroussi’s review. Jonas Peters’ notes from his teaching in 2015 (I may have taken this course; can’t recall exactly).

Samantha Kleinberg has a book notable for its handling of time-dependent causality.

Tutorial: David Sontag and Uri Shalit, Causal inference from observational studies.

A resource list for causality in statistics, data science and physics.

Gwern on Causality:

I speculate that in realistic causal networks or DAGs, the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in DAGs, the imbalance also explains overconfidence.

Lord’s paradox.

Felix Elwert’s summary. (Elwert 2013)

Chapter 3 of (some edition of) Pearl’s book is available as an author’s preprint: Part 1, 2, 3, 4, 5, 6.

Stanford encyclopaedia of philosophy entry.

Various classic introductions (Pearl 2012, 1998; Elwert 2013; Morgan and Winship 2015; Rohrer 2018). Notably not recommended on pedagogic grounds (Koller and Friedman 2009).

The dagitty intro is an interactive guide via visualizations. Likewise, the ggdag bias structure vignette shows off the useful explanation diagrams available in ggdag and is also a good introduction to selection bias and causal DAGs themselves.

Amit Sharma’s tutorial at KDD. See also Emily Riederer’s Causal design patterns for data analysts.


Pearl’s do calculus

In modern machine learning

Cunning modern nonparametric approaches such as Künzel et al. (2019) are covered in the causality notebook.

Continuously indexed fields

More generally than the typical framing, where we have a few distinct variables joined by arrows of inference, we might be concerned with continuously indexed random fields.

External validity

See external validity.

Potential outcomes approach

A.k.a. Neyman-Rubin school. See potential outcomes.

Inferring a causal graph from data

Uh oh. You don’t know what causes what? Or specifically, you can’t eliminate a whole bunch of potential causal arrows a priori? Much more work.

Here is a seminar I noticed on this theme, which is also a lightspeed introduction to some difficulties.

Guido Consonni, Objective Bayes Model Selection of Gaussian Essential Graphs with Observational and Interventional Data.

Graphical models based on Directed Acyclic Graphs (DAGs) represent a powerful tool for investigating dependencies among variables. It is well known that one cannot distinguish between DAGs encoding the same set of conditional independencies (Markov equivalent DAGs) using only observational data. However, the space of all DAGs can be partitioned into Markov equivalence classes, each being represented by a unique Essential Graph (EG), also called Completed Partially Directed Graph (CPDAG). In some fields, in particular genomics, one can have both observational and interventional data, the latter being produced after an exogenous perturbation of some variables in the system, or from randomized intervention experiments. Interventions destroy the original causal structure, and modify the Markov property of the underlying DAG, leading to a finer partition of DAGs into equivalence classes, each one being represented by an Interventional Essential Graph (I-EG) (Hauser and Buehlmann). In this talk we consider Bayesian model selection of EGs under the assumption that the variables are jointly Gaussian. In particular, we adopt an objective Bayes approach, based on the notion of fractional Bayes factor, and obtain a closed form expression for the marginal likelihood of an EG. Next we construct a Markov chain to explore the EG space under a sparsity constraint, and propose an MCMC algorithm to approximate the posterior distribution over the space of EGs. Our methodology, which we name Objective Bayes Essential graph Search (OBES), allows to evaluate the inferential uncertainty associated to any features of interest, for instance the posterior probability of edge inclusion. An extension of OBES to deal simultaneously with observational and interventional data is also presented: this involves suitable modifications of the likelihood and prior, as well as of the MCMC algorithm.
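The Markov equivalence mentioned in the abstract can be checked mechanically. The sketch below relies on the classical characterisation (Verma and Pearl) that two DAGs over the same nodes are Markov equivalent exactly when they share a skeleton and the same v-structures. The function names and the edge-list representation are hypothetical choices for this illustration; the code assumes its inputs really are DAGs and does not verify acyclicity.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: a set of unordered node pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures (inputs assumed to be DAGs)."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Edges are (parent, child) pairs.
chain = [("X", "Y"), ("Y", "Z")]      # X -> Y -> Z
fork = [("Y", "X"), ("Y", "Z")]       # X <- Y -> Z
collider = [("X", "Y"), ("Z", "Y")]   # X -> Y <- Z

print(markov_equivalent(chain, fork))      # same equivalence class
print(markov_equivalent(chain, collider))  # different class
```

The chain and the fork imply the same single conditional independence (X and Z independent given Y), so observational data cannot tell them apart, whereas the collider is distinguishable.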

Causal time series

Causal inference for time series, as with other time series methods, has its own issues.

🏗 find out how CausalImpact works. (Based on Brodersen et al. (2015).)

The CausalImpact R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty using a structural Bayesian time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred.
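The counterfactual idea in that description can be sketched in a few lines. This is not the CausalImpact model: the Bayesian structural time-series model is swapped for an ordinary least-squares regression of the response on a single control series, and there are no uncertainty intervals. The data are simulated, and the names `toy_series` and `estimate_lift`, as well as the assumption that the control series is unaffected by the intervention, are mine.

```python
import numpy as np


def toy_series(n=200, t0=140, lift=5.0, seed=0):
    """Simulated response y and control x; y jumps by `lift` from time t0."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    # Control series: seasonal pattern plus noise, unaffected by the intervention.
    x = 10 + 2 * np.sin(2 * np.pi * t / 20) + rng.normal(scale=0.3, size=n)
    # Response tracks the control, plus noise.
    y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
    y[t0:] += lift  # effect of the (simulated) intervention
    return y, x


def estimate_lift(y, x, t0):
    """Average post-period gap between observed y and its predicted counterfactual."""
    # Fit y ~ x on the pre-intervention period only.
    slope, intercept = np.polyfit(x[:t0], y[:t0], 1)
    # Predict what y would have been after t0 had nothing changed.
    counterfactual = intercept + slope * x[t0:]
    return float(np.mean(y[t0:] - counterfactual))


y, x = toy_series()
print(f"estimated average lift: {estimate_lift(y, x, t0=140):.2f}")
```

The estimate is only as good as the assumption that the pre-period relationship between response and control would have persisted without the intervention.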

More generally, we might be concerned with continuous time.

Drawing graphical models

See diagramming graphical models.


There are many software tools. See, e.g., CausalDiscoveryToolbox and ijmbarr/causalgraphicalmodels (causal graphical models in Python); dagR does R. Recent and backed by Microsoft, DoWhy is a Python toolbox.

Simpson’s paradox

Simpson’s paradox is an evergreen example of the importance of the causal graph. For a beautiful and clear example see Allen Downey’s Simpson’s Paradox and Age Effects.
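A small numerical sketch of the reversal, with made-up counts that are not from any study. In each severity stratum the treated group recovers more often, yet pooled over strata it recovers less often, because severe cases are far more likely to be treated. Whether the stratified or the pooled comparison is the causally relevant one depends on the graph: the adjustment below is only appropriate under the assumption that severity is a common cause of treatment and recovery.

```python
# Hypothetical (recovered, total) counts per severity stratum and arm.
counts = {
    "mild":   {"treated": (90, 100),   "control": (800, 1000)},
    "severe": {"treated": (500, 1000), "control": (30, 100)},
}


def stratum_rate(stratum, arm):
    """Recovery rate within one severity stratum."""
    recovered, total = counts[stratum][arm]
    return recovered / total


def pooled_rate(arm):
    """Recovery rate ignoring severity."""
    recovered = sum(counts[s][arm][0] for s in counts)
    total = sum(counts[s][arm][1] for s in counts)
    return recovered / total


def adjusted_rate(arm):
    """Stratum rates averaged over the overall severity distribution.

    Valid as a causal estimate only if severity confounds treatment and recovery.
    """
    grand_total = sum(counts[s][a][1] for s in counts for a in counts[s])
    rate = 0.0
    for s in counts:
        weight = sum(counts[s][a][1] for a in counts[s]) / grand_total
        rate += weight * stratum_rate(s, arm)
    return rate


for arm in ("treated", "control"):
    print(arm,
          "mild:", round(stratum_rate("mild", arm), 3),
          "severe:", round(stratum_rate("severe", arm), 3),
          "pooled:", round(pooled_rate(arm), 3),
          "adjusted:", round(adjusted_rate(arm), 3))
```

If severity were instead a consequence of treatment (a mediator), adjusting for it would be the wrong move, which is why the numbers alone cannot settle the question.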


