# Causal graphical model reading group 2022

## Causal inference

$\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\vv}{\boldsymbol{#1}} \renewcommand{\rv}{\mathsf{#1}} \renewcommand{\vrv}{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{P}{\mathbb{P}} \renewcommand{\indep}{\mathop{\perp\!\!\!\perp}}$

My chunk (Chapter 3) of the internal reading group covering the Brady Neal course.

## Recap: potential outcomes

Last time we discussed the potential outcomes framework, which answers the question: how do we reason about the potential outcome $$Y(t)$$ under some treatment $$t$$? In particular, how do we calculate the average treatment effect $\Ex[Y(1)-Y(0)]$?

We used the following assumptions: \begin{aligned} (Y(1), Y(0))\indep T | X &\quad \text{Unconfoundedness}\\ T{=}t \implies Y = Y(t)&\quad \text{Consistency}\\ 0 < P(T{=}1 | X{=}x) < 1&\quad \text{Overlap (positivity)}\\ Y_i(t_1,\dots,t_n) = Y_i(t_i)&\quad \text{No interference}\\ \end{aligned}

Under those assumptions, we have the causal adjustment formula $\Ex[Y(1) - Y(0)] = \Ex_{X}\left[ \Ex[Y \mid T{=}1, X] - \Ex[Y \mid T{=}0, X]\right].$
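As a sanity check on the adjustment formula, here is a minimal simulation sketch. The structural model $$Y = T + X$$ and the treatment probabilities are made-up numbers chosen for illustration (so the true ATE is 1): the naive contrast $$\Ex[Y \mid T{=}1] - \Ex[Y \mid T{=}0]$$ is biased by the confounder, while stratifying on $$X$$ recovers the truth.

```python
import random

random.seed(0)

# Simulate an observational study with one binary confounder X.
# Assumed structural model (made up for illustration): Y = T + X, so the true ATE is 1.
n = 100_000
rows = []
for _ in range(n):
    x = int(random.random() < 0.5)                  # confounder
    t = int(random.random() < (0.8 if x else 0.2))  # treatment depends on X
    y = t + x                                       # outcome depends on both
    rows.append((x, t, y))

def mean(vals):
    return sum(vals) / len(vals)

# Naive contrast E[Y|T=1] - E[Y|T=0]: biased, because X confounds T and Y.
naive = mean([y for x, t, y in rows if t == 1]) - mean([y for x, t, y in rows if t == 0])

# Adjustment formula: E_X[ E[Y|T=1,X] - E[Y|T=0,X] ], weighting strata by P(X=x).
ate = 0.0
for xv in (0, 1):
    stratum = [(t, y) for x, t, y in rows if x == xv]
    p_x = len(stratum) / n
    e1 = mean([y for t, y in stratum if t == 1])
    e0 = mean([y for t, y in stratum if t == 0])
    ate += p_x * (e1 - e0)

print(naive, ate)  # the naive contrast is inflated; the adjusted one is 1.0
```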

Aside: what is going on in positivity? If overlap fails, then for some stratum $$X=x$$ one of the conditional expectations in the adjustment formula conditions on a probability-zero event and is undefined.

And now…

## Graphical models for causation wrangling

We have a finite collection of random variables $$\mathbf{V}=\{X_1, X_2,\dots\}$$.

For simplicity of exposition, each of the RVs will be discrete so that we may work with pmfs, and write $$P(X_i|X_j)$$ for the conditional pmf. I sometimes write $$P(x_i|x_j)$$ to mean $$P(X_i=x_i|X_j=x_j)$$.

More notation. We write $X \indep Y|Z$ to mean “$$X$$ is independent of $$Y$$ given $$Z$$”.

We can answer these questions via a graph formalism. That’s where the DAGs come in.

### Directed Acyclic Graphs (DAGs)

A DAG is a graph with directed edges and no cycles: you cannot return to your starting node travelling only forwards along the arrows.

DAGs are defined by a set of vertices and (directed) edges.

We show the directions of edges by writing them as arrows.

For nodes $$X,Y\in \mathbf{V}$$ we write $$X \rightarrow Y$$ to mean there is a directed edge from $$X$$ to $$Y$$.

## Bayesian networks

### Local Markov assumption

Given its parents in the DAG, a node is independent of all its non-descendants.

With a four-variable example, the chain rule of probability tells us that we can factorize any $$P$$ thus: $P\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=P\left(x_{1}\right) P\left(x_{2} \mid x_{1}\right) P\left(x_{3} \mid x_{2}, x_{1}\right) P\left(x_{4} \mid x_{3}, x_{2}, x_{1}\right).$

Figure 1: Abstract DAG

If $$P$$ is Markov with respect to the above graph then we can simplify the last factor: $P\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=P\left(x_{1}\right) P\left(x_{2} \mid x_{1}\right) P\left(x_{3} \mid x_{2}, x_{1}\right) P\left(x_{4} \mid x_{3}\right) .$

If we remove further edges, say $$X_{1} \rightarrow X_{2}$$ and $$X_{2} \rightarrow X_{3}$$ as in the figure below,

Figure 2: Abstract DAG 2

we can further simplify the factorization of $$P$$ : $P\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=P\left(x_{1}\right) P\left(x_{2}\right) P\left(x_{3} \mid x_{1}\right) P\left(x_{4} \mid x_{3}\right)$

### Bayesian Network Factorization

Given a probability distribution $$P$$ and a DAG $$G$$, we say $$P$$ factorizes according to $$G$$ if $P\left(x_{1}, \ldots, x_{n}\right)=\prod_{i} P\left(x_{i} \mid \operatorname{parents}(X_i)\right).$
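To make the definition concrete, here is a toy sketch of the factorization for the Figure-2 graph ($$X_1 \to X_3 \to X_4$$, with $$X_2$$ parentless). The CPT numbers are arbitrary, chosen only for illustration; multiplying the per-node conditional pmfs yields a valid joint distribution.

```python
from itertools import product

# Toy CPTs for the Figure-2 graph: X1 -> X3 -> X4, with X2 parentless.
# The numbers are arbitrary, chosen only to illustrate the product formula.
p_x1 = {0: 0.6, 1: 0.4}
p_x2 = {0: 0.7, 1: 0.3}
p_x3_given_x1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # outer key: x1
p_x4_given_x3 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.3, 1: 0.7}}  # outer key: x3

def joint(x1, x2, x3, x4):
    """Bayesian network factorization: the product of P(x_i | parents(X_i))."""
    return p_x1[x1] * p_x2[x2] * p_x3_given_x1[x1][x3] * p_x4_given_x3[x3][x4]

total = sum(joint(*xs) for xs in product((0, 1), repeat=4))
print(total)  # 1.0 (up to float rounding): the product defines a valid joint pmf
```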

### Minimality

1. Given its parents in the DAG, a node X is independent of all its non-descendants
2. Adjacent nodes in the DAG are dependent.

## Causal interpretation

Causal association, from Neal (2020).

### Causal Edges

In a directed graph, every parent is a direct cause of all its children.

$$Y$$ “directly causing” $$X$$ means that $$X=f(\operatorname{parents}(X),\omega)$$ is a (stochastic) function of some parent set which includes $$Y$$, together with independent noise $$\omega$$.

### Causal Bayesian Networks

Causal Edges + Local Markov

## Conditional independence in Bayesian networks

When we fix some nodes, which independences do we introduce?

### Chains

B in a chain path

$P(a,b,c) = P(a)P(b|a)P(c|b)$

We assert that, conditional on B, A and C are independent: $A\indep C | B \\ \Leftrightarrow\\ P(a,c|b) = P(a|b)P(c|b)$

In slow motion, \begin{aligned} P(a,b,c) &= P(a)P(b|a)P(c|b)\\ P(a,c|b) &=\frac{P(a)P(b|a)P(c|b)}{P(b)}\\ &=P(c|b)\frac{P(a)P(b|a)}{P(b)}\\ &=P(c|b)\frac{P(a,b)}{P(b)}\\ &=P(c|b)P(a|b) \end{aligned}
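The same cancellation can be checked numerically. The chain CPTs below are arbitrary made-up fractions; `Fraction` gives exact arithmetic, so the `==` is a genuine equality check, not a float approximation.

```python
from fractions import Fraction as F
from itertools import product

# A chain A -> B -> C over binary variables, with arbitrary made-up CPTs.
p_a = {0: F(1, 4), 1: F(3, 4)}
p_b_given_a = {0: {0: F(1, 3), 1: F(2, 3)}, 1: {0: F(3, 5), 1: F(2, 5)}}
p_c_given_b = {0: {0: F(1, 2), 1: F(1, 2)}, 1: {0: F(1, 6), 1: F(5, 6)}}

def joint(a, b, c):
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def p_b(b):
    return sum(joint(a, b, c) for a, c in product((0, 1), repeat=2))

# Check P(a, c | b) == P(a | b) P(c | b) for every assignment.
chain_ci = all(
    joint(a, b, c) / p_b(b)
    == (sum(joint(a, b, c2) for c2 in (0, 1)) / p_b(b))
    * (sum(joint(a2, b, c) for a2 in (0, 1)) / p_b(b))
    for a, b, c in product((0, 1), repeat=3)
)
print(chain_ci)  # True: conditioning on B makes A and C independent
```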

### Forks

B in a fork path

$P(a,b,c) = P(b)P(a|b)P(c|b)$

We assert that, conditional on B, A and C are independent: $A\indep C | B \\ \Leftrightarrow\\ P(a,c|b) = P(a|b)P(c|b)$ In slow motion, \begin{aligned} P(a,b,c) &= P(b)P(a|b)P(c|b)\\ P(a,c|b) &=\frac{P(b)P(a|b)P(c|b)}{P(b)}\\ &=P(a|b)P(c|b) \end{aligned}
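For contrast with the chain, the following sketch (again with made-up CPTs and exact arithmetic) checks both halves of the fork story: $$A$$ and $$C$$ are associated marginally, but independent once we condition on $$B$$.

```python
from fractions import Fraction as F
from itertools import product

# A fork A <- B -> C over binary variables, with arbitrary made-up CPTs.
p_b = {0: F(1, 2), 1: F(1, 2)}
p_a_given_b = {0: {0: F(3, 4), 1: F(1, 4)}, 1: {0: F(1, 4), 1: F(3, 4)}}
p_c_given_b = {0: {0: F(4, 5), 1: F(1, 5)}, 1: {0: F(1, 5), 1: F(4, 5)}}

def joint(a, b, c):
    return p_b[b] * p_a_given_b[b][a] * p_c_given_b[b][c]

def marg(**fixed):
    """Marginal probability of the fixed values, summing out the rest."""
    return sum(
        joint(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
        if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items())
    )

# Marginally, A and C are associated (the common cause induces dependence)...
marginally_independent = marg(a=1, c=1) == marg(a=1) * marg(c=1)

# ...but conditioning on B removes the association.
conditionally_independent = all(
    marg(a=a, b=b, c=c) / marg(b=b)
    == (marg(a=a, b=b) / marg(b=b)) * (marg(b=b, c=c) / marg(b=b))
    for a, b, c in product((0, 1), repeat=3)
)
print(marginally_independent, conditionally_independent)  # False True
```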

### Immoralities

(Colliders when I grew up.)

B in a collider path

$P(a,b,c) = P(a)P(c)P(b|a,c)$

We assert that, conditional on B, A and C are not in general independent: $A \cancel{\indep} C | B \\ \Leftrightarrow\\ P(a,c|b) \neq P(a|b)P(c|b)$

Why does this fail to factorize in general? Marginally we do have $$A \indep C$$: summing $$P(a)P(c)P(b|a,c)$$ over $$b$$ gives $$P(a,c)=P(a)P(c)$$. But conditioning on the common effect $$B$$ couples its causes, since knowing one cause “explains away” the other. (For degenerate choices of $$P(b|a,c)$$ the conditional may still factorize, so the claim is “not in general”, not “never”.)
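One concrete counterexample, which also illustrates explaining away: let $$A$$ and $$C$$ be independent fair coins and $$B = A \lor C$$ (a deterministic collider, chosen for illustration). Then $$A \indep C$$ marginally, but given $$B = 1$$ the factorization fails.

```python
from fractions import Fraction as F
from itertools import product

# Collider A -> B <- C: A and C are independent fair coins, and B = A OR C.
p = {
    (a, b, c): F(1, 4) * (1 if b == (a | c) else 0)
    for a, b, c in product((0, 1), repeat=3)
}

def marg(**fixed):
    """Marginal probability of the fixed values, summing out the rest."""
    return sum(
        v for (a, b, c), v in p.items()
        if all({"a": a, "b": b, "c": c}[k] == w for k, w in fixed.items())
    )

# Marginally, A and C really are independent.
assert marg(a=1, c=1) == marg(a=1) * marg(c=1)

# But conditional on B = 1 they are not: P(a=0, c=0 | b=1) = 0, while
# P(a=0 | b=1) and P(c=0 | b=1) are both strictly positive.
pb1 = marg(b=1)                                        # 3/4
lhs = marg(a=0, b=1, c=0) / pb1                        # 0
rhs = (marg(a=0, b=1) / pb1) * (marg(b=1, c=0) / pb1)  # 1/9
print(lhs, rhs)
```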

### Blocked paths

A path between nodes $$X$$ and $$Y$$ is blocked by a (potentially empty) conditioning set $$Z$$ if either of the following is true:

1. Along the path, there is a chain $$\cdots \rightarrow W \rightarrow \cdots$$ or a fork $$\cdots \leftarrow W \rightarrow \cdots$$, where $$W$$ is conditioned on $$(W \in Z)$$.
2. There is a collider $$W$$ on the path that is not conditioned on $$(W \notin Z)$$ and none of its descendants are conditioned on $$(\operatorname{descendants}(W) \cap Z = \emptyset)$$.

### d-separation

Two (sets of) nodes $$\vv{X}$$ and $$\vv{Y}$$ are $$d$$-separated by a set of nodes $$\vv{Z}$$ if all of the paths between (any node in) $$\vv{X}$$ and (any node in) $$\vv{Y}$$ are blocked by $$\vv{Z}$$.
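The two blocking rules and the d-separation definition are mechanical enough to code up directly. Below is a small pure-Python sketch, assuming Figure 1 has edges $$X_1 \to X_2$$, $$X_1 \to X_3$$, $$X_2 \to X_3$$, $$X_3 \to X_4$$ (consistent with the factorization earlier): it enumerates the undirected simple paths and applies the rules to each.

```python
def descendants(dag, w):
    """All nodes reachable from w along directed edges."""
    seen, stack = set(), [w]
    while stack:
        node = stack.pop()
        for u, v in dag:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def undirected_paths(dag, x, y):
    """All simple paths between x and y, ignoring edge direction."""
    neighbours = {}
    for u, v in dag:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    paths, stack = [], [[x]]
    while stack:
        path = stack.pop()
        if path[-1] == y:
            paths.append(path)
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths

def blocked(dag, path, z):
    """Is this path blocked by the conditioning set z? (the two rules above)"""
    for i in range(1, len(path) - 1):
        a, w, b = path[i - 1], path[i], path[i + 1]
        collider = (a, w) in dag and (b, w) in dag  # ... -> W <- ...
        if collider:
            if w not in z and not (descendants(dag, w) & z):
                return True  # rule 2: an unconditioned collider blocks
        elif w in z:
            return True      # rule 1: a conditioned chain/fork node blocks
    return False

def d_separated(dag, x, y, z):
    return all(blocked(dag, p, z) for p in undirected_paths(dag, x, y))

# Assumed Figure-1 DAG: X1 -> X2, X1 -> X3, X2 -> X3, X3 -> X4.
dag = {(1, 2), (1, 3), (2, 3), (3, 4)}
print(d_separated(dag, 1, 4, {3}))    # True: every path to X4 runs through X3
print(d_separated(dag, 1, 4, set()))  # False: X1 -> X3 -> X4 is open
```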

### d-separation in Bayesian networks

We use the notation $$X \indep_{G} Y \mid Z$$ to denote that $$X$$ and $$Y$$ are d-separated in the graph $$G$$ when conditioning on $$Z$$. Similarly, we use the notation $$X \indep_{P} Y \mid Z$$ to denote that $$X$$ and $$Y$$ are independent in the distribution $$P$$ when conditioning on $$Z$$.

Given that $$P$$ is Markov with respect to $$G$$: if $$X$$ and $$Y$$ are d-separated in $$G$$ conditioned on $$Z$$, then $$X$$ and $$Y$$ are independent in $$P$$ conditioned on $$Z$$:

$X \indep_{G} Y |Z \Longrightarrow X \indep_{P} Y | Z.$

## d-separation implies Association is Causation

Chocolate and Nobel prizes, Messerli (2012): correlation tastes as good as causation.
