The causal hierarchy and its discontents
2026-04-09 — 2026-04-10
Wherein the apparent irreducibility of causal claims to subjective probability is shown to dissolve once the variable set is enriched with naturalized intervention variables.
A particularly cosmic part of the study of causal inference is Pearl’s Causal Hierarchy (PCH), also known as the “ladder of causation.” It organises causal reasoning into three layers of increasing expressiveness, each strictly more powerful than the one below it. The standard reference is Bareinboim et al. (2022); Pearl himself gives a more accessible treatment in The Book of Why.
1 The three rungs
Rung 1 — Association (\(P(Y \mid X)\)). Observational statistics: seeing and predicting. “What is the probability of Y given that I observe X?” This is the domain of standard statistical and machine learning models — correlations, regressions, classification. No notion of intervention lives here.
Rung 2 — Intervention (\(P(Y \mid \operatorname{do}(X))\)). What happens if I do something? “If I set \(X = x\) (by force, ignoring whatever normally determines \(X\)), what happens to \(Y\)?” This is the realm of do-calculus, randomised controlled trials, and policy evaluation. The \(\operatorname{do}(\cdot)\) operator breaks incoming edges to \(X\) in the causal DAG — graph surgery — so the resulting distribution is in general not recoverable from Rung 1 alone.
Rung 3 — Counterfactual (\(P(Y_x \mid X = x', Y = y')\)). Retrospective imagination: “Given that I observed \(X = x'\) and \(Y = y'\), what would \(Y\) have been had \(X\) been \(x\) instead?” This requires a full structural causal model (SCM) with its exogenous noise terms, not merely the causal graph. It is the province of actual causation, attribution, fairness, and regret (Halpern 2016).
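The counterfactual computation that a full SCM licenses follows the standard three-step recipe: abduction (infer the exogenous noise from the observation), action (graph surgery on \(X\)), prediction (re-run the mechanisms). A minimal Python sketch, using an invented structural equation \(Y = X \oplus U\) purely for illustration:

```python
def scm_y(x, u):
    # Hypothetical structural equation for this sketch: Y = X XOR U,
    # with U an exogenous (unobserved) noise term.
    return x ^ u

# Abduction: infer which exogenous values are consistent with the
# observation X = 1, Y = 1.
x_obs, y_obs = 1, 1
u_posterior = [u for u in (0, 1) if scm_y(x_obs, u) == y_obs]

# Action + prediction: force X to 0 (ignoring whatever produced it),
# keeping the inferred noise fixed, and re-evaluate Y.
y_cf = [scm_y(0, u) for u in u_posterior]
print(y_cf)  # [0]: had X been 0, Y would have been 0
```

Note that the counterfactual needs the functional form of `scm_y`, not just the graph: a different equation with the same graph (say \(Y = X \lor U\)) would give a different answer from the same observation.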
The hierarchy is strict in the sense formalised by Bareinboim et al. (2022): in general, no amount of Rung 1 data suffices to answer Rung 2 questions, and no amount of Rung 2 data suffices to answer Rung 3 questions, without additional structural assumptions. Yang and Bareinboim (2025) extend this to a finer-grained graphical hierarchy within the counterfactual layer itself.
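The Rung 1 / Rung 2 gap is easy to exhibit by simulation. In the hypothetical toy model below (my own illustration, not from the cited papers), a hidden confounder \(U\) drives both \(X\) and \(Y\), \(Y\) does not depend on \(X\) at all, and yet the observational conditional \(P(Y \mid X)\) and the interventional \(P(Y \mid \operatorname{do}(X))\) come apart:

```python
import random

random.seed(0)

def sample(intervene_x=None):
    # Toy SCM: hidden confounder U drives both X and Y; Y ignores X.
    u = random.random() < 0.5
    if intervene_x is None:
        x = u                 # observationally, X simply copies U
    else:
        x = intervene_x       # do(X = x): the U -> X edge is cut
    y = (random.random() < 0.9) if u else (random.random() < 0.1)
    return x, y

N = 100_000
obs = [sample() for _ in range(N)]
p_y_given_x1 = sum(y for x, y in obs if x) / sum(1 for x, _ in obs if x)
do1 = [sample(intervene_x=True)[1] for _ in range(N)]
p_y_do_x1 = sum(do1) / N

print(round(p_y_given_x1, 2))  # ~0.9: observing X = 1 reveals U = 1
print(round(p_y_do_x1, 2))     # ~0.5: forcing X leaves Y driven by U alone
```

No amount of observational data from this world, on the variables \(\{X, Y\}\) alone, distinguishes it from a world where \(X\) genuinely causes \(Y\); that is the strictness claim in miniature.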
2 Can causation be reduced to probability?
My question precisely! And the reason this page exists.
De Finetti conjectured that causation, like chance, reduces to patterns in subjective probability. The strictness results for the PCH seem to slam the door on this: causal claims are provably irreducible to probabilistic ones within SCMs. But Herrmann et al. (2026) argue the reduction succeeds after all:
The apparent irreducibility arises from an implicit restriction on the agent’s representational resources: once the variable set is enriched to include naturalized intervention variables, interventional propositions reduce to ordinary probabilistic conditioning. [… We] show how this framework bears on the debate between evidential and causal decision theory, arguing that causal decision theory functions as a useful approximation of evidential reasoning over the full algebra. Within SCMs, causal reasoning reduces to subjective probability.
Their trick is to expand the ontology of events in the probability space. If an agent’s probability space already contains variables that represent whether and how interventions occur (what Herrmann et al. call naturalized intervention variables), then conditioning on those variables recovers what \(\operatorname{do}(\cdot)\) gives us — no extra-probabilistic operator needed. The hierarchy collapses after all. The theorems are not wrong; rather, their strictness presupposes a variable set that excludes the mechanism by which interventions are carried out. A richer variable set allows us to express interventional claims as ordinary probabilistic ones.
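The enrichment move can be sketched concretely. Extending the confounded toy model above with an indicator variable \(I\) recording whether \(X\) was set by an external mechanism (my own stand-in for a naturalized intervention variable, not Herrmann et al.'s formalism), ordinary conditioning on \(I = 1\) recovers the interventional distribution:

```python
import random

random.seed(1)

def sample_world():
    # Confounded toy world plus an intervention indicator I: when I = 1,
    # X is set by an external coin, severing the U -> X dependence.
    u = random.random() < 0.5
    i = random.random() < 0.3          # interventions occur *in* the world
    x = (random.random() < 0.5) if i else u
    y = (random.random() < 0.9) if u else (random.random() < 0.1)
    return i, x, y

N = 200_000
data = [sample_world() for _ in range(N)]

# Ordinary probabilistic conditioning on the enriched algebra:
# P(Y = 1 | X = 1, I = 1), no do-operator in sight.
num = sum(y for i, x, y in data if i and x)
den = sum(1 for i, x, _ in data if i and x)
p_enriched = num / den
print(round(p_enriched, 2))  # close to 0.5 = P(Y = 1 | do(X = 1)) here
```

The point of the sketch: once \(I\) is in the variable set, the interventional claim is just a conditional probability over that larger algebra, which is exactly the shape of the reduction.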
This has consequences for the old fight between evidential and causal decision theory: if we have a rich enough model of the world (one that includes our own decision mechanism as a variable), EDT and CDT “converge”. CDT is then a useful shortcut — a way to approximate evidential reasoning when we don’t want to model the full causal structure of our own agency.
I need to expand on this properly; for now the sketch above is a placeholder. Topics to flesh out include the relationship to transportability (Pearl and Bareinboim 2014), the connection to the agency-as-mechanism view, and worked examples of the naturalization move.
