Decision theory in mechanized causal graphs
Newcomb’s box via the back door
2026-05-05 — 2026-05-06
In which mechanism nodes are rendered first-class in causal graphs, permitting edges between mechanisms, and Newcomb’s problem is illuminated by routing a predictor’s dependence through the agent’s policy.
Causality, agency, decisions, learning is the parent of this notebook; there we hand-waved through mechanized causal graphs.
Here we go deep and explore the formalism introduced in Everitt et al. (2021) and Everitt et al. (2022). The point of the notation is that mechanism nodes are first-class, so we can talk about edges between mechanisms — e.g. an agent reasoning about how its objective depends on the world’s mechanism, or a predictor reading the agent’s policy (MacDermott, Everitt, and Belardinelli 2023).
By the end of the notebook we should be able to use MacDermott, Everitt, and Belardinelli (2023) to illuminate cosmic decision theory. Much as I think standard DAGs have a role in uncovering hidden assumptions in ML, I suspect mechanized causal graphs will help us uncover hidden assumptions in cosmic decision theories.
But let us see.
Later, we may even be able to use Kenton et al. (2023) to discover agents.
1 Mechanizing causal graphs
A mechanized causal graph augments a causal DAG with a mechanism variable for each object-level node, encoding how that node’s value is generated from its parents (Kenton et al. 2023); the mechanism of a decision node is the agent’s policy.
Layered on top is the influence-diagram convention from Everitt et al. (2021): rectangles for decisions, diamonds for utilities, ovals/circles for everything else.
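Concretely, my paraphrase of the construction in Kenton et al. (2023) (so the notation here is an assumption, not the papers’ own): each object-level variable \(V\) with object-level parents \(\operatorname{pa}(V)\) acquires a mechanism parent \(M_V\), and the structural equation becomes

\[
V = M_V\bigl(\operatorname{pa}(V)\bigr), \qquad M_V \in \mathcal{F}_V,
\]

where \(\mathcal{F}_V\) is the space of functions (or conditional distributions) the mechanism may range over. For a decision node \(D\), the mechanism \(M_D\) ranges over policies, which is what makes edges into and out of \(M_D\) interesting.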
What are we doing when we claim that some nodes are special “mechanism” nodes? AFAICT formalising the intentional stance.
A LaTeX tutorial (actually a TikZ tutorial): the node and edge styles used throughout.
| Symbol | Style | Meaning |
|---|---|---|
| ◯ | obj | object-level chance variable |
| ● | mech | mechanism variable |
| ▭ | dec | decision |
| ◇ | util | utility |
| → (solid) | causal | causal edge between object-level variables, or functional edge from a mechanism to its variable |
| → (dashed) | mech-edge | non-terminal mechanism dependency |
| → (dash-dotted) | objective | terminal mechanism edge (an agent’s objective) |
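A minimal TikZ preamble implementing this legend might look like the following. The style names match the table; the particular shapes, arrow tips, and the light fill on mechanism nodes (so that labels like \(M_X\) stay readable) are my assumptions, not the papers’ exact styling.

```latex
\documentclass[tikz]{standalone}
\usetikzlibrary{shapes.geometric, arrows.meta, positioning}
\tikzset{
  obj/.style       = {draw, circle, minimum size=8mm},                % object-level chance variable
  mech/.style      = {draw, circle, fill=black!15, minimum size=8mm}, % mechanism variable
  dec/.style       = {draw, rectangle, minimum size=8mm},             % decision
  util/.style      = {draw, diamond, minimum size=10mm},              % utility
  causal/.style    = {-{Stealth}},                                    % solid: causal or functional edge
  mech-edge/.style = {-{Stealth}, dashed},                            % non-terminal mechanism dependency
  objective/.style = {-{Stealth}, dash dot},                          % terminal mechanism edge
}
\begin{document}
\begin{tikzpicture}[node distance=12mm]
  % one node of each kind, plus the two kinds of mechanism-level edge
  \node[obj]  (X)  {$X$};
  \node[dec]  (D)  [right=of X] {$D$};
  \node[util] (U)  [right=of D] {$U$};
  \node[mech] (MX) [above=of X] {$M_X$};
  \node[mech] (MD) [above=of D] {$M_D$};
  \draw[causal]    (X)  -- (D);
  \draw[causal]    (D)  -- (U);
  \draw[causal]    (MX) -- (X);   % functional: M_X computes X
  \draw[causal]    (MD) -- (D);   % functional: M_D computes D
  \draw[mech-edge] (MX) -- (MD);  % a dependency between mechanisms
\end{tikzpicture}
\end{document}
```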
2 Object-level DAG
The starting point — a vanilla causal DAG with no mechanism layer.
Each object variable acquires a mechanism node above it. The edge from \(M_X\) to \(X\) is functional, not causal: it says “\(X\)’s value is computed by the mechanism \(M_X\) from \(X\)’s parents.” The solid edges between object-level nodes are the original causal structure.
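For concreteness, here is the smallest interesting case under the styles defined above: a chain \(X \to Y\) with its mechanism layer. This body is meant to drop into the standalone document from the previous block; the layout is mine.

```latex
\begin{tikzpicture}[node distance=12mm]
  % object level: the original causal structure
  \node[obj] (X) {$X$};
  \node[obj] (Y) [right=of X] {$Y$};
  \draw[causal] (X) -- (Y);
  % mechanism layer: M_X and M_Y say how X and Y are computed
  \node[mech] (MX) [above=of X] {$M_X$};
  \node[mech] (MY) [above=of Y] {$M_Y$};
  \draw[causal] (MX) -- (X);
  \draw[causal] (MY) -- (Y);
\end{tikzpicture}
```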
3 Cosmic decision theories
The mechanized view earns its keep on problems where the agent’s policy is itself an input to some other variable. In Newcomb’s problem the predictor \(P\) forecasts the agent’s decision \(D\), but \(P\) is causally upstream of \(D\), so the dependence cannot be captured by an object-level edge from \(D\) to \(P\) without making the graph cyclic. The mechanized graph routes it through \(M_D\), the agent’s decision mechanism (its policy): \(P\) depends on \(M_D\), and \(M_D\) functionally determines \(D\). The dash-dotted edge is the literature’s objective/terminal-mechanism edge style, used loosely here to flag the “back door” through which the predictor accesses the agent.
The intended reading: prediction \(P\) determines box contents \(V\); the agent’s mechanism \(M_D\) implements a policy that produces a decision \(D\); utility \(U\) depends on both the contents and the choice; and the “Newcomb-ness” lives in the dash-dotted edge \(M_D \to P\) — the predictor sees the policy, not the act.
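A sketch of that graph in the same TikZ styles, again assuming the preamble from the legend block; the layout, and the choice to draw only the decision’s mechanism \(M_D\) while eliding the others, are mine.

```latex
\begin{tikzpicture}[node distance=14mm]
  % object level
  \node[obj]  (P) {$P$};
  \node[dec]  (D) [right=of P] {$D$};
  \node[obj]  (V) [below=of P] {$V$};
  \node[util] (U) [below=of D] {$U$};
  \draw[causal] (P) -- (V);  % prediction fixes the box contents
  \draw[causal] (V) -- (U);  % contents feed the payoff
  \draw[causal] (D) -- (U);  % so does the choice
  % mechanism layer (only the decision's mechanism is drawn)
  \node[mech] (MD) [above=of D] {$M_D$};
  \draw[causal]    (MD) -- (D); % the policy produces the act
  \draw[objective] (MD) -- (P); % the back door: P reads the policy, not the act
\end{tikzpicture}
```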
4 See also
- Causality, agency, decisions, learning — the narrative version of all this.
- Causality with feedback — thermostats and friends.
- Newcomb-style decision problems — what to do when the world reads your policy.