Causality, agency, decisions
Exotic decision theories, Newcomb’s boxes…
2018-10-23 — 2026-01-13
Wherein mechanized causal graphs are presented, and influence diagrams with probabilistic decision nodes are described; thermostat feedback and multi‑agent interactions are examined.
Notes on decision theory and causality in which agents make decisions, especially in the context of AI safety.
For bonus points, we might consider multiple agents.
This is something I’m actively trying to understand better.
There is some mysterious causality juju in foundation models and other neural nets. This suggests to me that we should think hard about it as we move into the age of AI.
As far as I can tell, reasoning causally about intelligent systems requires some extensions to vanilla causality, because intelligent systems can reason about the outcomes they wish to achieve, which makes things complicated and occasionally weird.
TBC.
1 Causality with feedback
A thermostat is the basic example of a system where cause and effect run in a loop; see causality under feedback.
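As a minimal sketch of what that means in code (plain Python, no causal library assumed, all names mine): the instantaneous graph temperature → heater → temperature is cyclic, but unrolling it over time steps restores an acyclic structure.

```python
import random

def simulate_thermostat(steps=50, setpoint=20.0, outside=5.0):
    """Unroll the temperature <-> heater feedback loop over discrete time.

    The instantaneous graph is cyclic (temperature -> heater -> temperature),
    but indexing variables by time step gives an acyclic graph:
    temp[t] -> heater[t] -> temp[t+1].
    """
    temp = 15.0
    history = []
    for t in range(steps):
        heater_on = temp < setpoint        # decision rule: bang-bang control
        heating = 1.0 if heater_on else 0.0
        noise = random.gauss(0.0, 0.1)     # exogenous disturbance
        # next temperature depends on current temperature and the heater
        temp = temp + 0.5 * heating + 0.1 * (outside - temp) + noise
        history.append((t, temp, heater_on))
    return history

for t, temp, heater_on in simulate_thermostat()[:5]:
    print(f"t={t}: temp={temp:.2f}, heater={'on' if heater_on else 'off'}")
```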
2 Basic Mechanization
A recent, useful introduction is Everitt et al. (2021), so let’s follow it.
A Bayesian network for a set of random variables \(\boldsymbol{V}\) with joint distribution \(\operatorname{Pr}(\boldsymbol{V})\) is a directed acyclic graph (DAG) \(\mathcal{G}=(\boldsymbol{V}, \mathcal{E})\) with vertices \(\boldsymbol{V}\) and edges \(\mathcal{E}\) such that the joint distribution can be factorised as \(\operatorname{Pr}(\boldsymbol{V})=\prod_{V \in \boldsymbol{V}} \operatorname{Pr}\left(V \mid \boldsymbol{Pa}_V\right)\), where \(\boldsymbol{Pa}_V\) are the parents of \(V\) in \(\mathcal{G}\).
This should be familiar from causal DAGs.
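For a concrete toy instance, take three variables with edges \(S \to D\), \(S \to U\) and \(D \to U\). The definition above then gives
\[
\operatorname{Pr}(S, D, U) = \operatorname{Pr}(S)\, \operatorname{Pr}(D \mid S)\, \operatorname{Pr}(U \mid S, D).
\]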
There’s an extension to classic Bayesian networks called influence diagrams. They are a generalization of Bayesian networks that can represent decision problems, using “square nodes for decision variables, diamond nodes for utility variables, and round nodes for everything else.” In contrast to classic influence diagrams, where decision nodes carry no distribution until the decision-maker chooses how to act, here we allow probability distributions over the decision variables: once a (possibly stochastic) decision rule is fixed for each decision node, the whole diagram again defines a joint distribution.
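Concretely, in the notation above (my own gloss): if \(D\) is a decision node with parents \(\boldsymbol{Pa}_D\), choosing a decision rule \(\pi(D \mid \boldsymbol{Pa}_D)\) supplies the missing conditional, so the three-node example above becomes
\[
\operatorname{Pr}^{\pi}(S, D, U) = \operatorname{Pr}(S)\, \pi(D \mid S)\, \operatorname{Pr}(U \mid S, D),
\]
which is an ordinary Bayesian network factorisation once more.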
TBC
3 Mechanized Multi-agent DAGs
We can extend causal DAGs to include many agents deciding about each other. See causality with multiple agents.
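To make the idea concrete, here is a sketch of a two-agent diagram in pycid. I am going from memory of the PyCID paper’s examples, so the constructor keywords (`agent_decisions`, `agent_utilities`), the keyword-style `add_cpds`, and any solver methods should be checked against the current API before relying on them.

```python
import pycid

# A simultaneous two-player game as a multi-agent influence diagram (MACID):
# agent i controls decision Di and cares about utility Ui.
macid = pycid.MACID(
    [("D1", "U1"), ("D1", "U2"), ("D2", "U1"), ("D2", "U2")],
    agent_decisions={1: ["D1"], 2: ["D2"]},
    agent_utilities={1: ["U1"], 2: ["U2"]},
)

# Prisoner's-dilemma-style payoffs; "c" = cooperate, "d" = defect.
payoff = {("c", "c"): (-1, -1), ("c", "d"): (-3, 0),
          ("d", "c"): (0, -3), ("d", "d"): (-2, -2)}

macid.add_cpds(
    D1=["c", "d"],                          # decision domains
    D2=["c", "d"],
    U1=lambda d1, d2: payoff[(d1, d2)][0],  # utilities as functions of parents
    U2=lambda d1, d2: payoff[(d1, d2)][1],
)

# One can then query equilibria of the induced game, e.g.
# print(macid.get_ne())  # method name from memory; verify against the pycid docs
```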
4 Identifying agency
What even is agency? How do we recognize it in natural and artificial systems? What are the implications for control, economics, and technology?
Discovering Agents (Kenton et al. 2023; MacDermott et al. 2024) takes an empirical look at the question of agency by examining, AFAICT, what counts as a deciding node in a mechanized causal graph.
5 Causal attribution and blameworthiness
I should write more about this — a connection to computational morality. Everitt et al. (2022) and Joseph Y. Halpern and Kleiman-Weiner (2018) are relevant works in this domain.
6 Causal vs Evidential decision theory
I no longer think this binary is a good way of understanding the Newcomb problem because:
- The mechanised causal graphs look like a crisper definition of the concepts here.
- The analyses that start from flavours of decision theory, rather than from the causal axiomatization, seem unusually spammy and vague.
This is kept for historical reasons.
These are fancy decision theories for problems arising in strategic conflict and in superintelligence scenarios. Keyword: Newcomb’s paradox. A reflective twist on game theory looks at decision problems involving smart, predictive agents. Those of us who worry about strong AI risk tend to get excitable about these problems.
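To see why the two recipes disagree, here is the textbook expected-value arithmetic for Newcomb’s problem as a small Python calculation; the payoffs and the 0.99 predictor accuracy are the conventional illustrative numbers, not drawn from any of the sources below.

```python
# Newcomb's problem: an accurate predictor puts $1,000,000 in the opaque box
# iff it predicts you will take only that box; the transparent box always
# holds $1,000.
ACCURACY = 0.99       # probability the predictor guessed your action correctly
BIG, SMALL = 1_000_000, 1_000

# Evidential decision theory: condition on your own action as evidence
# about what was predicted.
edt_one_box = ACCURACY * BIG
edt_two_box = (1 - ACCURACY) * BIG + SMALL

# Causal decision theory: the prediction is causally fixed before you act,
# so it does not vary with your choice. With prior probability p that the
# big box is full, two-boxing dominates for every p.
p = 0.5
cdt_one_box = p * BIG
cdt_two_box = p * BIG + SMALL

print(f"EDT: one-box {edt_one_box:,.0f} vs two-box {edt_two_box:,.0f}")
print(f"CDT: one-box {cdt_one_box:,.0f} vs two-box {cdt_two_box:,.0f}")
```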
I have had the following resources recommended to me:
Although their reading list is occasionally, IMO, undiscerning, we might want to start with MIRI’s intro, which at least exists.
Existing methods of counterfactual reasoning turn out to be unsatisfactory both in the short term (in the sense that they systematically achieve poor outcomes on some problems where good outcomes are possible) and in the long term (in the sense that self-modifying agents reasoning using bad counterfactuals would, according to those broken counterfactuals, decide that they should not fix all of their flaws).
I haven’t read any of those, though. I’d probably start with Wolpert and Benford (2013); David Wolpert always seems to have a good Gordian knot cutter on his analytical multitool.
7 Tooling
Fox and coauthors wrote a library for computing with various interesting causal influence diagrams, causalincentives/pycid (Fox et al. 2021):
Library for graphical models of decision making, based on pgmpy and networkx
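A minimal usage sketch, from my memory of the pycid quickstart (treat the exact call signatures and helper names as approximate and check the repository’s docs): build a one-decision influence diagram, attach the conditional distributions, and ask for an optimal policy.

```python
import pycid

# A one-decision influence diagram: the decision D observes S and is
# rewarded (via U) for matching it.
cid = pycid.CID(
    [("S", "D"), ("S", "U"), ("D", "U")],
    decisions=["D"],
    utilities=["U"],
)

cid.add_cpds(
    S=pycid.discrete_uniform([-1, 1]),  # chance node: S is -1 or 1, equiprobably
    D=[-1, 1],                          # decision node: only its domain is specified
    U=lambda s, d: s * d,               # utility node: deterministic function of parents
)

# solve() returns an optimal policy (a decision rule for D); here it copies S.
# (helper and method names recalled from the quickstart; verify before use)
print(cid.solve())
```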
