Notes on decision theory and causality in settings where agents make decisions, especially iterated games in multi-agent systems, with applications to AI safety.
Extending causal DAGs to include agents and decisions.
Multi-agent graphs
There seems to be a long series of works attempting this (Heckerman and Shachter 1994; Dawid 2002; Koller and Milch 2003). I am working from Hammond et al. (2023) and MacDermott, Everitt, and Belardinelli (2023), which introduce the One Ring that unifies them all in the form of something called a Mechanised Multi-Agent Influence Diagram, a.k.a. an MMAID.
Cf. Liu et al. (2024).
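To make the data structure concrete, here is a minimal Python sketch of how a MAID might be stored: a DAG of chance, decision, and utility nodes, with each decision and utility node assigned to an agent. Everything in it (the `MAID` class, `Kind`, `mechanised`, `rule_parents`, and the `V~` naming for the mechanism of `V`) is a hypothetical illustration, not code from the cited papers or from any library. The mechanised step is deliberately simplified: it gives every variable a mechanism parent, but the edges between mechanisms, which as I understand it the papers derive from a strategic-relevance criterion, are supplied by hand rather than computed.

```python
from dataclasses import dataclass, field
from enum import Enum
from graphlib import CycleError, TopologicalSorter


class Kind(Enum):
    CHANCE = "chance"
    DECISION = "decision"
    UTILITY = "utility"
    MECHANISM = "mechanism"  # only appears in the mechanised graph


@dataclass
class MAID:
    """Hypothetical container for a multi-agent influence diagram.

    Nodes are chance, decision, or utility variables; every decision and
    utility node is owned by an agent. Edges are stored as node -> parents.
    """
    kinds: dict = field(default_factory=dict)    # node -> Kind
    owner: dict = field(default_factory=dict)    # decision/utility node -> agent
    parents: dict = field(default_factory=dict)  # node -> set of parent nodes

    def add(self, name, kind, parents=(), agent=None):
        if kind in (Kind.DECISION, Kind.UTILITY) and agent is None:
            raise ValueError(f"{kind.value} node {name!r} needs an owning agent")
        self.kinds[name] = kind
        self.parents[name] = set(parents)
        if agent is not None:
            self.owner[name] = agent

    def validate(self):
        """Check that all parents exist and the graph is acyclic.

        Returns a topological order; raises graphlib.CycleError on a cycle.
        """
        for node, ps in self.parents.items():
            unknown = ps - set(self.kinds)
            if unknown:
                raise ValueError(f"{node!r} has unknown parents {sorted(unknown)}")
        return list(TopologicalSorter(self.parents).static_order())

    def agents(self):
        return sorted(set(self.owner.values()))

    def nodes_of(self, agent, kind):
        """Decision or utility nodes of the given kind owned by `agent`."""
        return sorted(n for n, a in self.owner.items()
                      if a == agent and self.kinds[n] is kind)

    def mechanised(self, rule_parents=None):
        """Simplified mechanised graph: every variable V gets a mechanism
        parent "V~". Edges between mechanisms are NOT derived here; the
        hypothetical `rule_parents` maps a decision D to the variables whose
        mechanisms D's decision rule is assumed to depend on."""
        rule_parents = rule_parents or {}
        for d in rule_parents:
            if self.kinds.get(d) is not Kind.DECISION:
                raise ValueError(f"{d!r} is not a decision node")
        m = MAID()
        for node in self.kinds:
            m.add(node + "~", Kind.MECHANISM,
                  parents={p + "~" for p in rule_parents.get(node, ())})
        for node, kind in self.kinds.items():
            m.add(node, kind, parents=self.parents[node] | {node + "~"},
                  agent=self.owner.get(node))
        return m


# Illustrative two-agent game (an assumption, not taken from the papers):
# A1 observes chance variable X and picks D1; A2 observes D1 and picks D2.
g = MAID()
g.add("X", Kind.CHANCE)
g.add("D1", Kind.DECISION, parents={"X"}, agent="A1")
g.add("D2", Kind.DECISION, parents={"D1"}, agent="A2")
g.add("U1", Kind.UTILITY, parents={"X", "D1", "D2"}, agent="A1")
g.add("U2", Kind.UTILITY, parents={"X", "D2"}, agent="A2")
order = g.validate()

# Assume A2's best rule depends on how D1 tracks X, i.e. on the
# mechanisms of X and D1; this dependency is asserted by hand here.
mg = g.mechanised(rule_parents={"D2": {"X", "D1"}})
mg.validate()
```

The toy game is loosely signalling-shaped: A2 only sees D1, so how useful that observation is depends on A1's decision rule, which is the kind of rule-to-rule dependence the mechanism layer is meant to expose.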
Commitment races
See commitment for a discussion of the commitment problem in the context of multi-agent systems.
References
Fox, MacDermott, Hammond, et al. 2023.
“On Imperfect Recall in Multi-Agent Influence Diagrams.” Electronic Proceedings in Theoretical Computer Science.
Hammond, Chan, Clifton, et al. 2025.
“Multi-Agent Risks from Advanced AI.”
Hammond, Fox, Everitt, et al. 2023.
“Reasoning about Causality in Games.” Artificial Intelligence.
Harley. 1981.
“Learning the Evolutionarily Stable Strategy.” Journal of Theoretical Biology.
Heckerman and Shachter. 1994.
“A Decision-Based View of Causality.” In Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence. UAI’94.
Koller and Milch. 2003.
“Multi-Agent Influence Diagrams for Representing and Solving Games.” Games and Economic Behavior, First World Congress of the Game Theory Society.
MacDermott, Fox, Belardinelli, et al. 2024.
“Measuring Goal-Directedness.”
Sanders, Galla, and Shapiro. 2011.
“Effects of Noise on Convergent Game Learning Dynamics.” arXiv:1109.4853.
Wolpert and Benford. 2013.
“The Lesson of Newcomb’s Paradox.” Synthese.