Causality, agency, decisions
Updateless decision theory, Newcomb’s boxes, evidential decision theory, commitment races …
October 23, 2018 — February 10, 2025
Notes on decision theory and causality where agents make decisions, especially in the context of AI safety. This is something I’m actively trying to understand better at the moment. There is some mysterious causality juju in foundation models and other neural nets. This suggests to me that we should think hard about this as we move into the age of AI.
AFAICT using causality to reason about intelligent systems requires some extensions to vanilla causality, because such systems can themselves reason about the outcomes they wish to achieve, which makes things complicated and occasionally weird.
TBC.
1 Agency
What even is agency? How do we recognise it in natural and artificial systems? What are the implications for control, economics, and technology?
2 Identifying agency
Discovering Agents (Kenton et al. 2023; MacDermott et al. 2024) takes an empirical look at the question of when a system contains agents.
3 Causal attribution and blameworthiness
I should write more about this: a connection to computational morality. Everitt et al. (2022) and Halpern and Kleiman-Weiner (2018) seem to be works in this domain.
4 In multi-agent systems
Connection to game theory and multi-agent systems (Hammond et al. 2023; Liu et al. 2024).
TBD
5 Causality with feedback
A thermostat is an example of the simplest extension of vanilla causality: the heater affects the temperature, and the temperature in turn affects the heater. See causality under feedback.
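The thermostat loop can be unrolled in discrete time to make the feedback explicit. What follows is a minimal sketch, not taken from any of the cited works: the function name `simulate_thermostat`, the linear heat-leak model, and all parameter values are assumptions chosen for illustration. Forcing the heater off (a crude intervention on the heater node) cuts the loop, and the room drifts towards the outside temperature.

```python
def simulate_thermostat(setpoint=20.0, outside=5.0, start=10.0,
                        heat_rate=2.0, leak=0.1, steps=50,
                        force_heater=None):
    """Unroll a toy thermostat feedback loop in discrete time.

    All names and numbers here are illustrative assumptions.
    force_heater=None leaves the loop intact; True or False overrides
    the thermostat, which acts like an intervention on the heater node.
    Returns a list of (heater_on, temperature_after_update) pairs.
    """
    temp = start
    history = []
    for _ in range(steps):
        # Edge 1: current temperature -> heater state (the feedback edge).
        if force_heater is None:
            heater_on = temp < setpoint
        else:
            heater_on = force_heater
        # Edge 2: heater state and heat leakage -> next temperature.
        heating = heat_rate if heater_on else 0.0
        temp = temp + heating - leak * (temp - outside)
        history.append((heater_on, temp))
    return history


if __name__ == "__main__":
    closed_loop = simulate_thermostat()
    heater_forced_off = simulate_thermostat(force_heater=False)
    print("closed loop, final temperature:", round(closed_loop[-1][1], 2))
    print("heater forced off, final temperature:", round(heater_forced_off[-1][1], 2))
```

With the loop intact, the temperature settles into a small oscillation around the setpoint. With the heater forced off, the temperature no longer has any influence on the heater, and the room simply cools.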
6 Causality with decisions
This is the setting where the causal graph includes agents that adapt their behaviour. I assume causality under feedback can be treated as a special case, in which the decisions are very simple.
Where there are agentic decisions, in which nodes contain “policies” that are themselves learned or adapted, the causal graph formalism needs to be even richer.
Caveat: I think this is correct, but I truly know little about this; I’m cribbing live from James Fox’s part of the Causal Incentives group UAI Tutorial.
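To make the idea of a node that holds a policy concrete, here is a toy sketch of a graph with one chance node, one decision node, and one utility node. It is my own illustration rather than anything from the tutorial: the umbrella scenario, the names `P_STATE`, `UTILITY`, `expected_utility`, and `best_policy`, and all the numbers are assumptions. The point is only that the mechanism at the decision node is not fixed in advance; it is chosen in response to the rest of the graph.

```python
from itertools import product

# Chance node: distribution over the state the agent observes (assumed numbers).
P_STATE = {"rain": 0.3, "dry": 0.7}

# Decision node: the actions available to the agent.
ACTIONS = ("umbrella", "no_umbrella")

# Utility node: payoff as a function of state and action (assumed numbers).
UTILITY = {
    ("rain", "umbrella"): 1.0,
    ("rain", "no_umbrella"): -2.0,
    ("dry", "umbrella"): -0.5,
    ("dry", "no_umbrella"): 1.0,
}


def expected_utility(policy, p_state, utility):
    """Expected utility of a policy, i.e. a map from observed state to action."""
    return sum(p * utility[(s, policy[s])] for s, p in p_state.items())


def all_policies(p_state):
    """Enumerate every deterministic policy over the observed states."""
    states = list(p_state)
    for choice in product(ACTIONS, repeat=len(states)):
        yield dict(zip(states, choice))


def best_policy(p_state, utility):
    """The policy the agent adopts: the one maximising expected utility."""
    return max(all_policies(p_state),
               key=lambda pol: expected_utility(pol, p_state, utility))


if __name__ == "__main__":
    print(best_policy(P_STATE, UTILITY))
    # Changing the utility mechanism changes the policy the agent adopts.
    altered = dict(UTILITY)
    altered[("dry", "umbrella")] = 2.0
    print(best_policy(P_STATE, altered))
```

An ordinary causal mechanism would stay put when we alter the utility table. The decision node does not, which is the extra structure the richer formalism has to capture.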
7 Causal vs Evidential decision theory
Fancy decision theories for problems arising in strategic conflict and in superintelligence scenarios, where an agent’s choice is evidence about things it does not cause. Keyword: Newcomb’s paradox.
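For orientation, the usual textbook arithmetic for Newcomb’s problem can be sketched as follows. This is a simplified formalisation under stated assumptions: the conventional payoffs (1,000,000 in the opaque box if the predictor foresaw one-boxing, 1,000 in the transparent box), a predictor with a single accuracy number, and hypothetical function names `edt_values` and `cdt_values`. The evidential calculation conditions on the action; the causal calculation holds the box contents fixed.

```python
BIG = 1_000_000   # opaque box, filled only if one-boxing was predicted
SMALL = 1_000     # transparent box, always present


def edt_values(accuracy):
    """Evidential expected values: the action is evidence about the prediction.

    accuracy is the assumed probability that the predictor guessed correctly.
    """
    one_box = accuracy * BIG
    two_box = (1 - accuracy) * BIG + SMALL
    return {"one_box": one_box, "two_box": two_box}


def cdt_values(prob_full):
    """Causal expected values: the box was filled before the choice.

    prob_full is the agent's credence that the opaque box is full,
    which the causal calculation treats as unaffected by the action.
    """
    one_box = prob_full * BIG
    two_box = prob_full * BIG + SMALL
    return {"one_box": one_box, "two_box": two_box}


def preferred(values):
    """Name of the action with the larger expected value."""
    return max(values, key=values.get)


if __name__ == "__main__":
    print("EDT with a 99% accurate predictor:", preferred(edt_values(0.99)))
    for q in (0.0, 0.5, 1.0):
        print(f"CDT with credence {q} that the box is full:",
              preferred(cdt_values(q)))
```

Under these assumptions the evidential calculation favours one-boxing whenever the accuracy exceeds 0.5005, while the causal calculation favours two-boxing for every credence, because two-boxing always adds the 1,000.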
Apparently, I should read this:
A reflective variant of game theory worries about decision problems with smart predictive agents. Strong AI risk people are excitable in the vicinity of these.
Although their reading list is occasionally IMO undiscerning, you might want to start with MIRI’s intro, which at least exists.
Existing methods of counterfactual reasoning turn out to be unsatisfactory both in the short term (in the sense that they systematically achieve poor outcomes on some problems where good outcomes are possible) and in the long term (in the sense that self-modifying agents reasoning using bad counterfactuals would, according to those broken counterfactuals, decide that they should not fix all of their flaws).
I haven’t read any of those though. I would probably start from Wolpert and Benford (2013); David Wolpert always seems to have a good Gordian knot cutter on his multitool.
8 Updateless decision theory
9 Commitment races
Commitment races are important in international relations. They also seem popular in AI safety theory, although I am not sure why: I don’t understand how AIs can credibly commit to things. Setting up credible signals of commitment seems difficult for very opaque systems, and is probably the exception rather than the rule.