Power-seeking

Instrumental convergence and the drive to keep control

2022-11-27 — 2026-04-09

Wherein Instrumental Convergence Is Examined as a Driver of Resource Acquisition and Self-Preservation Across Diverse Agent Goals, and Its Relationship to Empowerment Formalisms Is Considered.

agents

AI safety

cooperation

economics

game theory

incentive mechanisms

mind

utility

wonk

Informational empowerment gives us a tidy formalism for an agent’s drive to keep its options open: the channel capacity from its actions to its future states, i.e. the maximum, over action distributions, of the mutual information between the two (Klyubin, Polani, and Nehaniv 2005). Here I want to explore the messier sibling question: when does a capable agent seek power over the world, and should that worry us?
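To make the empowerment formalism concrete, here is a minimal sketch in a toy setting. When the dynamics are deterministic, the channel capacity from n-step action sequences to the resulting state reduces to the log of the number of distinct states the agent can reach, so we can simply count. The 5x5 gridworld, the action set, and the function names are all assumptions of mine for illustration, not anything taken from the empowerment literature.

```python
import math
from itertools import product

# Hypothetical toy world: a 5x5 grid with walls at the boundary.
SIZE = 5
ACTIONS = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
    "stay": (0, 0),
}


def step(state, action):
    """Deterministic move; walking into a wall leaves the agent in place."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE:
        return (nx, ny)
    return state


def empowerment(state, horizon):
    """n-step empowerment in bits, valid for deterministic dynamics only.

    With no noise, the capacity of the action-to-state channel is
    log2 of the number of distinct states reachable in `horizon` steps.
    """
    reachable = set()
    for sequence in product(ACTIONS, repeat=horizon):
        s = state
        for action in sequence:
            s = step(s, action)
        reachable.add(s)
    return math.log2(len(reachable))


centre = empowerment((2, 2), horizon=2)
corner = empowerment((0, 0), horizon=2)
print(f"centre: {centre:.3f} bits, corner: {corner:.3f} bits")
```

The centre of the grid scores higher than the corner because the walls cut off options. With stochastic dynamics the counting shortcut no longer applies, and one would need a genuine capacity computation such as the Blahut-Arimoto algorithm.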

In the AI safety literature this tendency is called power-seeking. The concern is that, almost regardless of their outer goals, capable agents will instrumentally pursue strategies that give them more control over their environment, preserve their own functioning, and prevent others from shutting them down. The reason is that, for agents that must handle many tasks, more influence over the world is generally useful across a wide range of goals. The term of art for this is instrumental convergence (Omohundro 2008): many different goals, once pursued by sufficiently capable agents, lead to the same kinds of instrumental strategies, namely acquiring resources, defending against threats, preserving optionality, and extending influence.

1 Does empowerment entail power-seeking?

The tempting move is to read power-seeking straight off the empowerment formalism: an agent that maximises its influence over future states is an agent that hoards control, so the safety worry is just empowerment with the sign flipped. There are even formal footholds for this: optimal policies provably tend to seek power under broad conditions (Turner et al. 2021), and one can ask whether artificial agents pursue power by default rather than only when explicitly rewarded for it (Tarsney 2025).
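As a rough illustration of the kind of quantity those formal results are about, the sketch below estimates a POWER-style score: the normalised average optimal value of a state when the reward function is drawn at random. The normalisation (1 - γ)/γ · E[V*(s) - R(s)] is my reading of the definition in Turner et al. (2021); the six-state graph, the i.i.d. uniform rewards, and all names are assumptions made up for this example.

```python
import random

GAMMA = 0.9

# Hypothetical deterministic MDP: each state lists the states it can move to.
# "hub" keeps three options open; "dead_end" is absorbing.
SUCCESSORS = {
    "start": ["dead_end", "hub"],
    "dead_end": ["dead_end"],
    "hub": ["a", "b", "c"],
    "a": ["a"],
    "b": ["b"],
    "c": ["c"],
}


def optimal_values(reward, gamma=GAMMA, sweeps=100):
    """Value iteration for state-based rewards on the deterministic graph."""
    v = {s: 0.0 for s in SUCCESSORS}
    for _ in range(sweeps):
        v = {
            s: reward[s] + gamma * max(v[n] for n in SUCCESSORS[s])
            for s in SUCCESSORS
        }
    return v


def estimate_power(n_samples=1000, gamma=GAMMA, seed=0):
    """Monte Carlo estimate of (1 - gamma) / gamma * E[V*(s) - R(s)]."""
    rng = random.Random(seed)
    totals = {s: 0.0 for s in SUCCESSORS}
    for _ in range(n_samples):
        # Assumed reward distribution: i.i.d. uniform on [0, 1] per state.
        reward = {s: rng.random() for s in SUCCESSORS}
        v = optimal_values(reward, gamma)
        for s in SUCCESSORS:
            totals[s] += v[s] - reward[s]
    scale = (1 - gamma) / gamma
    return {s: scale * totals[s] / n_samples for s in SUCCESSORS}


power = estimate_power()
for state, value in power.items():
    print(f"{state:>8}: {value:.3f}")
```

In this toy graph the analytic values are easy to check: from "dead_end" the score is the mean of one uniform reward, 1/2, while from "hub" it is the expected maximum of three, 3/4. The state with more options scores higher for no reason other than that averaging over goals favours having choices, which is the intuition behind the formal results. It says nothing yet about whether trained agents, as opposed to optimal policies, behave this way.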

I am not sure the identification is clean, though. As with the evolutionary version of empowerment, we face the problem that all of this happens in an open-ended world. How do we formalise and measure empowerment in such a setting? Informational empowerment gives us some footholds, but once we are in multi-agent, evolving, or unbounded environments, it becomes much harder to define what “future influence” even means, let alone hope it cashes out in a neat equation that we can compute.

Classic reading on this theme:

jacob_cannell: Empowerment is (almost) All We Need, on Klyubin, Polani, and Nehaniv (2005)
Joe Carlsmith, When should we worry about AI power-seeking?

2 References

Klyubin, Polani, and Nehaniv. 2005. “Empowerment: A Universal Agent-Centric Measure of Control.” In 2005 IEEE Congress on Evolutionary Computation.

Omohundro. 2008. “The Basic AI Drives.” In Artificial General Intelligence 2008: Proceedings of the First AGI Conference.

Tarsney. 2025. “Will Artificial Agents Pursue Power by Default?”

Turner, Smith, Shah, et al. 2021. “Optimal Policies Tend To Seek Power.” In Advances in Neural Information Processing Systems.