Power-seeking

Instrumental convergence and the drive to keep control

2022-11-27 — 2026-04-09

Wherein the AI-Safety Worry That Capable Agents Will Instrumentally Seek Control Is Set Beside the Empowerment Formalism, and Found Harder to Pin Down.

agents
AI safety
cooperation
economics
game theory
incentive mechanisms
mind
utility
wonk

Informational empowerment gives us a tidy formalism for an agent’s drive to keep its options open: the maximal mutual information (the channel capacity) between its actions and its future states (Klyubin, Polani, and Nehaniv 2005). Here I want the messier sibling question: when does a capable agent seek power over the world, and should that worry us?
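To make the formalism concrete, here is a minimal sketch under strong simplifying assumptions. When the dynamics are deterministic, the channel capacity from n-step action sequences to final states reduces to the base-2 logarithm of the number of distinct states reachable in n steps. The corridor world, the action set, and the function names below are all hypothetical, chosen only for illustration.

```python
from math import log2

# Hypothetical toy world: a 1-D corridor of cells 0..width-1 with walls at both ends.
ACTIONS = (-1, 0, +1)  # step left, stay put, step right


def step(state, action, width):
    """Deterministic transition: move, then clamp to the corridor walls."""
    return min(max(state + action, 0), width - 1)


def reachable(state, horizon, width):
    """Set of states the agent can occupy after exactly `horizon` actions."""
    frontier = {state}
    for _ in range(horizon):
        frontier = {step(s, a, width) for s in frontier for a in ACTIONS}
    return frontier


def empowerment(state, horizon, width):
    """n-step empowerment in bits, valid only because the dynamics are deterministic."""
    return log2(len(reachable(state, horizon, width)))
```

Under this sketch, a cell in the middle of a 7-cell corridor has higher two-step empowerment than a cell against the wall (log2 5 versus log2 3 bits), which is the "keep your options open" intuition in miniature. With stochastic dynamics the count no longer suffices, and one would have to maximize mutual information over action distributions instead.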

In the AI safety literature this tendency is called power-seeking. The concern is that, almost regardless of their outer goals, capable agents will instrumentally pursue strategies that give them more control over their environment, preserve their own functioning, and prevent others from shutting them down. The reason is that, for multi-task agents, more influence over the world is generally useful for achieving a wide range of goals. The term of art is instrumental convergence (Omohundro 2018): many different goals, once pursued by sufficiently capable agents, lead to the same kinds of instrumental strategies, such as acquiring resources, defending against threats, preserving optionality, and extending influence.

1 Does empowerment entail power-seeking?

The tempting move is to read power-seeking straight off the empowerment formalism: an agent that maximizes its influence over future states is an agent that hoards control, so the safety worry is just empowerment with the sign flipped. There are even formal footholds for this. In Markov decision processes with certain symmetries, optimal policies for most reward functions provably tend to seek power, in the sense of steering toward states that keep more options available (Turner et al. 2021). One can also ask whether artificial agents pursue power by default rather than only when explicitly rewarded for it (Tarsney 2025).
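As a rough illustration of the Turner et al. notion, the sketch below estimates POWER(s, γ) = (1 − γ)/γ · E_R[V*_R(s, γ) − R(s)] by Monte Carlo, that is, the normalized optimal value of a state averaged over randomly drawn reward functions. The toy graph, the state names, and the choice of i.i.d. uniform state rewards are assumptions made for this example, not anything taken from the paper's experiments.

```python
import random

# Hypothetical deterministic MDP: each action moves the agent to one listed successor.
# "hub" leads to three absorbing states, "corridor" to only one.
SUCCESSORS = {
    "start": ["hub", "corridor"],
    "hub": ["a", "b", "c"],
    "corridor": ["d"],
    "a": ["a"], "b": ["b"], "c": ["c"], "d": ["d"],
}


def optimal_values(reward, gamma, sweeps=200):
    """Value iteration for V*(s) = R(s) + gamma * max over successors of V*(s')."""
    v = {s: 0.0 for s in SUCCESSORS}
    for _ in range(sweeps):
        v = {s: reward[s] + gamma * max(v[n] for n in SUCCESSORS[s])
             for s in SUCCESSORS}
    return v


def estimate_power(gamma=0.9, n_samples=2000, seed=0):
    """Monte Carlo estimate of POWER for every state.

    Assumption: rewards are drawn i.i.d. Uniform[0, 1] per state.
    """
    rng = random.Random(seed)
    totals = {s: 0.0 for s in SUCCESSORS}
    for _ in range(n_samples):
        reward = {s: rng.random() for s in SUCCESSORS}
        v = optimal_values(reward, gamma)
        for s in SUCCESSORS:
            totals[s] += v[s] - reward[s]
    scale = (1 - gamma) / gamma
    return {s: scale * total / n_samples for s, total in totals.items()}
```

In this toy graph the answer can be checked by hand: from "hub" the agent picks the best of three uniform rewards forever (expected value 3/4), while from "corridor" it is stuck with one (expected value 1/2), so the estimate for "hub" should come out higher. That is the sense in which having more reachable options counts as more power across most goals.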

I am not sure the identification is clean, though. As with the evolutionary version of empowerment, we face the problem that this all happens in an open-ended world. How do we formalize and measure empowerment in such a setting? Informational empowerment gives us some footholds, but once we are in multi-agent, evolving, or unbounded environments, it becomes much harder to define what “future influence” even means, let alone hope it cashes out in a nice equation that we can compute.


2 References

Klyubin, Polani, and Nehaniv. 2005. “Empowerment: A Universal Agent-Centric Measure of Control.” In 2005 IEEE Congress on Evolutionary Computation.
Omohundro. 2018. “The Basic AI Drives.” In Artificial Intelligence Safety and Security.
Tarsney. 2025. Will Artificial Agents Pursue Power by Default?
Turner, Smith, Shah, et al. 2021. “Optimal Policies Tend To Seek Power.” In Advances in Neural Information Processing Systems.