Empowerment and intrinsic motivation

Do agents learn to want freedom?

2022-11-27 — 2025-09-26

Wherein empowerment is described as an information‑theoretic measure — the mutual information between actions and future states — and its use as an intrinsic exploration bonus in sparse‑reward RL is noted.

adaptive
agents
AI safety
cooperation
culture
economics
evolution
game theory
incentive mechanisms
learning
mind
networks
statmech
utility
wonk

The drive to empowerment is a hypothesized internal pressure on an agent to move itself into states from which it has lots of influence (or many options) going forward. If an agent “seeks empowerment”, it ‘wants’ to maximize its ability to affect the future. We want this concept because it helps us ask questions like: is “power-seeking” a generic property of intelligent agents?

As with many such intuitions, it is easier to gesture at than to usefully formalize.

Technical empowerment is tightly defined: it is a precise information-theoretic quantity, formally the capacity of the channel from the agent’s actions (or action sequences) to its future states, i.e. the maximum achievable mutual information between them. In rough terms: how many distinguishable futures can the agent reliably reach, given its action choices and the dynamics of the environment? For that reason, empowerment is sometimes called a “pseudo-utility” (i.e. a kind of internal reward proxy) that depends only on local, agent-accessible information. That assumes a lot of structure: a well-defined agent, environment, state space, action space, transition dynamics, etc.
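To make the bookkeeping concrete, here is a minimal sketch (my own illustration, not code from any of the cited papers) of one-step empowerment for a single state of a small discrete MDP, computed as the capacity of the action → next-state channel via the Blahut–Arimoto algorithm. Real uses typically consider n-step action sequences and need the approximations discussed below; all names here are illustrative.

```python
import numpy as np

def empowerment_bits(channel, n_iters=200):
    """One-step empowerment of a single state.

    Empowerment is the capacity max_{p(a)} I(A; S') of the
    action -> next-state channel, estimated here with the
    Blahut-Arimoto algorithm.

    channel: array of shape (n_actions, n_states); row a is p(s' | s, a)
             for one fixed state s.
    Returns the capacity in bits.
    """
    n_actions = channel.shape[0]
    p_a = np.full(n_actions, 1.0 / n_actions)  # initial action distribution
    eps = 1e-12                                # avoids log(0)
    for _ in range(n_iters):
        p_s = p_a @ channel                    # marginal over next states
        # KL divergence of each action's transition row from that marginal
        kl = (channel * (np.log(channel + eps) - np.log(p_s + eps))).sum(axis=1)
        p_a = p_a * np.exp(kl)                 # multiplicative Blahut-Arimoto update
        p_a /= p_a.sum()
    p_s = p_a @ channel
    mi_nats = (p_a[:, None] * channel *
               (np.log(channel + eps) - np.log(p_s + eps))).sum()
    return mi_nats / np.log(2)

# A "dead end" (both actions lead to the same state) has zero empowerment;
# a "junction" (each action reaches a distinct state) has one bit.
print(empowerment_bits(np.array([[1.0, 0.0], [1.0, 0.0]])))  # ~0.0
print(empowerment_bits(np.array([[1.0, 0.0], [0.0, 1.0]])))  # ~1.0
```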

In the broader metaphorical sense, empowerment gestures towards a general tendency of agents to keep options open and extend their influence. The argument (e.g. in Empowerment Is (Almost) All We Need) is that a sufficiently powerful drive for empowerment might lead to many of the behaviours we desire in intelligent agents — exploration, maintaining options, robustness, etc.

In either case, the concept is less about a fixed external goal or reward (like “get the treasure”) than about how much control the agent has over its future, regardless of the extrinsic objective.

1 Technical empowerment in reinforcement learning

In RL, agents typically optimize an external reward signal. Many environments have sparse, noisy, or delayed reward signals, which makes learning hard (the exploration problem, credit assignment, etc.).

Empowerment offers a complementary mechanism:

  • Because it doesn’t depend on an external task, an agent can explore more “safely” or systematically by trying to increase its control.
  • It biases the agent towards states with many outgoing branches — places from which many futures are reachable. In many domains, that corresponds to being in central, flexible positions rather than being stuck in a dead end.
  • Some works combine extrinsic reward with empowerment. For example, there is a formulation of a unified Bellman equation that mixes reward maximization with empowerment terms. (Leibfried, Pascual-Diaz, and Grau-Moya 2020)
  • Others use empowerment or information-theoretic objectives as intrinsic motivations to guide exploration, especially in sparse-reward tasks (Dai et al. 2021); a crude additive version of such a bonus is sketched after this list.
  • More recently, works have integrated causal modelling with empowerment to get better sample efficiency and more directed exploration. For instance, “Empowerment via Causal Learning” is a framework in model‐based RL that uses causal structure to compute empowerment more meaningfully. (Cao, Feng, Fang, et al. 2025)
  • There is also a notion called causal action empowerment, which aims to focus the empowerment signal on those actions that causally influence important parts of the environment. (Cao, Feng, Huo, et al. 2025) I really need to read that one.
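To make the exploration-bonus idea concrete, here is the simplest possible combination: just add a weighted one-step empowerment bonus to the extrinsic reward. This is my own illustration, not the unified Bellman formulation of Leibfried et al.; `empowerment_bits` is the toy function from the sketch above, and `beta` is a free hyperparameter.

```python
def shaped_reward(r_ext, state, P, beta=0.1):
    """Extrinsic reward plus a weighted one-step empowerment bonus.

    r_ext : extrinsic reward received at this step
    state : index of the current state
    P     : transition tensor of shape (n_states, n_actions, n_states)
    beta  : weight on the intrinsic term (a free hyperparameter)
    """
    return r_ext + beta * empowerment_bits(P[state])
```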

Challenges / caveats:

For the technical definition:

  • Computing empowerment is often expensive, especially in high-dimensional, continuous, or partially observable settings, because estimating mutual information is hard and because getting information about future states requires modelling the environment dynamics. So it’s the usual RL problems, but with a spiky, tricky estimand. There are various approximation methods (Zhao et al. 2020); one standard family of bounds is sketched below.
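For orientation, the variational route usually rests on a Barber–Agakov-style lower bound on the mutual information. Writing $\mathfrak{E}(s)$ for the empowerment of state $s$, a generic sketch in my own notation (not necessarily the specific estimator of Zhao et al. 2020) is

$$
\mathfrak{E}(s) \;=\; \max_{\omega(a \mid s)} I(A; S' \mid s)
\;\ge\; \max_{\omega,\, q} \; \mathbb{E}_{\omega(a \mid s)\, p(s' \mid s, a)}
\bigl[\log q(a \mid s', s) - \log \omega(a \mid s)\bigr],
$$

where $\omega$ is the distribution over (sequences of) actions and $q$ is a learned variational approximation to the “planning posterior” $p(a \mid s', s)$. The bound is tight when $q$ matches that posterior, and both $\omega$ and $q$ can be parameterized by neural networks and optimized by gradient ascent.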

2 Intrinsic motivation

Empowerment is one member of a broader family of ideas sometimes called intrinsic motivation models. Instead of relying on a sparse external reward signal, an intrinsically motivated agent manufactures its own incentives to act. These are heuristics for “what’s worth doing” when the task is unclear, the reward is delayed, or there may not be any reward at all.

There are several flavours I’ve seen:

  • Empowerment: as above. In the technical sense, maximise action–future mutual information (A. S. Klyubin, Polani, and Nehaniv 2005). In the metaphorical sense, keep your world malleable and avoid dead ends.
  • Curiosity / novelty: seek out states that reduce uncertainty or maximise prediction error (Schmidhuber 2010; Du et al. 2023). This is the “learn what you don’t know yet” drive.
  • Play: generate behaviours with no immediate external payoff, but which enrich the agent’s behavioural repertoire and skill base. In humans, play scaffolds learning. In agents, it can be a way of stumbling into competence.
  • Quality–diversity / novelty search: abandon extrinsic benchmarks altogether and reward the discovery of new behaviours, regardless of “performance” (Lehman and Stanley 2011).

All of these function as internal reward surrogates. They are not tied to the final task, but they shape the learning trajectory so that when tasks arrive, the agent is already robust, exploratory, and resourceful.
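As a toy illustration of the “internal reward surrogate” framing, here is a minimal sketch of a prediction-error (curiosity) bonus in a tabular world; the class name and update rule are mine, chosen for brevity rather than fidelity to any particular paper.

```python
import numpy as np

class CuriosityBonus:
    """Prediction-error intrinsic reward: a tabular forward model predicts
    the next state, and the bonus is how surprised it was by the outcome."""

    def __init__(self, n_states, n_actions, lr=0.1):
        # model[s, a] is a predicted distribution over next states
        self.model = np.full((n_states, n_actions, n_states), 1.0 / n_states)
        self.lr = lr

    def __call__(self, s, a, s_next):
        pred = self.model[s, a]
        bonus = -np.log(pred[s_next] + 1e-12)   # surprisal of the observed transition
        # Move the prediction towards the observed outcome
        target = np.zeros_like(pred)
        target[s_next] = 1.0
        self.model[s, a] = (1 - self.lr) * pred + self.lr * target
        return bonus

bonus_fn = CuriosityBonus(n_states=4, n_actions=2)
r_int = bonus_fn(s=0, a=1, s_next=3)  # large at first, shrinks as the model learns
```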

This makes intrinsic motivation a step between the two paradigms I’ve been circling: optimizers with a fixed loss, and replicators with no fixed loss but an imperative to persist. Intrinsic drives are “messy” in the same way life is messy: they don’t guarantee that the agent is always doing the right thing; they are a noisy proxy for “good things”.

3 Empowerment in replicators

We might draw a suggestive connection to evolutionary biology, via the robustness-and-evolvability idea (Wagner 2005). Is that analogous to empowerment?

3.1 Replicators, vehicles, and influence

In evolutionary biology, a replicator (in the Dawkins/Hull sense) is an entity that is replicated with variation and subject to selection pressures (Godfrey-Smith 2000). A gene “wants” to maximize its propagation potential (informally speaking). It evolves strategies (via the organism) to influence its environment (via phenotype, behaviour, niche construction, etc.). Of course, genes don’t literally compute empowerment in the technical sense; we’re back to the metaphorical now.

  • A replicator benefits from having many possible viable futures — i.e. flexibility in ecological or developmental trajectories such that it can survive under various conditions.
  • The vehicle/organism is the means by which the replicator acts on the environment to preserve or replicate itself.

Just as an AI agent might try to keep many future branches open, a replicator — through its phenotypic machinery — might favour designs that maintain options in changing environments. Arguments suggesting replicators might seek such goals are awkwardly called selection theorems.

3.2 Empowerment-like structure in evolution

Here are some speculative bridges to empowerment:

  • Robustness & evolvability: replication systems that can tolerate perturbations (robustness) and adapt (evolvability) are more “powerful” in the face of environmental change. That’s a kind of biological counterpart to having many controllable futures.
  • Niche construction / environment modification: many organisms modify the environment (e.g. beaver dams, root systems altering soil, microbial communities altering chemistry). These are ways replicators/vehicles shape the environment to increase the control space or favourable pathways they can exploit. That’s like increasing empowerment by altering environmental dynamics.
  • Redundancy, backup paths, modularity: mechanisms like gene duplication, redundant pathways, or modular designs allow alternative routes of adaptation or survival when parts fail. That’s akin to an agent having multiple “paths” to future states.
  • Selection in fluctuating environments: when environments change, replicators that maintain flexible strategies (i.e. not overly specialized) may outperform those with narrow optima. That aligns with the push toward “keeping many options open.”

4 AI Agents and power-seeking

Empowerment suggests a route toward open-endedness: systems whose internal drive is to expand their “influence frontier” could drive themselves toward complexity and diversity (an ambition in AGI / artificial life).

In the AI safety literature, this broader tendency often shows up under the heading of power-seeking. The concern is that almost regardless of their outer goals, capable agents will instrumentally pursue strategies that give them more control over their environment, preserve their own functioning, and prevent others from shutting them down. These are natural consequences of empowerment-like drives, whether or not they are ever explicitly coded in.

This connects us to the idea of instrumental convergence (Omohundro 2018): many different goals, once pursued by sufficiently capable agents, lead to the same kinds of instrumental strategies — acquiring resources, defending against threats, preserving optionality, and extending influence. In other words, empowerment (in the metaphorical sense) is almost a restatement of why instrumental convergence is expected: keeping more options open and maintaining control over the environment are generally useful for pursuing almost any long-term objective.

As with the evolutionary version, we face the problem that this is now an open-ended world, so how do we formalize and measure empowerment in such a setting? Technical empowerment gives us some footholds, but once we are in multi-agent, evolving, or unbounded environments, it becomes much harder to define what “future influence” even means.

Classic reading on this theme:

  • Omohundro (2018)
  • Turner, Smith, Shah, et al. (2021)
  • Tarsney (2025)

5 Incoming

Things I’d like to read next:

  1. Salge, Glackin, and Polani (2014)

  2. Combine with extrinsic tasks

    • See how empowerment helps in sparse reward environments or as an exploration bonus.
    • Read “A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment” for one approach. (Leibfried, Pascual-Diaz, and Grau-Moya 2020)
  3. Scaling & approximation

    • Formally, look into techniques for approximating empowerment in high-dimensional or continuous spaces (variational approximations, estimating mutual information, etc.). (Zhao et al. 2020)
  4. Causal / structure-based enhancements

    • In the technical sense, explore recent work that adds causal modelling to compute empowerment more meaningfully (i.e. what actions truly affect what variables). (Cao, Feng, Fang, et al. 2025)
  5. Connections to open-endedness, evolution, artificial life

    • Metaphorically, study how intrinsic drives (like empowerment) can support open-ended growth or autonomous innovation.
    • Read in artificial life / evolutionary robotics about self-replication, niche construction, and the pressures toward controllability and adaptability. (Taylor and Dorin 2020)
  6. Critiques and safety considerations

    • In the agent foundations sense, investigate failure modes: e.g. empowerment-driven agents might prefer “safe control” regions over risky but useful ones.
    • Analyze whether empowerment aligns with human values or task goals, and whether it can be perverted.

6 References

Berrueta, Pinosky, and Murphey. 2024. “Maximum Diffusion Reinforcement Learning.” Nature Machine Intelligence.
Cao, Feng, Fang, et al. 2025. “Towards Empowerment Gain Through Causal Structure Learning in Model-Based RL.”
Cao, Feng, Huo, et al. 2025. “Causal Action Empowerment for Efficient Reinforcement Learning in Embodied Agents.” Science China Information Sciences.
Dai, Xu, Hofmann, et al. 2021. “An Empowerment-Based Solution to Robotic Manipulation Tasks with Sparse Rewards.”
Du, Kosoy, Dayan, et al. 2023. “What Can AI Learn from Human Exploration? Intrinsically-Motivated Humans and Agents in Open-World Exploration.”
Godfrey-Smith. 2000. “The Replicator in Retrospect.” Biology and Philosophy.
Hafner, Ortega, Ba, et al. 2022. “Action and Perception as Divergence Minimization.”
Klyubin, Alexander S., Polani, and Nehaniv. 2005. “All Else Being Equal Be Empowered.” In Advances in Artificial Life.
Klyubin, A.S., Polani, and Nehaniv. 2005. “Empowerment: A Universal Agent-Centric Measure of Control.” In 2005 IEEE Congress on Evolutionary Computation.
Lehman. 2007. “Evolution Through the Search for Novelty.”
Lehman, Gordon, Jain, et al. 2022. “Evolution Through Large Models.”
Lehman, and Stanley. 2011. “Abandoning Objectives: Evolution Through the Search for Novelty Alone.” Evolutionary Computation.
———. 2013. “Evolvability Is Inevitable: Increasing Evolvability Without the Pressure to Adapt.” PLoS ONE.
Leibfried, Pascual-Diaz, and Grau-Moya. 2020. “A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment.”
Omohundro. 2018. “The Basic AI Drives.” In Artificial Intelligence Safety and Security.
Ringstrom. 2022. “Reward Is Not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning.”
Salge, Glackin, and Polani. 2014. “Empowerment–An Introduction.” In Guided Self-Organization: Inception.
Schmidhuber. 2010. “Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010).” IEEE Transactions on Autonomous Mental Development.
Tarsney. 2025. “Will Artificial Agents Pursue Power by Default?”
Taylor, and Dorin. 2020. Rise of the Self-Replicators: Early Visions of Machines, AI and Robots That Can Reproduce and Evolve.
Turner, Smith, Shah, et al. 2021. “Optimal Policies Tend To Seek Power.” In Advances in Neural Information Processing Systems.
Wagner. 2005. Robustness and Evolvability in Living Systems. Princeton Studies in Complexity.
———. 2008. “Robustness and Evolvability: A Paradox Resolved.” Proc Biol Sci.