Intrinsic motivation
Do agents learn to want freedom? Do they learn to want to learn? Do they learn to goof off? etc
2022-11-27 — 2026-04-10
Wherein several formalisms for self-generated learning signals are surveyed, and each is found upon inspection to reduce to a rearrangement of mutual information between past experience and future states.
Intrinsic motivations in machine learning are alternatives to the default process of handing the agent a reward function which encodes what we want. Instead of relying on a sparse external reward signal, an intrinsically motivated agent manufactures its own incentives to act. These are heuristics for “what’s worth doing” when the task is unclear, the reward is delayed, or there may not be any reward at all.
Why care? Several reasons. Maybe we want to devise an open-ended learning algorithm that doesn’t stall the moment it runs out of curriculum. Maybe we want to understand what “interesting” means formally — why do babies poke at things, why do scientists run experiments, and can we get a robot to do something similar? Maybe we are worried about AI safety, and want to know what an agent will do when left to its own devices — because the answer turns out to involve power-seeking in ways that should concern us. Or maybe we’ve noticed that the explore/exploit trade-off in adaptive experiment design is suspiciously similar to curiosity, and we want to know if the same maths is hiding underneath.
It turns out there are several formalizations of intrinsic motivation in the literature, and most of them boil down to choosing which information-theoretic quantity to optimise:
- Empowerment: In the technical sense, maximise action–future mutual information. In the metaphorical sense, keep your world malleable and avoid dead ends.
- Curiosity / novelty: seek out states that reduce uncertainty or maximise prediction error (Schmidhuber 2010; Du et al. 2023). This is the “learn what you don’t know yet” drive.
- Quality–diversity / novelty search: abandon extrinsic benchmarks altogether and reward the discovery of new behaviours, regardless of “performance” (Lehman and Stanley 2011). (Hmm, how is this “novelty search” different from the previous “curiosity / novelty”?)
- Play: generate behaviours with no immediate external payoff, but which enrich the agent’s behavioural repertoire and skill base. In humans, play scaffolds learning. In agents, it can be a way of stumbling into competence.
- Interactivity: maximise the algorithmic information of future behaviour conditioned on past experience (Lewandowski et al. 2025).
- Just stay alive: This is part of what evolution seems to do somehow.
All of these function as internal reward surrogates. They are not tied to a final task, but they shape the learning trajectory so that when tasks arrive, the agent is already robust, exploratory, and resourceful.
This makes intrinsic motivation a step between the two paradigms: optimizers with a fixed loss, and replicators with no fixed loss but an imperative to persist. Intrinsic drives seem “messy” in the same way life is messy: they don’t guarantee that the agent is always doing the right thing, and are a noisy proxy for “good things”.
What follows is my attempt to sketch the major formalisations. The landscape is large but — spoiler alert — most of these proposals boil down to a handful of information-theoretic quantities, applied in slightly different places. I don’t claim to be exhaustive here; I am just trying to give enough of the maths to orient yourself and enough of the intuition to know where to dig.
1 Curiosity as compression progress (Schmidhuber)
The OG formalisation is Schmidhuber’s theory of creativity and fun (Schmidhuber 2010), which has been evolving since 1990. The core idea: our agent maintains an adaptive world model (a predictor or compressor \(p\)) and gets intrinsic reward proportional to the improvement in that model’s performance.
Let \(C(p, h(\leq t))\) be the cost of predictor \(p\) evaluated on history \(h\) up to time \(t\) — say, the number of bits needed to encode the history under \(p\), so lower is better. The “intrinsic” reward at time \(t+1\) is
\[r_{\text{int}}(t+1) = f\bigl[C(p(t), h(\leq t+1)),\; C(p(t+1), h(\leq t+1))\bigr]\]
where \(f(a,b) = a - b\) is the simplest choice: how much better did the model get? The RL controller then maximises expected future intrinsic reward, i.e. expected future learning progress.
This neatly sidesteps the “white noise” trap that kills naive curiosity-as-surprise. An agent rewarded by raw prediction error will get stuck staring at a TV tuned to static — high surprise, zero learnability. But compression progress on white noise is zero, because the model can’t improve. So the agent is motivated to seek out data that is currently surprising and learnable — the sweet spot between boredom and confusion. AFAICT this is the phenomenon that the whole intrinsic motivation literature is trying to capture.
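Here is a minimal sketch of the mechanism — my own toy, not Schmidhuber’s implementation. The “model” is a Laplace-smoothed symbol-frequency estimator, \(C\) is the code length of the history in bits, and the intrinsic reward at each step is the drop in code length after one model update. A learnable biased stream yields much more total compression progress than incompressible noise:

```python
import math
import random

def history_bits(counts, history):
    """Code length (bits) of the whole history under a fixed Laplace model."""
    total = sum(counts)
    return sum(-math.log2((counts[s] + 1) / (total + 2)) for s in history)

def compression_progress(stream):
    """Per-step reward: C(p(t), h(<=t+1)) - C(p(t+1), h(<=t+1))."""
    counts = [0, 0]   # the "model": symbol frequencies seen so far
    history, rewards = [], []
    for s in stream:
        history.append(s)
        before = history_bits(counts, history)  # old model on new history
        counts[s] += 1                          # train the model one step
        after = history_bits(counts, history)   # improved model
        rewards.append(before - after)
    return rewards

random.seed(0)
biased = [1 if random.random() < 0.9 else 0 for _ in range(500)]  # learnable
noise = [random.randint(0, 1) for _ in range(500)]                # unlearnable
# Total learning progress: large on the structured stream, near zero on noise.
print(sum(compression_progress(biased)), sum(compression_progress(noise)))
```

The unlearnable stream starves the reward because the model cannot improve on it — which is exactly the anti-white-noise property.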
Schmidhuber also argues this mechanism explains aspects of humour, art, and scientific curiosity: the punch line of a joke is a moment of rapid compression progress, and a beautiful proof is one that suddenly makes a large body of facts more compressible. Whether you buy that as a full theory of aesthetics is up to you, but as a generative principle for exploration it is hard to beat.
There are several practical variants. In the earliest (1990) version, intrinsic reward is proportional to the prediction error of an RNN world model. A 1991 refinement rewards not the error itself but its first derivative — the change in prediction reliability, measured by a separate “confidence network.” A 1995 version uses the KL divergence between the predictor’s prior and posterior as the curiosity signal:
\[r_{\text{int}}(t) \propto D_{\text{KL}}\!\bigl[p(\cdot \mid h(\leq t)) \;\|\; p(\cdot \mid h(< t))\bigr]\]
which is just information gain — another measure of learning progress, and the connection to Huffman coding and saved bits is immediate.
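As a toy instance of that information-gain signal (a Beta–Bernoulli model of my choosing, not from Schmidhuber’s papers): the reward is the KL divergence between the posterior predictive after and before an observation.

```python
import math

def kl_bernoulli(p, q):
    """D_KL[Bern(p) || Bern(q)] in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

def info_gain(a, b, x):
    """KL from predictive-after to predictive-before, Beta(a, b) prior on a coin."""
    p_before = a / (a + b)
    a2, b2 = a + x, b + 1 - x
    p_after = a2 / (a2 + b2)
    return kl_bernoulli(p_after, p_before)

# Surprising data under a confident prior carries a lot of information;
# expected data under the same prior carries almost none.
print(info_gain(1, 9, 1))   # prior expects mostly 0s, we observe a 1
print(info_gain(9, 1, 1))   # prior expects 1s, we observe a 1
```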
Schmidhuber’s theory is a very Schmidhuber theory, which is to say: he did in fact come up with it first, but the early versions are a haphazard mess that took subsequent researchers a while to either distil or rediscover.
2 Predictive information (Bialek, Nemenman, Tishby)
This one comes from physics rather than AI. Bialek, Nemenman, and Tishby (2001b) define the predictive information \(I_{\text{pred}}(T)\) — the mutual information between the past (a window of duration \(T\)) and the entire future of a time series:
\[I_{\text{pred}}(T) = I(x_{\text{past}};\, x_{\text{future}}) = S(T) + S(T') - S(T + T')\]
where \(S(T)\) is the entropy of observations over a window of length \(T\), and we take \(T' \to \infty\).
The trick is that entropy is extensive (\(S(T) \approx S_0 T\) for large \(T\)), so the predictive information is entirely determined by the “subextensive” corrections \(S_1(T) = S(T) - S_0 T\). Predictability is a deviation from extensivity. Most of the information we collect over time is irrelevant to prediction — a law of diminishing returns for observation.
The growth rate of \(I_{\text{pred}}(T)\) classifies the complexity of the underlying process. For a process with a finite number of learnable parameters, \(I_{\text{pred}}(T) \sim \frac{d}{2}\log T\) where \(d\) is the effective model dimension. For nonparametric processes (continuous functions with smoothness constraints), you get power-law growth \(I_{\text{pred}}(T) \sim T^\alpha\) with \(0 < \alpha < 1\). These are different “universality classes” of learnability, in a stat-mech, or comp mech sense of the word.
Now! What does this buy us for intrinsic motivation? Well, if an agent wants to seek out interesting environments, it might choose those where \(I_{\text{pred}}\) is neither zero (boring, fully predictable) nor maximal (random noise), but growing at an intermediate rate — environments rich enough to keep learning from, but structured enough that learning actually works. This is an information-theoretic version of Schmidhuber’s “sweet spot,” arrived at from a completely different direction.
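To make that sweet spot concrete with the simplest possible process: a symmetric two-state Markov chain with stay-probability \(q\). (This computes only the one-step mutual information, a crude stand-in for the full \(I_{\text{pred}}(T)\), which involves whole windows.)

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def predictive_info_markov(q):
    """One-step predictive information I(X_t; X_{t+1}), in bits, for a
    symmetric two-state chain that stays in its current state with prob q.
    The stationary distribution is uniform, so H(X_{t+1}) = 1 bit and
    H(X_{t+1} | X_t) = h2(q), giving I = 1 - h2(q)."""
    return 1.0 - h2(q)

for q in (0.5, 0.75, 0.95, 1.0):
    print(q, predictive_info_markov(q))
# q = 0.5 is iid noise (nothing to predict); q = 1.0 is frozen (nothing
# left to learn after one observation); the interesting regime is between.
```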
3 Information-theoretic curiosity in RL (Still and Precup)
Still and Precup (2012) bring predictive information directly into the RL objective.
Standard MDP setup: agent observes states \(x_t \in \mathbf{X}\), takes actions \(a_t \in \mathbf{A}\), accumulates discounted reward. A policy \(\pi(a|x)\) has an associated action-value function \(Q^\pi(x,a)\) and an expected return \(V^\pi\). So far, totally normal.
The idiosyncratic move is to add a complexity penalty on the policy itself. They view the policy as a lossy compression of the state into actions (rate-distortion theory), and penalise the mutual information \(I^\pi(A, X)\) between actions and states:
\[\min_\pi\; I^\pi(A, X) \quad \text{subject to}\quad V^\pi = \text{const.}\]
Among all policies achieving the same expected return, prefer the simplest one — the one that uses the least information about the state to choose its actions. The solution falls out as a Boltzmann-style policy:
\[\pi_{\text{opt}}(a \mid x) = \frac{p^\pi(a)}{Z(x)} \exp\!\Bigl[\tfrac{1}{\lambda} Q^\pi(x,a)\Bigr]\]
where \(\lambda\) is a temperature parameter trading off return against policy complexity, and \(p^\pi(a)\) is the marginal action distribution. This looks like standard Boltzmann exploration, but there is an extra “complexity penalty” term \(\log p^\pi(a)\) that favours actions the agent already tends to take.
Now, to get curiosity, they add a second objective: maximise the predictive power of the agent’s behaviour, measured as the mutual information between the current state-action pair and the next state, \(I(\{X_t, A_t\}; X_{t+1})\). The optimal policy becomes:
\[\pi_{\text{opt}}(a \mid x) \propto p^\pi(a) \exp\!\Bigl[\tfrac{1}{\lambda}\bigl(D_{\text{KL}}[p(X_{t+1}|x,a) \| p^\pi(X_{t+1})] + \alpha\, Q^\pi(x,a)\bigr)\Bigr]\]
The first term in the exponent drives exploration: prefer actions whose consequences are maximally informative about the next state, i.e. actions that push the transition distribution far from the average. The second term drives exploitation as usual. The parameter \(\alpha\) controls how hungry the agent is for extrinsic reward versus curiosity — Still and Precup suggest you could set it by the robot’s battery level, which is a nice touch.
So that’s cool. Exploration vs. exploitation emerges from the optimisation rather than being bolted on as an \(\epsilon\)-greedy hack or whatever. Even as \(\lambda \to 0\) (deterministic policy), the agent still balances both drives.
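The self-consistent equations can be solved by fixed-point iteration on a toy MDP. Everything below is made up for illustration — the transition model and \(Q\)-values are hand-picked, and the state-visitation distribution \(\rho\) is assumed uniform rather than derived from the policy:

```python
import math

# Toy two-state, two-action world. Transitions and Q-values are invented;
# rho (state visitation) is assumed uniform for simplicity.
P = {  # P[s][a] = distribution over next states
    0: {0: [0.9, 0.1], 1: [0.5, 0.5]},
    1: {0: [0.5, 0.5], 1: [0.1, 0.9]},
}
Q = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
rho = [0.5, 0.5]
lam, alpha = 1.0, 1.0   # temperature; curiosity vs. extrinsic-reward weight

def kl(p, q):
    """KL divergence (nats) between two discrete distributions."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def still_precup_policy(n_iter=100):
    pi = {s: [0.5, 0.5] for s in P}   # start from the uniform policy
    for _ in range(n_iter):
        # marginals induced by the current policy
        p_a = [sum(rho[s] * pi[s][a] for s in P) for a in (0, 1)]
        p_next = [sum(rho[s] * pi[s][a] * P[s][a][s2]
                      for s in P for a in (0, 1)) for s2 in (0, 1)]
        # self-consistent update: pi(a|x) ∝ p(a) exp[(KL + alpha*Q)/lam]
        for s in P:
            w = [p_a[a] * math.exp((kl(P[s][a], p_next) + alpha * Q[s][a]) / lam)
                 for a in (0, 1)]
            z = sum(w)
            pi[s] = [wi / z for wi in w]
    return pi

pi = still_precup_policy()
print(pi[0], pi[1])
```

With these numbers both the KL term and the \(Q\) term favour the “informative” action in each state, so the converged policy concentrates on it — without ever becoming fully deterministic, because the marginal \(p^\pi(a)\) factor keeps it soft.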
4 Intrinsically motivated RL via salient events (Singh, Barto, Chentanez)
This one (Chentanez, Barto, and Singh 2004) comes from more of a developmental-psychology direction. Instead of information-theoretic quantities, Singh, Barto, and Chentanez start from the observation that animals find certain events intrinsically salient — unexpected changes in light, sound, or tactile sensation trigger phasic dopamine responses that diminish with familiarity. Toddlers do this; they poke at things until the poking gets boring, then they find something new to poke at.
Their agent operates in the options framework (semi-Markov decision processes). When it first encounters a salient event \(e\) — say, a light turning on — it creates an option \(o_e\): a temporally extended skill for reproducing that event. The intrinsic reward for salient event \(e\) at time \(t+1\) is:
\[r^i_{t+1} = \tau\bigl[1 - P^{o_e}(s_{t+1} | s_t)\bigr]\]
where \(P^{o_e}\) is the learned option model’s prediction of reaching the salient state, and \(\tau\) is a scaling constant. So reward is proportional to the prediction error of the learned skill model: novel events yield high intrinsic reward, which fades as the option model improves and the event becomes predictable. The agent gets bored.
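A minimal sketch of the boredom dynamics. The real system learns option models via intra-option learning; here I fake the model with an exponentially converging success estimate (the learning rate is an assumed value):

```python
tau = 1.0     # scaling constant
lr = 0.2      # option-model learning rate (assumed, for illustration)
P_hat = 0.0   # option model's predicted probability of reproducing the event
rewards = []
for trial in range(20):
    r_int = tau * (1.0 - P_hat)   # r^i = tau * [1 - P^{o_e}]
    rewards.append(r_int)
    # the skill reliably succeeds, the model converges, the agent gets bored
    P_hat += lr * (1.0 - P_hat)
print(rewards[:3], rewards[-1])
```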
Architecturally, the agent uses intrinsic reward to update a behaviour action-value function \(Q_B\) via Q-learning, in parallel with updating the individual option policies \(Q^o\) via intra-option learning. Intrinsic reward drives skill acquisition (which options to practise); extrinsic reward drives task performance (which options to deploy). The two reward streams are additive but architecturally separated.
They illustrate this with some intuitive “playroom” experiments. The agent discovers a developmental sequence: it first masters simple skills (light on, light off), gets bored with them as intrinsic reward diminishes, then moves on to harder compound skills (turning on music requires light and finding and pressing a block). The hierarchy self-organises from the interaction of curiosity and prediction improvement, without any curriculum. And when an extrinsic task finally arrives (make the toy monkey cry — a 14-step procedure!), the intrinsically motivated agent solves it dramatically faster than a purely extrinsic learner, because it has already assembled the prerequisite skill library. It has been dicking around productively, as toddlers do.
Whilst I do not find the formalism here very satisfying, I find the resulting agents intensely personally relatable.
5 Novelty search (Lehman and Stanley)
Novelty search (Lehman and Stanley 2011) takes a radical position: throw out the objective function entirely. (Down with utility!!) Instead of rewarding fitness, reward behavioural novelty. Define a behaviour characterisation \(b(\theta)\) for each individual \(\theta\) in an evolutionary population, and maintain an archive \(\mathcal{A}\) of previously encountered behaviours. The novelty score is the average distance to the \(k\)-nearest neighbours in the archive:
\[\rho(b) = \frac{1}{k}\sum_{i=1}^{k} \|b - b_i\|\]
where \(b_1, \dots, b_k\) are the \(k\) nearest archived behaviours. Selection pressure pushes the population to keep exploring new regions of behaviour space (not genotype space, nor fitness space).
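The archive mechanism fits in a few lines. Here the behaviour characterisation is a single scalar for simplicity (real applications use something like a robot’s final position):

```python
def novelty(b, archive, k=3):
    """Average distance to the k nearest behaviours in the archive."""
    if not archive:
        return float("inf")
    dists = sorted(abs(b - bi) for bi in archive)
    return sum(dists[:k]) / min(k, len(dists))

archive = [0.0, 0.1, 0.2, 5.0]
# A behaviour inside the crowded region scores low; a frontier one scores high.
print(novelty(0.15, archive), novelty(10.0, archive))
```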
This sounds absurd — how can you solve problems without trying to? — and I think it is a bit absurd, at least in the general case. But in deceptive fitness landscapes, pursuing the objective directly leads you into local optima; novelty search escapes deception because it doesn’t care about the objective at all; it just keeps expanding the frontier of what’s been tried. And empirically, in maze navigation and other deceptive domains, novelty search often finds the goal faster than objective-driven search, precisely because the goal is reachable only via stepping stones that don’t look like progress.
Q: how is this “novelty search” different from the curiosity drive? I think the answer is that it operates at the population level rather than within a single agent’s lifetime. The “reward” is behavioural diversity, which is an evolutionary analogue of the compression-progress idea — except that the “model” being improved is the archive’s coverage of behaviour space.
6 Empowerment
I have a whole page on empowerment which is a huge topic in itself. tl;dr: Empowerment is the channel capacity between an agent’s actions and its future sensory states,
\[\mathfrak{E}(s) = \max_{p(a)} I(A;\, S')\]
where the max is over action distributions and the mutual information measures how much the agent can influence its own future. An empowerment-maximising agent keeps its options open: it avoids dead ends, gravitates toward states with many reachable futures, and tends to acquire “power” in the instrumental-convergence sense (Turner et al. 2021; Omohundro 2018). This is maybe the flavour of intrinsic motivation that AI safety people worry about most.
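Since empowerment is a channel capacity, it can be computed by Blahut–Arimoto iteration. A sketch with a two-action, two-state transition channel (toy numbers of mine):

```python
import math

def empowerment(channel, n_iter=200):
    """Channel capacity max_{p(a)} I(A; S') via Blahut-Arimoto, in bits.
    channel[a] is the distribution over next states given action a."""
    n_a = len(channel)
    n_s = len(channel[0])
    p_a = [1.0 / n_a] * n_a
    for _ in range(n_iter):
        # marginal over next states under the current action distribution
        p_s = [sum(p_a[a] * channel[a][s] for a in range(n_a)) for s in range(n_s)]
        # BA update: p(a) ∝ p(a) * exp( KL[p(s'|a) || p(s')] )
        w = [p_a[a] * math.exp(sum(c * math.log(c / p_s[s])
                                   for s, c in enumerate(channel[a]) if c > 0))
             for a in range(n_a)]
        z = sum(w)
        p_a = [wi / z for wi in w]
    p_s = [sum(p_a[a] * channel[a][s] for a in range(n_a)) for s in range(n_s)]
    return sum(p_a[a] * sum(c * math.log2(c / p_s[s])
                            for s, c in enumerate(channel[a]) if c > 0)
               for a in range(n_a))

# In a "dead end" both actions lead to the same place: zero empowerment.
dead_end = [[1.0, 0.0], [1.0, 0.0]]
# With perfectly distinguishable consequences the agent controls one full bit.
open_room = [[1.0, 0.0], [0.0, 1.0]]
print(empowerment(dead_end), empowerment(open_room))
```

The dead-end channel scores zero because no choice of action distribution makes actions informative about the future — the “avoid dead ends” behaviour falls straight out of the definition.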
7 Interactivity
Lewandowski et al. (2025) propose interactivity as an intrinsic motivation objective that swaps Shannon information for algorithmic (Kolmogorov) information:
Interactivity is similar to previously considered intrinsic motivation objectives (Chentanez, Barto, and Singh 2004; Schmidhuber 2010), and specifically predictive information (Bialek, Nemenman, and Tishby 2001b; Still and Precup 2012). However, interactivity uses algorithmic information rather than Shannon information, which can operate directly on individual sequences rather than requiring probability distributions. This sequence-based formulation provides a natural framework for continual adaptation, in which an agent’s behaviour is treated as an individual sequence.
The key quantity is the difference between the algorithmic complexity of future behaviour with and without conditioning on past experience. This sidesteps a real limitation of the Shannon-information approaches: they need well-defined probability distributions, which may not exist for a single agent living a single non-stationary lifetime. Algorithmic information operates directly on individual sequences, which makes it a more natural fit for continual learning. Whether this is practically computable is another question, but as a theoretical foundation I think it is heading in a good direction.
8 Active inference and free energy minimisation (Friston)
Karl Friston’s free energy principle looks, from a certain angle, like an intrinsic motivation theory with the boldest possible scope: all adaptive behaviour is (variational) free energy minimisation. I have reservations about some of the stronger claims here, but the connection to the above is real enough that it would be remiss to skip it.
An agent maintains a generative model \(p(\tilde{s}, \vartheta \mid m)\) of how sensory data \(\tilde{s}\) arise from hidden causes \(\vartheta\), and a recognition density \(q(\vartheta \mid \mu)\) parameterised by internal brain states \(\mu\). The variational free energy is
\[F = -\langle \ln p(\tilde{s}, \vartheta \mid m)\rangle_q + \langle \ln q(\vartheta \mid \mu)\rangle_q\]
which is the negative ELBO from variational Bayesian inference. Minimising \(F\) with respect to \(\mu\) (perception) tightens the approximate posterior. The unique selling point of active inference is that the agent can also minimise \(F\) with respect to actions \(a\) — by changing the world to match its predictions, rather than only changing its beliefs to match the world.
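The bound is easy to verify in a discrete toy model (my numbers): \(F\) always upper-bounds the surprise \(-\ln p(\tilde{s})\), with equality exactly when \(q\) is the true posterior.

```python
import math

# Discrete toy: hidden cause theta in {0,1}, observation s in {0,1}.
p_theta = [0.5, 0.5]                   # prior p(theta)
p_s_given = [[0.8, 0.2], [0.3, 0.7]]   # likelihood p(s|theta)

def free_energy(q, s):
    """F = E_q[ln q(theta)] - E_q[ln p(s, theta)]  (negative ELBO, nats)."""
    return sum(qt * (math.log(qt) - math.log(p_theta[t] * p_s_given[t][s]))
               for t, qt in enumerate(q) if qt > 0)

s = 1
surprise = -math.log(sum(p_theta[t] * p_s_given[t][s] for t in (0, 1)))
joint = [p_theta[t] * p_s_given[t][s] for t in (0, 1)]
z = sum(joint)
posterior = [j / z for j in joint]     # exact Bayes posterior p(theta|s)

# F upper-bounds surprise, with equality at the exact posterior:
print(free_energy([0.5, 0.5], s), free_energy(posterior, s), surprise)
```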
So where Schmidhuber’s agent seeks compression progress (surprise that becomes learnable), Friston’s agent seeks to minimise surprise outright. This, as many people have noticed, sounds like a recipe for the “dark room problem” — just go sit in a dark room where nothing surprising ever happens, and you’ll minimise free energy perfectly. Friston’s response, as far as I understand it, is something like this: the generative model encodes priors over expected sensory states (homeostatic setpoints, essentially), so a hungry agent predicts it will eat, and acts to make that prediction true. The “intrinsic motivation” is already baked into the prior.
I find this answer unsatisfying — it feels like it relocates the hard problem into the prior rather than solving it — but reasonable people disagree.
Schmidhuber pointed out the tension in Schmidhuber (2010): Friston’s agents want to suppress prediction error, while curiosity-driven agents want to seek it out (then reduce it through learning). In active inference, “perception tries to suppress prediction error by adjusting expectations […] while action tries to suppress prediction error by changing the signals being predicted.” This is stabilising, not exploring. A curious agent, by contrast, is motivated to leave the dark room because novel environments offer compression progress.
Hafner et al. (2022) attempt a reconciliation by showing that both action and perception can be cast as divergence minimisation — with different target distributions. Under their formulation, active inference and intrinsic motivation objectives like empowerment and information gain emerge as special cases of the same variational framework, differing only in which KL divergence we minimise and which distribution we hold fixed.
I need to read that paper more deeply. It sounds like the right way to think about it: the free energy principle is less a competing intrinsic motivation theory and more a very general variational language in which many of the other theories can be expressed. Whether it adds predictive power beyond that expressiveness is still debated.
9 What connects these all?
Most of these formalisations are rearrangements of the same few information-theoretic building blocks. Curiosity rewards maximise \(I(\text{past};\text{future})\) or its time derivative. Empowerment maximises \(I(\text{actions};\text{future states})\). Policy complexity penalties minimise \(I(\text{states};\text{actions})\). Novelty search maximises coverage in behaviour space, which is an implicit entropy maximisation. Free energy minimisation is the negative ELBO, i.e. it’s variational inference with aspirations.
The choice of which mutual information to maximise (or minimise, or differentiate) reflects different assumptions about what makes an agent good at learning. Schmidhuber’s agent wants to get better at predicting; Still and Precup’s agent wants its behaviour to be informative; empowerment-seeking agents want causal influence; novelty-searching populations want diversity. I don’t think these are really competing theories so much as different projections of a common intuition: an agent that can’t yet solve any particular problem should spend its time becoming the kind of agent that could solve many problems. Which is, come to think of it, also my career strategy.
10 Cousin formalism: Bayesian optimisation
If the explore/exploit trade-off above rings a bell from a completely different context, it should. Adaptive design of experiments (a.k.a. “Bayesian optimisation,” though I refuse to call it that) faces the same structural problem: you have an expensive-to-evaluate black-box function, a surrogate model (usually a Gaussian process), and you need to decide where to sample next. The acquisition function — expected improvement, upper confidence bound, entropy search, etc. — is doing exactly the work of an intrinsic motivation signal: it tells you which input is most worth evaluating, given what you already know.
The resemblance to curiosity-driven RL is right there in front of our eyes. In both cases, the agent is managing a posterior over a world model and choosing actions to maximise some information-theoretic quantity (information gain, expected model improvement, predictive variance reduction). Still and Precup’s KL-divergence curiosity term and the entropy-search acquisition function are practically the same object in different notation.
I thought I could summarise the differences compactly, but instead I persuaded myself that this is maybe slightly deeper than I thought it was. Bayesian optimisation typically assumes the phenomenon comes from some known family — often a GP prior, i.e. something like a smoothness assumption over an input space with known dimensionality. The goal is to find a specific optimum of a specific function. Intrinsic motivation, by contrast, is not trying to optimise any particular function; it is trying to produce an agent that is generically competent in an environment whose structure is largely unknown. There is no fixed acquisition target, and the “surrogate model” is the agent’s entire world model, which may have to deal with non-stationarity, partial observability, and its own actions changing the thing it is modelling. But what is “generic competence” anyway? Under some choice of world and competencies, I think these formalisms collapse into one another.
Let us ignore that for now. So: Bayesian optimisation is the well-behaved cousin who went to a good school and has a clear objective. Intrinsic motivation is the feral version of the same impulse, operating without the comforting assumption that you know what you are looking for. But the maths is close enough that insights flow in both directions, and I think there is unexploited territory in making the connection tighter.1
11 What’s missing: bodies
One thing that bugs me about all of the above: none of these formalisations have anything to say about the physical cost of curiosity. A real agent — biological or robotic — runs on a finite energy budget. You cannot maximise compression progress if you are starving. You cannot seek out novel states if your battery is flat. Every bit of mutual information you compute or action you explore dissipates heat and consumes free energy in the boring, thermodynamic sense.
This is something we would need to actually model to explain why real organisms have drives that compete with curiosity (hunger, fatigue, thermoregulation), and the interaction between those drives and the exploratory ones is arguably where most of the interesting behaviour comes from. A toddler that explored with no metabolic constraints would be a very different beast from an actual toddler, who explores in bursts between naps and snacks.
There are a few threads that start to connect the information-theoretic formalisations above to thermodynamic reality. Susanne Still (same person as the curiosity-in-RL paper) showed that predictive information has a direct thermodynamic cost: any system that predicts its future inputs can in principle reduce the thermodynamic dissipation from driving its states, but only up to a bound set by the mutual information between its internal states and the future (Still et al. 2012). So there is a physical exchange rate between prediction and energy expenditure, which is exactly the kind of thing you would need to bridge intrinsic motivation to metabolic constraints. Ortega and Braun (2011) take a related approach via bounded rationality, treating the KL divergence between a policy and a default (i.e. the “complexity penalty” in Still and Precup’s framework) as a computational or thermodynamic cost, yielding a free-energy-style objective where the temperature parameter literally controls how much work the agent can afford to do. But as far as I can tell, nobody has yet built a full intrinsic motivation framework where the agent’s curiosity drive is explicitly modulated by its energy budget in a thermodynamically principled way. The pieces are all there; someone should put them together. TODO.
12 References
Footnotes
1. The adaptive design of experiments page also notes the connection to RL but punts on the details. Someone should sort this out properly. TODO
