Embedded agency
What about agents that live in the world?
2018-10-23 — 2026-02-06
Wherein the doctrine of infinite‑compute agents is surveyed, AIXI‑like worlds being shown to spawn Löbian self‑reference as agents simulate one another, and the matter is set aside for want of finite budgets.
“…I am sure that one of your very early philosophers came to the conclusion that a fully competent mind, from a study of one fact or artifact belonging to any given universe, could construct or visualize that universe, from the instant of its creation to its ultimate end?”
“Yes. At least, I have heard the proposition stated, but I have never believed it possible.”
“It is not possible simply because no fully competent mind ever has existed or ever will exist. A mind can become fully competent only by the acquisition of infinite knowledge, which would require infinite time as well as infinite capacity.”
— E. E. “Doc” Smith (1950), First Lensman
I recently reviewed the famed Embedded Agency post for the PIBBSS x ILLIAD research residency. I’ve been meaning to look into this forever, because it sounds important.
That post was not quite what I thought it was. Specifically, it’s about exotic problems that stem from infinite-compute limit cases: it understands inference in terms of computability rather than computational complexity or practical implementation, and it is only indirectly about causal embeddedness.
So here be dragons! Broadly speaking, Gödel-style incompleteness and Löb’s-theorem-style self-referential paradoxes arise all over the place when we try to build agents that reason about themselves or each other. If every agent is doing full-Bayesian, AIXI-style inference, then (AFAICT) every agent carries many possible hypotheses about the universe, each of which is a complete simulation of the universe (or at least of its likelihood, which is basically the same thing). This is already a very weird setting.
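For concreteness, here is Hutter’s AIXI action rule, transcribed from memory from Universal Artificial Intelligence, so treat the details as approximate: at each step $k$ the agent picks the action maximizing expected total reward out to horizon $m$, with expectations taken under the Solomonoff mixture over every program $q$ that could be generating its percepts:

$$
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

Here $U$ is a universal Turing machine and $\ell(q)$ is the length of program $q$. The inner sum is the “every hypothesis is a simulated universe” part: each $q$ is a program that outputs an entire observation-reward history.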
BUT WAIT, observe the embedded-agency authors: what if the world furthermore includes other agents, which are themselves infinite-compute reasoners containing many simulated universes of their own? Does stuff get weird then? Fuck yeah it does.
I guess the key takeaway is that once you’ve swallowed AIXI, the weirdness shows up in lots of ways, and very rapidly. For example, modelling myself breaks in especially horrible ways if I’m an agent with boundless compute. Modelling others is totally cursed.
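Here’s a deliberately dumb toy of my own (not from the post; the actual Löbian-cooperation literature uses provability logic rather than literal simulation) showing the most basic way mutual modelling curses you: two agents that decide what to do by simulating each other never bottom out.

```python
import sys

def naive_agent(opponent):
    """Cooperate iff a full simulation of the opponent (playing me) cooperates."""
    return "C" if opponent(naive_agent) == "C" else "D"

# Each decision spawns a simulation of the other agent, whose decision
# spawns a simulation of the first, and so on forever.
sys.setrecursionlimit(60)  # fail fast rather than grinding through the default limit
try:
    naive_agent(naive_agent)
except RecursionError:
    print("mutual simulation never bottoms out")
```

As I understand it, the provability-logic fix (FairBot-style: cooperate if you can prove the other cooperates) sidesteps this regress, and Löb’s theorem is exactly the tool that shows the proof search succeeds.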
This stuff is all super fun, but… it doesn’t feel close to the problems I’m excited about.
AIXI is already batshit insane as a model for the world, and I’m not that keen on it as a short-horizon practical tool (as opposed to a satisfying and elegant formalism, which it is). If we’ve learned one thing about AI recently, it is that compute economics is everything; setting the compute budget to infinity, which is (informally) what AIXI does, feels like a fascinating thought experiment, but not especially generative.
If I wanted to care about agency, I would rather start from a different set of examples; for example, what if the agents were made of the same stuff as the rest of the world?
To do otherwise is to be dualist about the mind, and the problems we then run into resemble the ones earlier generations of dualists ran into: problems we would generally rather avoid than “solve”. For example, does an infinitely powerful agent have free will? Can a Universal Intelligence pose an inferential challenge so big that even she cannot solve it? How many Universal Intelligences can dance on the head of a pin? etc.
I’ll leave this note as a placeholder; I’m not pursuing this line of research at the moment.
That said, for a cool, well-explained application of Löb’s theorem, this is a great starting point.
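For reference, the theorem itself (statement only): for any sentence $P$,

$$
\text{if } \mathrm{PA} \vdash \big(\Box P \rightarrow P\big) \text{ then } \mathrm{PA} \vdash P,
$$

where $\Box P$ abbreviates the arithmetized provability predicate $\mathrm{Prov}_{\mathrm{PA}}(\ulcorner P \urcorner)$. The sting for agents: a proof system that asserts blanket trust in its own proofs of $P$ thereby proves $P$, for any $P$ whatsoever, which is why naive self-trust collapses.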
