Boundaries and blankets
where to draw the border around the system
2026-02-05 — 2026-04-24
Wherein the Markov blanket, a tool of graphical models, is conscripted as a theory of selfhood, found satisfiable by rocks and hurricanes alike, and subjected to workshop scrutiny without consensus being reached.
If we have a big complicated system — a brain, a corporation, an ecosystem, an AI agent loop — where do we draw the line around “the thing”?
One formalisation is the Markov blanket, a piece of standard graphical models machinery which has recently been conscripted into service as a theory of selfhood, agency, and alignment. I want to walk through how that happened, where I think it goes wrong, and what the recent formalising-boundaries workshop programme has done about it.
1 Markov blankets
Pick a node \(X\) in a graphical model. Its Markov blanket \(\operatorname{MB}(X)\) is the smallest set of other nodes such that \(X\) is conditionally independent of everything else given \(\operatorname{MB}(X)\):
\[ X \perp\!\!\!\perp \bigl(V \setminus (\{X\} \cup \operatorname{MB}(X))\bigr) \;\Big|\; \operatorname{MB}(X) \]
where \(V\) is the full set of nodes. In an undirected graph this is just the neighbours of \(X\). In a DAG it is \(X\)’s parents, \(X\)’s children, and the other parents of \(X\)’s children (the “co-parents” or “spouses” — nodes that share a child with \(X\)).
If we know everything in the blanket, then learning anything else about the rest of the universe tells us nothing new about \(X\). The blanket is an informational shield. Pearl’s term for it is a statistical boundary condition: everything the outside world can tell us about \(X\) has to pass through the blanket first.
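In a DAG the blanket is mechanical to compute: parents, children, co-parents. A minimal sketch in Python — the dict-of-parent-lists representation is my own choice for the example, not anything standard:

```python
def markov_blanket(node, parents):
    """Markov blanket of `node` in a DAG given as {child: [parents]}.

    The blanket is the node's parents, its children, and its
    children's other parents (the "co-parents").
    """
    children = [c for c, ps in parents.items() if node in ps]
    coparents = {p for c in children for p in parents[c] if p != node}
    blanket = set(parents.get(node, [])) | set(children) | coparents
    blanket.discard(node)
    return blanket

# A small DAG: Season -> Rain -> Wet, Season -> Sprinkler -> Wet.
dag = {
    "Season": [],
    "Rain": ["Season"],
    "Sprinkler": ["Season"],
    "Wet": ["Rain", "Sprinkler"],
}

# MB(Rain) = parents {Season} ∪ children {Wet} ∪ co-parents {Sprinkler}
print(sorted(markov_blanket("Rain", dag)))  # → ['Season', 'Sprinkler', 'Wet']
```

In an undirected graph the same function degenerates to "return the neighbours", since there are no co-parents to worry about.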
2 Friston: blankets as selfhood
Karl Friston and collaborators (Kirchhoff et al. 2018) take this statistical gadget and do something ambitious with it: they propose that having a Markov blanket is roughly what it means to be a “thing” that persists in an environment. A cell membrane is a blanket separating internal states from the outside world; skin separates physiological states from the weather; the boundary of a firm separates internal operations from the market.
They further partition the blanket itself into sensory states (outside influences inside) and active states (inside influences outside). Write \(\mu\) for internal states, \(\eta\) for external, \(s\) for sensory, \(a\) for active. The blanket is \(b = s \cup a\), and the claim is:
\[ \mu \perp\!\!\!\perp \eta \mid b, \qquad \dot{\mu} = f(\mu, s), \qquad \dot{a} = g(\mu, a) \]
Internal states evolve based only on themselves and sensory input; active states evolve based only on internal states and themselves. So far this is just the conditional independence property dressed up in dynamical systems notation.
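The structural content of those equations — \(f\) never sees \(\eta\), and \(\eta\) reaches \(\mu\) only through \(s\) — can be made concrete as a toy Euler-stepped system. The particular couplings below are arbitrary illustrations of the dependency structure, not anything from Friston:

```python
import math

def step(mu, s, a, eta, dt=0.01):
    """One Euler step of blanket-structured dynamics.

    Internal states see only (mu, s); active states only (mu, a);
    the external state eta reaches mu exclusively via s.
    """
    d_mu = -mu + math.tanh(s)    # dot(mu) = f(mu, s): no eta anywhere
    d_a = -a + math.tanh(mu)     # dot(a)  = g(mu, a): no eta, no s
    d_s = -s + eta               # sensory states relay the outside in
    d_eta = -eta + a             # the environment feels only the actions
    return (mu + dt * d_mu, s + dt * d_s, a + dt * d_a, eta + dt * d_eta)

mu, s, a, eta = 0.5, 0.0, 0.0, 1.0
for _ in range(1000):
    mu, s, a, eta = step(mu, s, a, eta)
```

Perturbing \(\eta\) changes \(\mu\) only after the perturbation has propagated through \(s\) — which is the blanket property restated in dynamical form.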
The substantive claim comes from the free energy principle: anything that persists — anything that doesn’t just dissolve into thermal noise — must act as if it is minimising surprise at its blanket boundary. Note the slippage: we have gone from a static graphical model property to a claim about the flow of a stochastic differential equation. Whether the machinery of graphical models straightforwardly applies in that continuous-time setting is part of what makes the whole free-energy research programme hard to evaluate.
3 The trouble with blankets
Bruineberg et al. (2022) raise the obvious worry: we can always find a Markov blanket partition if we look hard enough. Any partition of states into “inside” and “outside” with some intermediary layer will satisfy the conditional independence criterion, given sufficiently accommodating dynamics. So the existence of a blanket does not, by itself, tell us that the thing inside is autonomous in any interesting sense. A hurricane has a blanket. A rock has a blanket. My coffee cup, right now, has a blanket.
This is the central problem, and it is the same problem we run into with agency more generally: the formal criterion is too easy to satisfy. What we want is some way to say that this blanket is the interesting one — that this particular partition carves the world at its joints. Choosing a Markov blanket partition is a modelling decision, the same way choosing which self to identify with is a modelling decision. The blanket formalism does not solve the boundary problem; it reframes it in graphical-model language. That reframing is useful — it lets us be precise about what information crosses the boundary, and it connects to actual mathematical tools — but the hard question of which partition to pick remains, and it is doing all the work.
4 Other ways of drawing the line
Several researchers are attacking the carving problem from outside the Friston paradigm.
Levin (2019) argues that the “self” in a developing organism is scale-free: cells, tissues, organs, and the whole organism each operate as agents with their own goals, nested inside each other. There is no single correct blanket; there are blankets at every scale, and the interesting question is how they compose. (Cf. multi-agent selves, which self.)
Fontana and Buss (1996) come at it from autopoiesis and formal chemistry. A barrier of objects is what allows bounded organisations to persist; the boundary isn’t just informational but constructive — it’s the set of processes that maintain themselves. This is a stronger condition than conditional independence, and possibly closer to what we actually mean when we say something is a “self.”
Lewandowski et al. (2025) consider what happens when the world is too big for the agent to model. If we can’t represent the whole environment, our effective Markov blanket is partly a function of our computational limitations — we’re not so much discovering the boundary as being forced into one by our own finitude. The boundary of the self is, in part, a boundary of affordance — cf causally embedded agency.
5 Critch: boundaries before preferences
The most sustained attempt at a formal theory comes from Andrew Critch, whose «Boundaries» sequence argues that standard utility theory has no formal concept of where the agent stops and the environment starts. In a game-theoretic model, the partition of the world into players — who has which action space, which observation space, which utility function — is given exogenously. We just declare it. That declaration is doing enormous work, and if we get it wrong (or if it is ambiguous, or if it shifts over time) then the whole decision-theoretic apparatus is sitting on sand.
Critch’s bet, as I understand it, is that boundaries are more fundamental than preferences — they should be the primitive we build up from, not something we derive from utility functions. Friston says: the blanket is wherever the conditional independence structure puts it. Critch says: we don’t even have the right formalism to ask the question properly within utility theory. Both are pointing at the same hole from opposite sides.
In Part 3a of the sequence, Critch proposes a specific formalisation. Partition the world’s variables into four groups — viscera \(V\), active boundary \(A\), passive boundary \(P\), environment \(E\) — and treat this as a Bayesian network.¹ The viscera are the guts of the agent, the states it has close to full causal control over. The active boundary \(A\) is influenced primarily by the viscera — interpretable as actions. The passive boundary \(P\) is influenced primarily by the environment — interpretable as perceptions. The boundary \(B = A \cup P\) is then a directed Markov blanket: it approximately d-separates \(V\) from \(E\), with a discernible direction of inward (perception) and outward (action) information flow.
Infiltration is information leaking from the environment into the active boundary and viscera; exfiltration is information leaking from the viscera into the passive boundary and environment. When both are zero, we have a perfect boundary. In practice they are nonzero, and measuring how far we are from zero gives us a graded notion of boundary integrity — which is to say, an approximate, graded notion of agency.
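One way to operationalise “information leaking” — my gloss, not necessarily Critch’s exact definitions — is conditional mutual information: infiltration is roughly \(I(E; V' \mid B)\), the information the environment carries about the next-step viscera over and above what the boundary already carries. A sketch with explicit joint tables, comparing a perfect boundary to a leaky one:

```python
from collections import defaultdict
from math import log2

def cmi(joint):
    """I(X; Y | Z) in bits, given a joint as {(x, y, z): probability}."""
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        pz[z] += p; pxz[x, z] += p; pyz[y, z] += p
    return sum(p * log2(p * pz[z] / (pxz[x, z] * pyz[y, z]))
               for (x, y, z), p in joint.items() if p > 0)

# Perfect boundary: next-step viscera V' is a noisy copy of the
# boundary B alone, so E tells us nothing extra given B.
perfect = {}
for e in (0, 1):
    for b in (0, 1):
        for v in (0, 1):
            p_v = 0.9 if v == b else 0.1        # V' depends on B only
            perfect[e, v, b] = 0.25 * p_v       # E, B uniform, independent

# Leaky boundary: V' copies E directly, bypassing B — infiltration.
leaky = {}
for e in (0, 1):
    for b in (0, 1):
        for v in (0, 1):
            p_v = 0.9 if v == e else 0.1        # V' depends on E
            leaky[e, v, b] = 0.25 * p_v

print(abs(cmi(perfect)) < 1e-9)       # → True (zero infiltration)
print(round(cmi(leaky), 3))           # → 0.531 (bits of infiltration)
```

The graded notion of boundary integrity then falls out for free: infiltration is not a yes/no property but a quantity in bits.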
VAPE is recognisably a descendant of Friston’s sensory/active partition, reframed in causal DAG language rather than stochastic differential equations. Whether the causal DAG framing is the right one is contested — in a model with complete state information, everything becomes trivially independent, so the blanket concept may need a non-causal or non-equilibrium reformulation.
The payoff Critch is after is in alignment. Many things we try to capture as “minimise side effects” or “avoid over-optimisation” can be stated more precisely as respect for boundaries: don’t infiltrate other agents’ viscera, don’t exfiltrate past their passive boundaries. The claim is that these coordination norms are not arbitrary — they transfer across species and scales of organisation, unlike preference-based norms which need to be elicited per agent.
But if we’re choosing who to become — making intertemporal decisions on behalf of a future person who may not share our current preferences — then the boundary of the agent is not stable over time. The self doing the optimising and the self being optimised-for are not obviously the same system, and AFAICT no current formalism handles that.
6 The formalising-boundaries workshops
Chris Lakin, Manuel Baltieri, and Evan Miyazono organised a Conceptual Boundaries Workshop (Austin, February 2024), then a Mathematical Boundaries Workshop (April 2024, five days). Attendees included Critch, davidad, Scott Garrabrant, Abram Demski, and a mix of people from applied category theory, artificial life, and alignment — roughly 40/40/20 by Miyazono’s estimate. The conceptual workshop explored what we’d want from a boundaries formalism; the mathematical workshop tried to actually build one.
The main thing that went wrong, per the Topos Institute’s post-workshop writeup, is that participants couldn’t agree on the underlying ontology. One person was thinking in terms of sets and Cartesian products; another in terms of fibrations; davidad in terms of polynomial functors. Each translation carried implicit commitments about what “state”, “boundary”, and “dynamics” meant, and these commitments were hard to make explicit, let alone reconcile. This is a familiar problem from graphical models generally — Bayesian nets, factor graphs, and Markov random fields encode overlapping but not identical families of independences, each carries different ontological baggage, and people develop strong loyalties to their preferred notation. At the boundaries workshop it was worse, because the objects being modelled (agents, selves, cells) are themselves contested.
Breakout sessions explored a cocategorical formalism (specifying things via the wholes they participate in, rather than by composing parts) and an attempt to formalise gliders as non-deterministic closed dynamical systems. AFAICT neither attempt produced a formalism that the full group adopted afterwards.
The boundary problem, then, appears hard in a specific way: not that we lack mathematical tools, but that we lack agreement on what the tools should be tools for. The statistical tradition wants conditional independence; the game-theoretic tradition wants player partitions; the category-theoretic tradition wants compositional structure; the autopoietic tradition wants self-maintaining processes. All reasonable desiderata, but they pull in different directions. Nobody has a framework that satisfies more than one or two simultaneously.
The LessWrong «Boundaries/Membranes» tag and compilation post collect the full archipelago of related work.
7 Wish list
A satisfying theory would, given a dynamical system and some criterion of interest (a loss function, a fitness measure, a utility), discover the partition that best predicts or controls the thing we care about, rather than impose one by fiat. That feels adjacent to learning graphical structure from data and maybe causal discovery, but I do not know of anyone who has made the connection rigorous. It would also have an account of nesting — blankets within blankets, cells within organs within firms within polities — which looks like it should connect to coarse-graining and renormalisation, but I have not seen that analogy do real work yet.
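To gesture at what “discovering the partition” could mean, here is the brute-force version of the easiest case: given an exactly known discrete joint, search for the smallest conditioning set that screens a node off from everything else. The chain, its CPTs, and the threshold below are toys of my own construction:

```python
from itertools import combinations, product
from collections import defaultdict
from math import log2

# Exact joint of a binary chain A -> B -> C -> D (illustrative CPTs).
def p_next(parent):                     # P(child = 1 | parent)
    return 0.8 if parent else 0.3

joint = {}
for a, b, c, d in product((0, 1), repeat=4):
    p = 0.5                             # P(A) uniform
    p *= p_next(a) if b else 1 - p_next(a)
    p *= p_next(b) if c else 1 - p_next(b)
    p *= p_next(c) if d else 1 - p_next(c)
    joint[a, b, c, d] = p

NAMES = ("A", "B", "C", "D")

def cmi_grouped(target, cond):
    """I(target ; all-other-vars | cond) in bits, over the joint above."""
    rest = [i for i in range(4) if i != target and i not in cond]
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    pxyz = defaultdict(float)
    for v, p in joint.items():
        x, y, z = v[target], tuple(v[i] for i in rest), tuple(v[i] for i in cond)
        pz[z] += p; pxz[x, z] += p; pyz[y, z] += p; pxyz[x, y, z] += p
    return sum(p * log2(p * pz[z] / (pxz[x, z] * pyz[y, z]))
               for (x, y, z), p in pxyz.items() if p > 0)

# Smallest conditioning set that screens B (index 1) off from the rest.
others = [0, 2, 3]
for size in range(len(others) + 1):
    found = [S for S in combinations(others, size)
             if cmi_grouped(1, S) < 1e-9]
    if found:
        print([tuple(NAMES[i] for i in S) for S in found])  # → [('A', 'C')]
        break
```

This recovers \(\operatorname{MB}(B) = \{A, C\}\) from the distribution alone, without being handed the graph. Real structure-discovery algorithms (the PC family, for instance) do essentially this with statistical conditional-independence tests in place of exact CMI — which is why the connection to causal discovery looks plausible, even if nobody has made it rigorous for the boundary problem.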
And it would, I hope, let us say something precise about AI agents. Right now we define their boundaries architecturally — this is the agent, that is the tool, here is the environment. But an LLM with tool access that can read and write its own system prompt has a rather different blanket than one that cannot, even if the architecture diagram looks identical. Which of these boundaries matters for alignment? IMO both, in ways we don’t yet have the formalism to state.
8 Incoming
- Jan Kulveit, The Pando Problem
9 References
Footnotes
¹ Chris Lakin’s conceptual explainer is a good companion read.
