AI disempowerment of humans
Races to the bottom in human relevance
2021-09-20 — 2025-09-11
Wherein human agency is recorded as being eroded by AI substitution of labor, cognition, and culture, and feedback loops and institutional lock‑in are described as constraining reversal.
On the prospect of human domestication via AI. Is that bad? It seems frightening, but I suppose we could imagine virtuous versions of it.
The iconic essay in this domain is Paul Christiano’s 2019 piece, What failure looks like:
Amongst the broader population, many folk already have a vague picture of the overall trajectory of the world and a vague sense that something has gone wrong. There may be significant populist pushes for reform, but in general these won’t be well-directed. Some states may really put on the brakes, but they will rapidly fall behind economically and militarily, and indeed “appear to be prosperous” is one of the easily-measured goals for which the incomprehensible system is optimising.
Amongst intellectual elites there will be genuine ambiguity and uncertainty about whether the current state of affairs is good or bad. People really will be getting richer for a while. Over the short term, the forces gradually wresting control from humans do not look so different from (e.g.) corporate lobbying against the public interest, or principal-agent problems in human institutions. There will be legitimate arguments about whether the implicit long-term purposes being pursued by AI systems are really so much worse than the long-term purposes that would be pursued by the shareholders of public companies or corrupt officials.
We might describe the result as “going out with a whimper.” Human reasoning gradually stops being able to compete with sophisticated, systematised manipulation and deception which is continuously improving by trial and error; human control over levers of power gradually becomes less and less effective; we ultimately lose any real ability to influence our society’s trajectory.
See Kulveit et al. (2025) for a modernisation, and the accompanying web page.
People talk about “gradual” disempowerment to distinguish it from sudden apocalypse, but I see no reason for such disempowerment to be all that gradual. Both AI-led economic and epistemic transitions can have rather sudden effects.
We can distinguish a few major lenses for analyzing such a drift away from human agency. The authors use different metaphors (“the narrow corridor,” “lock-in,” “cultural feedback loops”), but they seem to be circling the same underlying worry: that once AI systems substitute for human labour, cognition, and culture, the implicit bargains that kept societies responsive to human needs will dissolve.
What failure looks like was the first post I can recall that claimed we could “go out with a whimper” as systems gradually optimize for goals misaligned with human flourishing, eroding our control without triggering an apocalypse. That essay became the intellectual seed for much of what followed, inspiring formalizations such as the ‘gradual disempowerment’ report of Kulveit et al. (2025).
Most of the newer papers (Bullock, Hammond, and Krier 2025; Kulveit et al. 2025; MacInnes, Garfinkel, and Dafoe 2024; Qiu et al. 2025) can be read as elaborations of some common core ideas:
- Feedback loops — when AI substitutes for human activity, the signals that once tethered institutions to people’s welfare (taxes, consumer demand, cultural participation) attenuate.
- Selection pressures — in markets and geopolitics, states or firms that lean harder into AI outcompete those that restrain themselves, even if the outcome degrades welfare.
- Lock-in — once these shifts reach a threshold, human preferences no longer constrain system dynamics, making reversal difficult or impossible (a toy sketch of how these three ingredients combine follows this list). See e.g. The Lock-in Hypothesis site.
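The skeleton is easy to caricature as a toy dynamical system. The sketch below is not drawn from any of the cited papers; the variables (`ai_share`, `human_influence`), the functional forms, and all parameter values are illustrative assumptions, chosen only to show how selection pressure, an attenuating feedback signal, and a lock-in threshold can combine.

```python
# Toy dynamics for the three ideas above: selection pressure grows AI's share
# of activity, human influence is replenished only by the remaining human
# share (the feedback loop), and below a threshold humans can no longer push
# back (lock-in). Parameter names and values are illustrative assumptions.

def simulate(steps=300, dt=0.1,
             competition=0.4,    # selection pressure toward AI substitution
             reinforcement=0.5,  # how strongly human activity restores influence
             decay=0.3,          # erosion of influence as substitution proceeds
             lock_in=0.2):       # below this influence, pushback is unavailable
    ai_share, influence = 0.05, 1.0
    trajectory = []
    for t in range(steps):
        # Firms/states that adopt AI outcompete those that do not, so the AI
        # share grows logistically regardless of its welfare effects.
        d_share = competition * ai_share * (1 - ai_share)
        # While humans retain enough influence, they slow substitution a little.
        if influence > lock_in:
            d_share -= 0.2 * influence * ai_share
        # Influence is replenished only by the human share of activity, and
        # erodes in proportion to how much has already been substituted.
        d_infl = (reinforcement * (1 - ai_share) * (1 - influence)
                  - decay * ai_share * influence)
        ai_share = min(max(ai_share + dt * d_share, 0.0), 1.0)
        influence = min(max(influence + dt * d_infl, 0.0), 1.0)
        trajectory.append((t, ai_share, influence))
    return trajectory

if __name__ == "__main__":
    for t, a, h in simulate()[::30]:
        print(f"t={t:3d}  AI share={a:.2f}  human influence={h:.2f}")
```

The qualitative point is only the shape of the trajectory: the pushback term matters only while influence is above the threshold, so by the time the decline is obvious, the lever that could have reversed it is already gone.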
Each author picks a different domain lens to illustrate how this skeleton plays out.
- Economy, culture, states — Kulveit et al. (2025) show how substitution in each of these domains interacts with the others, producing reinforcing spirals that weaken human leverage.
- The narrow corridor — Bullock, Hammond, and Krier (2025) draw on Acemoglu (2020) to ask whether AI tips societies toward despotic over-control or an absent-Leviathan collapse.
- International competition — MacInnes, Garfinkel, and Dafoe (2024) argue that anarchy plus new technology pushes states toward low-welfare equilibria; Patell (2025) responds that cooperative bulwarks are possible but fragile.
- Epistemics and culture — The Lock-in Hypothesis (Qiu et al. 2025) models AI-human feedback loops that shrink diversity and entrench false beliefs. I think of this one as disrupting the ecology of mind, maybe via AI persuasion catastrophes.
- Institutional transparency — the open-source game-theory work of Critch, Dennis, and Russell (2022) ties in here by showing how visible rulesets can self-fulfil, sometimes toward cooperation, sometimes toward collapse; a toy illustration follows this list.
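As a toy illustration of that self-fulfilling-ruleset point, here is the informal “clique bot” construction sometimes used to motivate open-source game theory: each player’s strategy is an ordinary function that can inspect the other’s source before moving in a one-shot prisoner’s dilemma. This is a deliberate simplification for illustration, not the provability-based machinery of Critch, Dennis, and Russell (2022); run it from a file so `inspect.getsource` can see the definitions.

```python
# Minimal illustration: strategies are ordinary Python functions that can read
# each other's source code before choosing to cooperate ("C") or defect ("D").
import inspect

def clique_bot(opponent):
    # Cooperate exactly when the opponent visibly runs this same ruleset.
    same_ruleset = inspect.getsource(opponent) == inspect.getsource(clique_bot)
    return "C" if same_ruleset else "D"

def defect_bot(opponent):
    # Ignores what it sees and always defects.
    return "D"

def play(p1, p2):
    # One-shot prisoner's dilemma with mutually visible strategies.
    return p1(p2), p2(p1)

print(play(clique_bot, clique_bot))  # ('C', 'C'): cooperation is self-fulfilling
print(play(clique_bot, defect_bot))  # ('D', 'D'): transparency can also entrench defection
```

The outcome hinges entirely on the ruleset being legible: the same transparency that makes cooperation self-fulfilling between matching bots also makes defection self-fulfilling against everything else.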
The LessWrong and Alignment Forum essays (e.g. Multipolar Failure and Critch’s Boundaries) supply the informal language (RAAPs, lock-in, multipolar traps) that the academic papers then mathematize. The GradualDisempowerment.ai portal accompanying Kulveit et al. (2025) is explicitly meant to bridge the informal and the formal. Meanwhile, popular pieces like Spirals of Delusion in Foreign Affairs translate the motif into a geopolitical idiom.
What unifies them is a shift from “sudden catastrophic takeover” to “systemic drift” as the object of risk analysis.
1 Incoming
- GradualDisempowerment.ai: Systemic Existential Risks from Incremental AI Development (Kulveit et al. 2025)
- Spirals of Delusion: How AI Distorts Decision-Making and Makes Dictators More Dangerous (not convinced about this one tbh)
- Post-AGI Civilizational Equilibria Workshop | Vancouver 2025
- Andrew Critch, «Boundaries» Sequence
- The Intelligence Curse
- Large AI models are cultural and social technologies
- The Economics of Transformative AI | NBER