AI disempowerment of humans
Races to the bottom in human relevance
2021-09-20 — 2025-09-03
Human domestication via AI. Is that bad? It seems frightening, but I suppose we could imagine virtuous versions of it.
The iconic piece in this domain is Paul Christiano’s 2019 essay, What failure looks like:
Amongst the broader population, many folk already have a vague picture of the overall trajectory of the world and a vague sense that something has gone wrong. There may be significant populist pushes for reform, but in general these won’t be well-directed. Some states may really put on the brakes, but they will rapidly fall behind economically and militarily, and indeed “appear to be prosperous” is one of the easily-measured goals for which the incomprehensible system is optimising.
Amongst intellectual elites there will be genuine ambiguity and uncertainty about whether the current state of affairs is good or bad. People really will be getting richer for a while. Over the short term, the forces gradually wresting control from humans do not look so different from (e.g.) corporate lobbying against the public interest, or principal-agent problems in human institutions. There will be legitimate arguments about whether the implicit long-term purposes being pursued by AI systems are really so much worse than the long-term purposes that would be pursued by the shareholders of public companies or corrupt officials.
We might describe the result as “going out with a whimper.” Human reasoning gradually stops being able to compete with sophisticated, systematised manipulation and deception which is continuously improving by trial and error; human control over levers of power gradually becomes less and less effective; we ultimately lose any real ability to influence our society’s trajectory.
See Kulveit et al. (2025) for a modernisation.
People talk about “gradual” disempowerment to distinguish it from sudden apocalypse; but I see no reason for such disempowerment to be all that gradual. Both AI-led economic transitions and epistemic transitions can have rather sudden effects.
We can distinguish a few major lenses for analyzing such a drift away from human agency. The authors surveyed below use different metaphors (“the narrow corridor,” “lock-in,” “cultural feedback loops”), but they seem to me to be circling the same underlying worry: that once AI systems substitute for human labor, cognition, and culture, the implicit bargains that kept societies responsive to human needs will dissolve.
What failure looks like was the first post I can recall that claimed we could “go out with a whimper” as systems gradually optimize for goals misaligned with human flourishing, eroding our control without triggering apocalypse. That essay became the intellectual seed for much of what followed, inspiring formalizations like Kulveit et al.’s (2025) “gradual disempowerment” report.
Most of the newer papers (Bullock, Hammond, and Krier 2025; Kulveit et al. 2025; MacInnes, Garfinkel, and Dafoe 2024; Qiu et al. 2025) can be read as elaborations of a common core (sketched as a toy simulation after this list):
- Feedback loops — when AI substitutes for human activity, the signals that once tethered institutions to people’s welfare (taxes, consumer demand, cultural participation) attenuate.
- Selection pressures — in markets and geopolitics, states or firms that lean harder into AI outcompete those that restrain themselves, even if the outcome degrades welfare.
- Lock-in — once these shifts reach a threshold, human preferences no longer constrain system dynamics, making reversal difficult or impossible. See e.g. The Lock-in Hypothesis site.
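To make this skeleton concrete, here is a deliberately crude toy simulation. It is my own illustration rather than a model from any of the papers cited above, and every name and number in it (`growth`, `decay`, `recovery`, `lock_in`, the initial conditions) is an arbitrary assumption: competitive pressure ratchets up AI adoption, human leverage erodes in proportion to adoption, and reform pressure can restore leverage only while it remains above a lock-in threshold.

```python
# Toy sketch of the three mechanisms above: selection pressure, feedback
# attenuation, and lock-in. Not a model from any cited paper; all names
# and parameter values are arbitrary assumptions.
import random

def simulate(steps=200, growth=0.06, decay=0.05, recovery=0.04,
             lock_in=0.2, seed=0):
    random.seed(seed)
    adoption, leverage = 0.05, 0.95   # initial AI share, initial human leverage
    history = []
    for _ in range(steps):
        # Selection pressure: heavier adopters outcompete restrained ones,
        # so the AI share of activity ratchets upward (noisily), capped at 1.
        adoption = min(1.0, adoption * (1 + growth + 0.01 * random.gauss(0, 1)))
        # Feedback attenuation: the more activity routes through AI, the more
        # the signals tying institutions to human welfare erode.
        leverage -= decay * adoption * leverage
        # Lock-in: reform pressure can restore leverage, but only while it is
        # still above the threshold; below it, the decline is one-way.
        if leverage > lock_in:
            leverage += recovery * (1 - leverage)
        history.append((adoption, leverage))
    return history

if __name__ == "__main__":
    for strength in (0.04, 0.01):   # stronger vs. weaker reform pressure
        adoption, leverage = simulate(recovery=strength)[-1]
        print(f"recovery={strength}: final AI share={adoption:.2f}, "
              f"final human leverage={leverage:.2f}")
```

With these invented numbers the contrast is the point: the stronger reform response settles at a depressed but nonzero level of human leverage, while the weaker one lets leverage slip under `lock_in`, after which the recovery term never re-engages and leverage drains toward zero, a cartoon of the “difficult or impossible to reverse” claim in the third bullet.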
Each author picks a different domain lens to illustrate how this skeleton plays out.
- Economy, culture, states — Kulveit et al. (2025) show how AI substitution in each of these domains feeds back on the others, producing reinforcing spirals that weaken human leverage.
- The narrow corridor — Bullock, Hammond, and Krier (2025) draw on Acemoglu (2020) to ask whether AI tips societies toward despotic over-control or toward absent-Leviathan collapse.
- International competition — MacInnes, Garfinkel, and Dafoe (2024) argue anarchy plus new tech pushes states toward low-welfare equilibria; Patell (2025) responds that cooperative bulwarks are possible, but fragile.
- Epistemics and culture — The Lock-in Hypothesis (Qiu et al. 2025) models AI-human feedback loops that shrink diversity and entrench false beliefs. I think of this one as disrupting the ecology of mind, maybe via AI persuasion catastrophes.
- Institutional transparency — Critch, Dennis, and Russell’s (2022) open-source game theory work ties into this by showing how visible rulesets can self-fulfill, sometimes toward cooperation, sometimes toward collapse.
The LessWrong and Alignment Forum essays (e.g. Multipolar Failure, Critch’s «Boundaries» sequence) supply the informal language of RAAPs, lock-in, and multipolar traps that the academic papers then mathematize. The GradualDisempowerment.ai portal is explicitly meant to bridge the informal and the formal. Meanwhile, popular pieces like Spirals of Delusion in Foreign Affairs translate the motif into a geopolitical idiom.
What unifies them is a shift from “sudden catastrophic takeover” to “systemic drift” as the object of risk analysis.
1 Incoming
- GradualDisempowerment.ai: Systemic Existential Risks from Incremental AI Development (Kulveit et al. 2025)
- Spirals of Delusion: How AI Distorts Decision-Making and Makes Dictators More Dangerous (not convinced about this one tbh)
- Post-AGI Civilizational Equilibria Workshop | Vancouver 2025
- Andrew Critch, «Boundaries» Sequence
- The Intelligence Curse
- Large AI models are cultural and social technologies
- The Economics of Transformative AI | NBER