AI disempowerment of humans

Races to the bottom in human relevance

2021-09-20 — 2025-09-03

AI safety
economics
faster pussycat
innovation
language
machine learning
mind
neural nets
technology
UI

Human domestication via AI. Is that bad? It seems frightening, but I suppose we could imagine virtuous versions of it.

The iconic piece in this domain is Paul Christiano’s 2019 essay What failure looks like:

Amongst the broader population, many folk already have a vague picture of the overall trajectory of the world and a vague sense that something has gone wrong. There may be significant populist pushes for reform, but in general these won’t be well-directed. Some states may really put on the brakes, but they will rapidly fall behind economically and militarily, and indeed “appear to be prosperous” is one of the easily-measured goals for which the incomprehensible system is optimising.

Amongst intellectual elites there will be genuine ambiguity and uncertainty about whether the current state of affairs is good or bad. People really will be getting richer for a while. Over the short term, the forces gradually wresting control from humans do not look so different from (e.g.) corporate lobbying against the public interest, or principal-agent problems in human institutions. There will be legitimate arguments about whether the implicit long-term purposes being pursued by AI systems are really so much worse than the long-term purposes that would be pursued by the shareholders of public companies or corrupt officials.

We might describe the result as “going out with a whimper.” Human reasoning gradually stops being able to compete with sophisticated, systematised manipulation and deception which is continuously improving by trial and error; human control over levers of power gradually becomes less and less effective; we ultimately lose any real ability to influence our society’s trajectory.

See Kulveit et al. (2025) for a modernisation.

People talk about “gradual” disempowerment to distinguish it from sudden apocalypse, but I see no reason for such disempowerment to be all that gradual. Both the AI-led economic transition and the AI-led epistemic transition can have rather sudden effects.
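
To see why sudden effects are plausible, here is a toy threshold model. It is my own illustration, not drawn from Christiano or Kulveit et al.; the parameters are made up and only numpy is assumed.

```python
# Toy model: perfectly smooth, linear growth in AI capability can still
# produce an abrupt collapse in the share of work routed through humans,
# if each task flips to AI once capability clears a task-specific
# threshold and those thresholds cluster around a similar difficulty.
import numpy as np

rng = np.random.default_rng(0)
n_tasks = 10_000
# Difficulty thresholds clustered near 1.0, so many tasks flip within a
# narrow capability window.
thresholds = rng.normal(loc=1.0, scale=0.05, size=n_tasks)

for year in range(0, 25, 2):
    capability = 0.05 * year  # slow, linear capability growth
    human_share = (thresholds > capability).mean()  # tasks still done by humans
    print(f"year {year:2d}: capability {capability:.2f}, human share {human_share:.2f}")
```

The human share of tasks sits near 1.0 for the first eighteen “years” of steady progress, then collapses to near zero within about four. The driver is gradual; the outcome is not.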

We can distinguish a few major lenses for analysing such a drift away from human agency. The authors use different metaphors (“the narrow corridor,” “lock-in,” “cultural feedback loops”), but they seem to me to be circling the same underlying worry: that once AI systems substitute for human labour, cognition, and culture, the implicit bargains that kept societies responsive to human needs will dissolve.

What failure looks like was the first post I can recall claiming that we could “go out with a whimper” as systems gradually optimise for goals misaligned with human flourishing, eroding our control without triggering apocalypse. That essay became the intellectual seed for much of what followed, inspiring formalisations such as the Kulveit et al. (2025) “gradual disempowerment” report.

Most of the newer papers (Bullock, Hammond, and Krier 2025; Kulveit et al. 2025; MacInnes, Garfinkel, and Dafoe 2024; Qiu et al. 2025) can be read as elaborations of a common core: competitive pressure pushes institutions to hand labour, cognition, and culture over to AI; the handover weakens the feedback loops that kept those institutions responsive to human needs; and weakened feedback makes course-correction progressively harder, until the drift locks in.

Each author picks a different domain lens to illustrate how this skeleton plays out.
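
One way to see the skeleton’s engine is as a pair of coupled difference equations. The functional forms below are my own invention for illustration; none of the cited papers commit to these dynamics.

```python
# Illustrative dynamics: competitive pressure pushes delegation to AI
# upward; human influence decays as delegation rises; and society's
# capacity to push back scales with whatever influence remains. Once
# pushback falls below the pressure, the drift is self-reinforcing:
# this is the lock-in.
PRESSURE = 0.12  # per-step competitive pressure to delegate
delegation, influence = 0.0, 1.0

for step in range(61):
    pushback = 0.1 * influence  # reform capacity shrinks with influence
    delegation = min(1.0, max(0.0, delegation + PRESSURE - pushback))
    influence = max(0.0, influence * (1.0 - 0.5 * delegation))
    if step % 10 == 0:
        print(f"step {step:2d}: delegation {delegation:.2f}, influence {influence:.3f}")
```

Early on, pushback nearly cancels the pressure; but every bit of delegation erodes influence, which weakens pushback, which speeds delegation. The same loop that starts slowly finishes fast.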

The LessWrong and Alignment Forum essays (e.g. Critch’s What Multipolar Failure Looks Like and his Boundaries sequence) supply the informal vocabulary (RAAPs, i.e. robust agent-agnostic processes; lock-in; multipolar traps) that the academic papers then mathematise. The GradualDisempowerment.ai portal is explicitly meant to bridge the informal and the formal. Meanwhile, popular pieces like Spirals of Delusion in Foreign Affairs translate the motif into a geopolitical idiom.
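
The “multipolar trap” is easy to make concrete as a one-shot game. The payoffs below are made up for illustration; they are not taken from Critch or any of the papers above.

```python
# A symmetric two-player game between rival firms (or states). Each
# chooses to KEEP humans in the loop or to AUTOMATE fully. Automating
# yields a competitive edge whatever the rival does, so it strictly
# dominates, yet mutual automation leaves both players worse off than
# mutual restraint: a race to the bottom in human relevance.
KEEP, AUTO = 0, 1
# payoff[(my_move, their_move)] -> my payoff
payoff = {
    (KEEP, KEEP): 3,  # both retain human oversight: slower, but stable
    (KEEP, AUTO): 0,  # I restrain while the rival automates: I fall behind
    (AUTO, KEEP): 4,  # I automate first: competitive windfall
    (AUTO, AUTO): 1,  # both automate: the edge cancels, oversight is gone
}

for their_move, label in [(KEEP, "rival keeps humans"), (AUTO, "rival automates")]:
    best = max((KEEP, AUTO), key=lambda mine: payoff[(mine, their_move)])
    print(f"{label}: best response is {'AUTOMATE' if best == AUTO else 'KEEP'}")
```

AUTOMATE is the best response either way, so mutual automation is the unique Nash equilibrium, even though mutual restraint pays more to both players. That is the race to the bottom of this page’s subtitle.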

What unifies them is a shift from “sudden catastrophic takeover” to “systemic drift” as the object of risk analysis.

1 Incoming

2 References

Acemoglu, and Robinson. 2020. The Narrow Corridor: States, Societies, and the Fate of Liberty.
Barez, Friend, Reid, et al. 2025. “Toward Resisting AI-Enabled Authoritarianism.”
Bullock, Hammond, and Krier. 2025. “AGI, Governments, and Free Societies.”
Critch, Dennis, and Russell. 2022. “Cooperative and Uncooperative Institution Designs: Surprises and Problems in Open-Source Game Theory.”
Fish, Gölz, Parkes, et al. 2025. “Generative Social Choice.”
Kenton, Kumar, Farquhar, et al. 2023. “Discovering Agents.” Artificial Intelligence.
Kulveit, Douglas, Ammann, et al. 2025. “Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development.”
MacInnes, Garfinkel, and Dafoe. 2024. “Anarchy as Architect: Competitive Pressure, Technology, and the Internal Structure of States.” International Studies Quarterly.
Patell. 2025. “Cooperation as Bulwark: Evolutionary Game Theory and the Internal Institutional Structure of States.”
Qiu, He, Chugh, et al. 2025. “The Lock-in Hypothesis: Stagnation by Algorithm.”