Proposal: Mass Epistemic Risk from AI
2025-04-11 — 2025-08-31
Wherein the societal contagion threshold is shown to be tunable by AI persuasion, and agent‑based attacker–defender simulations are proposed to map protocol trade‑offs for resilience.
I am working up some proposals in AI safety at the moment, including this one.
This proposal is the “big picture” one: it should explain to a less-technical audience why I think AI persuasion is important to keep an eye on, and, in particular, the kinds of problems whose importance people, in my experience, underestimate. I argue that even the “weak” persuasion we see AIs doing right now is already incredibly dangerous, because it opens us up to whiplash effects, cascading social change, and all sorts of “heavy-tailed” (i.e. rare-but-nasty) behaviour as we pass various thresholds of AI technology, and that this will worsen as we go on.
This is the first draft of a proposal that did not get accepted for funding, and as such I have not had time to make it into a better proposal yet. Consider it a sketch of the type of idea I’m thinking about here, rather than a finished product.
Just as people outside the AI safety field are frequently lambasted by people inside it for underestimating the significance of exponential or superexponential growth in AI capabilities, despite copious evidence of its importance, I think that AI specialists are prone to underestimate exponential or superexponential shifts in mass human behaviour, despite equally copious evidence of their importance.
2 From Individual Persuasion to Systemic Phase Transition
Let’s walk through the inferential steps. We suspect readers are familiar with each step in isolation but may not have appreciated the compounding effect that new evidence suggests we face in the field.1
2.1 The Physics of Contagion May Be Endogenously Explosive
The spread of everything from memes to financial panic is well-described by self-exciting Hawkes processes. As Didier Sornette and others have modeled, in such a process each event increases the probability of subsequent events, creating a feedback loop. This is a mathematical description of reality that naturally produces power-law distributions of cascade sizes (Sornette 2006). The key takeaway is that the potential for massive, system-spanning events is an intrinsic property of our interconnected social structure. The system is always loaded with potential energy, waiting for a sufficient trigger.
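To make “endogenously explosive” concrete, here is a minimal sketch, in Python, of a self-exciting Hawkes process simulated by Ogata-style thinning. Everything in it is illustrative: the exponential kernel, the baseline rate, excitation, decay, and horizon are my choices, not estimates of any real social system. The point is only that, with a branching ratio below but near 1, most runs are quiet while a minority are much larger.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hawkes(mu=0.1, alpha=0.8, beta=1.0, horizon=100.0):
    """One Hawkes path via Ogata thinning.

    Intensity: lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)).
    Every accepted event raises the intensity, so events beget events; the
    branching ratio alpha / beta (0.8 here) keeps the process just sub-critical.
    """
    events, t = [], 0.0
    while t < horizon:
        # Between events the intensity only decays, so its current value is an upper bound.
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() < lam_t / lam_bar:  # accept candidate with prob lambda(t) / lam_bar
            events.append(t)
    return len(events)

sizes = np.array([simulate_hawkes() for _ in range(1000)])
print(f"median activity {np.median(sizes):.0f}, "
      f"99th percentile {np.percentile(sizes, 99):.0f}, max {sizes.max()}")
```

The closer the branching ratio gets to 1, the heavier the tail of the activity distribution becomes; the “potential energy” in the text is exactly this proximity to criticality.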
2.2 Network Topology Creates Perceptual Distortions That Prime the System for Manipulation
Most of us know that network structure (i.e., the pattern of interpersonal connections) matters for humans; we rarely reason through what it means for social dynamics, however. Lerman’s work on the “Majority Illusion” demonstrates that, due to network homophily and the variance in node degree, a belief held by a small minority can be perceived as a majority by most nodes in the network (Lerman, Yan, and Wu 2016). This is not necessarily a cognitive bias per se; it is a mathematical artifact of the network’s topology (or, if we’d prefer, a cognitive bias towards treating correlated data as i.i.d.). Adversarially, an AI with a god’s-eye view of the social graph can treat this as an optimization problem: it can identify the minimal set of nodes to influence to create a maximal Majority Illusion, effectively manufacturing the perception of a tipping point to bootstrap a real one.
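A minimal sketch of the topological effect, assuming networkx, a toy Barabási–Albert graph, and a crude seeding heuristic (give the belief to the highest-degree 1% of nodes; this is my stand-in for the optimization described above, not Lerman et al.’s method): both the average share of believers that nodes see among their neighbours, and the fraction of nodes who see a local majority, can far exceed the true 1% prevalence.

```python
import numpy as np
import networkx as nx

# Toy scale-free-ish contact network: heavy-tailed degree distribution, as in real social graphs.
G = nx.barabasi_albert_graph(n=10_000, m=3, seed=1)

# Adversarial seeding heuristic: the belief is held only by the top 1% of nodes by degree.
k = int(0.01 * G.number_of_nodes())
believers = set(sorted(G.nodes, key=G.degree, reverse=True)[:k])

# What each node perceives: the share of its neighbours who hold the belief.
perceived = np.array([
    sum(u in believers for u in G.neighbors(v)) / G.degree(v) for v in G.nodes
])

print(f"true prevalence:           {len(believers) / G.number_of_nodes():.1%}")
print(f"mean perceived prevalence: {perceived.mean():.1%}")
print(f"nodes seeing a 'majority': {np.mean(perceived >= 0.5):.1%}")
```

The exact numbers depend on the graph and the seeding rule; the qualitative gap between true and perceived prevalence is the Majority Illusion.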
2.3 If There Is a “Persuasion Coefficient”, It Is Demonstrably Malleable by Non-Human Agents
This is the new capability that compels us to update our models. For a long time, we could assume that the probability of persuading a human on a deeply held belief was low and constrained by human limitations (e.g., in-group/out-group psychology, time, patience). This assumption is now obsolete.
Costello, Pennycook, and Rand (2024) is, in my opinion, a large update on the likelihood of tail events in our threat model. It provides empirical evidence that an AI, by virtue of being a patient, non-judgmental, non-human entity, can bypass the standard identity-defence mechanisms that make human-to-human persuasion so difficult, even on politicized topics; in other words, superhuman capability in this area has already arrived.
2.4 Synthesis
Let’s formalize this with a very simple model. In a simple branching process model of a cascade (a special case of a Hawkes process), the system’s behaviour is governed by the reproduction number \(R_0\): the average number of new “infections” (i.e., belief adoptions) caused by a single infected individual.
For a network where each node has an average of \(k\) contacts (the average degree), and the probability of transmitting the belief during a single contact is \(p\), the reproduction number is given by:
\[R_0 = kp\]
The system’s behaviour exhibits a critical threshold at \(R_0 = 1\):
- If \(R_0 < 1\) (sub-critical): Any cascade dies out with probability 1. The expected total size \(E[S]\) of a cascade starting from a single node is finite and given by \(E[S] = 1 / (1 - R_0)\).
- If \(R_0 > 1\) (super-critical): There is a non-zero probability, let’s call it \(\rho\), that the cascade will never die out, instead growing to encompass a macroscopic fraction of the network. This probability of an “infinite” cascade is \(\rho = 1 - q\), where \(q\) is the probability of the cascade’s eventual extinction. The value of \(q\) is the smallest non-negative solution to the equation \(q = G(q)\), where \(G(s)\) is the probability generating function of the offspring distribution.
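For concreteness, here is a small numerical companion to these formulas, assuming a Poisson offspring distribution with mean \(R_0\) (my choice; the expected-size formula holds for any offspring distribution with that mean, while the extinction probability depends on the full distribution). Below the threshold we evaluate \(E[S] = 1/(1 - R_0)\) directly; above it we find \(q\) by iterating \(q \leftarrow G(q)\) from \(q = 0\), which converges to the smallest non-negative fixed point.

```python
import numpy as np

def expected_cascade_size(R0):
    """Expected total progeny of a sub-critical branching process: E[S] = 1/(1 - R0)."""
    assert R0 < 1, "E[S] diverges at and above the critical point"
    return 1.0 / (1.0 - R0)

def survival_probability(R0, iters=10_000):
    """rho = 1 - q for Poisson(R0) offspring, where G(s) = exp(R0 * (s - 1)).

    Iterating q <- G(q) from q = 0 converges to the smallest non-negative
    solution of q = G(q), i.e. the extinction probability.
    """
    q = 0.0
    for _ in range(iters):
        q = np.exp(R0 * (q - 1.0))
    return 1.0 - q

for R0 in (0.5, 0.95, 1.05, 1.5):
    if R0 < 1:
        print(f"R0 = {R0}: E[S] = {expected_cascade_size(R0):.0f}, rho = 0")
    else:
        print(f"R0 = {R0}: E[S] diverges, rho = {survival_probability(R0):.2f}")
```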
 
My interpretation of the Costello et al. study is that AI persuasion provides a lever to directly manipulate the contact transmission probability, \(p\). The catastrophic error for a policymaker is to assume that the effect of changing \(p\) on the final outcome is linear. The mathematics of branching processes shows this is false in the most dramatic way possible.
Consider the behaviour near the critical point \(R_0 = 1\):
- If the system’s organic \(R_0 = kp = 0.95\), the expected cascade size is \(E[S] = 1 / (1 - 0.95)\), i.e. 20 people. The system is stable.
- Now, imagine an AI persuasion campaign increases the transmission probability \(p\) by about 5%, pushing \(R_0\) to 0.999. The new expected cascade size is \(E[S] = 1 / (1 - 0.999)\), i.e. 1000 people. A tiny change in the input parameter has produced a 50× increase in the expected output.
- If that same AI pushes \(p\) slightly further, so that \(R_0\) becomes 1.05, the system crosses the critical threshold. The expected size of any cascade is now infinite (the formula diverges), and a qualitatively new phenomenon emerges: the non-zero probability \(\rho\) of a system-spanning, macroscopic cascade.
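The same numbers can be checked by brute force. The sketch below simulates cascades with Poisson(\(R_0\)) offspring (again my choice of distribution) and a hard cap so that super-critical runs terminate. Below the threshold the empirical means sit around \(1/(1 - R_0)\), though the near-critical estimate is very noisy because the size distribution is so heavy-tailed, which is rather the point; above it, a visible fraction of runs hits the cap instead of dying out.

```python
import numpy as np

rng = np.random.default_rng(2)
CAP = 100_000  # proxy for a "system-spanning" cascade

def cascade_size(R0, cap=CAP):
    """Total adoptions in one branching cascade with Poisson(R0) offspring,
    truncated at `cap` if it is still growing."""
    size, active = 1, 1
    while active and size < cap:
        offspring = int(rng.poisson(R0, size=active).sum())
        size += offspring
        active = offspring
    return min(size, cap)

for R0 in (0.95, 0.999, 1.05):
    sizes = np.array([cascade_size(R0) for _ in range(10_000)])
    print(f"R0 = {R0}: mean size {sizes.mean():.0f}, "
          f"share reaching the cap {np.mean(sizes >= CAP):.1%}")
```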
 
This small change doesn’t make cascades slightly bigger; it pushes the entire system across the critical threshold, leading to explosive, heavy-tailed outcomes.
The new empirical results in AI persuasion demonstrate that the parameter \(p\) is tunable, which means the reproduction number \(R_0\) is now an “attack variable”. We are moving from a world where massive social shifts were rare, stochastic “black swan” events governed by a near-critical \(R_0\), to one where an adversary can deterministically push the system into a super-critical state and engineer such an event.
Our current AI safety paradigm is insufficient because it is agent-centric. We are trying to align the AI, but we are ignoring the emergent instabilities of the substrate on which it will operate. Even a perfectly aligned, benevolent AI tasked with “reducing misinformation” could, by applying its superhuman persuasive abilities, inadvertently trigger a catastrophic cascade that destabilizes society.
3 Initial Research Agendas for Epistemic Security
The preceding analysis opens a vast research landscape. Key open questions include:
- The Topology-Threshold Problem: What is the precise functional relationship between the structural properties of a real-world social network (e.g., its degree distribution, community structure, homophily) and its critical threshold \(R_0\) for cascades?
 - The Signal-in-Noise Problem: Can AI-catalyzed cascades be reliably detected against the backdrop of immense organic social noise? What are the minimal, sufficient statistical signals for early detection?
 - The Intervention Trade-off Problem: What are the second-order effects of any potential mitigation? How do we design interventions that dampen malign cascades without simultaneously crippling the spread of beneficial social movements or scientific truths?
 
We propose two initial, complementary research projects designed to produce tangible insights and build a foundation for the broader topic of epistemic security. These are not exhaustive, but they represent a credible starting point.
3.1 Causal Tomography of Persuasion
This project addresses a fundamental measurement problem: how do we formalize and detect the transfer of cognitive agency from a human to a persuasive AI? I have expanded on this proposal at length in A Theory and Measurement Framework for Detecting Agency and Manipulation in Human–AI Systems.
3.2 Adversarial Simulation for Protocol Design
This project tackles the problem of systemic resilience. Instead of focusing on the individual, it focuses on the environment, seeking to discover communication protocols that are inherently resistant to manipulation.
- Problem: We lack a principled way to design social networks that are robust against engineered cascades. The design space is too vast, and real-world experimentation is too dangerous.
 - Proposed Method: We will construct a high-fidelity agent-based model of a social network, creating a simulated “epistemic environment”. Within this simulation, we will use reinforcement learning to train two competing agents:
 
- An “Attacker” (Persuader AI): Its goal is to maximize the size and speed of belief cascades, using strategies like engineering Majority Illusions.
 - A “Defender” (Network Architect AI): Its goal is to minimize malign cascades without resorting to centralized content removal. Its action space is limited to subtle, protocol-level changes: altering feed algorithms to favour source diversity, introducing small amounts of friction for high-velocity content, or providing metadata about a meme’s novelty and origin.
 
- Concrete Objective: By having these agents compete over millions of simulations, we will not find a single “perfect” protocol. Instead, we will map the Pareto frontier of the design space, revealing the fundamental trade-offs between a network’s resilience to manipulation and its efficiency in spreading legitimate information. This provides a principled, evidence-based playbook for building safer information ecosystems.
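To make the proposed method concrete, here is a skeletal, single-round sketch of the attacker–defender loop. Everything in it is a placeholder of my own devising: the networkx contact graph, the independent-cascade spreading rule, the attacker’s “seed the hubs and boost \(p\)” heuristic, and the defender’s single global “friction” knob. In the actual project both sides would be trained policies with much richer, protocol-level action spaces, and the output would be a Pareto frontier rather than a handful of printouts.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(3)

class EpistemicEnv:
    """Toy epistemic environment: beliefs cascade over a contact network.

    The attacker chooses seed nodes and a persuasion boost to the per-contact
    transmission probability; the defender chooses a friction level that damps it.
    """

    def __init__(self, n=2_000, base_p=0.10):
        self.G = nx.barabasi_albert_graph(n, m=4, seed=3)
        self.base_p = base_p

    def run_cascade(self, seeds, attack_boost, defender_friction):
        # Independent-cascade spread with an attacker-inflated, defender-damped p.
        p = self.base_p * (1 + attack_boost) * (1 - defender_friction)
        adopted, frontier = set(seeds), set(seeds)
        while frontier:
            nxt = {v for u in frontier for v in self.G.neighbors(u)
                   if v not in adopted and rng.random() < p}
            adopted |= nxt
            frontier = nxt
        return len(adopted) / self.G.number_of_nodes()  # fraction of the network reached

env = EpistemicEnv()

# Placeholder policies: the attacker seeds the ten highest-degree nodes and applies a
# fixed boost; the defender sweeps a few friction levels. RL would replace both.
hubs = sorted(env.G.nodes, key=env.G.degree, reverse=True)[:10]
for friction in (0.0, 0.2, 0.4):
    reach = np.mean([env.run_cascade(hubs, attack_boost=0.5, defender_friction=friction)
                     for _ in range(20)])
    print(f"friction = {friction:.1f}: mean cascade reach {reach:.1%}")
```

Even this toy version exhibits the trade-off the project is after: friction that blunts an engineered cascade will also slow legitimate information, which is why the objective is a mapped Pareto frontier rather than a single “best” setting.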
 
4 Conclusion
The risk from AI persuasion has been systematically underestimated because we have failed to connect the dots between the latent, explosive potential of our social networks and the now-demonstrated ability of AI to act as a catalyst. This is no longer a problem of psychology; it is a problem of statistical physics. The integrity of our shared epistemic commons is a systemic safety issue. The projects outlined above represent concrete first steps toward the rigorous, quantitative science of epistemic security required to navigate the coming storm. We must begin the work of modeling and securing our collective intellect before an engineered cascade pushes us past a point of no return.
5 References
Footnotes
I wrote an older report on cascades in social change for a non-specialist audience, if we’d like a refresher.
