Performative prediction
2026-06-09 — 2026-06-14
Wherein a Credit Classifier Is Found to Cause the Defaults It Predicts, Performative Stability Is Distinguished From Optimality, and Strategic Manipulation of Features by Applicants Is Examined.
An ML formalization of hyperstition, probably implicit in recommender dynamics, adversarial classification and some types of external validity, among other places. HT TJ for the pointer.
In classical supervised learning we fit parameters \(\theta\) to minimize expected loss over a distribution \(\mathcal{D}\) that sits passively and lets itself be measured, \[\theta_{\mathrm{SL}} = \arg\min_\theta \mathbb{E}_{Z\sim\mathcal{D}}\,\ell(Z;\theta),\] with \(Z=(X,Y)\) a feature–outcome pair and \(\ell\) the loss.
That is, we assume the model watches the world but the world does not watch back. Performative prediction (Perdomo et al. 2020) formalizes a different regime, in which deploying the model changes the subsequent data distribution.
The stock example (familiar from fairness) is assessing people for credit-worthiness. If we predict someone to be at high risk of credit default, we might protectively assign them a punishing interest rate, which in turn increases their propensity to default, thereby confirming our prediction.
We address this phenomenon formally by assuming that each choice of parameters produces a perturbed data distribution \(\mathcal{D}(\theta)\) — the data we would see if we deployed \(\theta\) to do things in the world. In the case that the perturbation ‘looks like a self-fulfilling prophecy’, we call that a hyperstition.
We define the performative risk, \[\mathrm{PR}(\theta) = \mathbb{E}_{Z\sim\mathcal{D}(\theta)}\,\ell(Z;\theta).\] Here \(\theta\) appears twice: once as the model being graded, once inside the distribution it produces. That second appearance is what breaks ordinary empirical risk minimization: we cannot simply descend the gradient of the loss on a fixed dataset, because moving \(\theta\) also moves the data we are fitting.
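A toy model makes the double appearance of \(\theta\) concrete. The setup is an assumption for illustration, not taken from the paper: a scalar outcome \(Z\sim\mathcal{N}(\mu+\varepsilon\theta,\sigma^2)\), so that deploying \(\theta\) shifts the mean by \(\varepsilon\theta\), and the loss \(\ell(z;\theta)=\theta^2/2-z\theta\). A Monte Carlo estimate has to sample from \(\mathcal{D}(\theta)\) at the same \(\theta\) it is scoring, and should agree with the closed form \(\mathrm{PR}(\theta)=\theta^2/2-(\mu+\varepsilon\theta)\theta\). The constants `MU`, `EPS`, `SIGMA` are arbitrary choices.

```python
import numpy as np

# Hypothetical toy model, an assumption for illustration only:
#   D(theta) = Normal(MU + EPS * theta, SIGMA**2)   (deploying theta shifts the mean)
#   loss(z; theta) = theta**2 / 2 - z * theta       (strongly convex in theta)
MU, EPS, SIGMA = 1.0, 0.3, 0.5


def sample_induced(theta, n, rng):
    """Draw n outcomes from the distribution induced by deploying theta."""
    return rng.normal(MU + EPS * theta, SIGMA, size=n)


def loss(z, theta):
    return theta**2 / 2 - z * theta


def performative_risk_mc(theta, n=200_000, seed=0):
    """Monte Carlo PR: the same theta picks the distribution and is graded on it."""
    rng = np.random.default_rng(seed)
    z = sample_induced(theta, n, rng)   # theta inside the distribution
    return loss(z, theta).mean()        # theta as the model being graded


def performative_risk_exact(theta):
    # E[loss] = theta^2 / 2 - E[Z] * theta, where E[Z] = MU + EPS * theta
    return theta**2 / 2 - (MU + EPS * theta) * theta


for theta in [0.0, 1.0, 2.0]:
    print(theta, performative_risk_mc(theta), performative_risk_exact(theta))
```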
Two solution concepts stand in for the classical minimizer; both collapse back to \(\theta_{\mathrm{SL}}\) when \(\mathcal{D}(\theta)\) does not depend on \(\theta\), recovering the classical static setting.
A performative optimum \(\theta_{\mathrm{PO}} = \arg\min_\theta \mathrm{PR}(\theta)\) minimises the performative risk with both copies of \(\theta\) moving together, accounting for the dependence of the distribution on the parameters. Applying the chain rule to the performative risk splits its gradient in two: \[\nabla_\theta \mathrm{PR}(\theta) = \underbrace{\mathbb{E}_{Z\sim\mathcal{D}(\theta)}\big[\nabla_\theta \ell(Z;\theta)\big]}_{\text{fit term}} \;+\; \underbrace{\nabla_\theta\,\mathbb{E}_{Z\sim\mathcal{D}(\theta)}\big[\ell(Z;\theta')\big]\Big|_{\theta'=\theta}}_{\text{reshaping term}}.\] The fit term is the gradient we would write down if the distribution were static. The reshaping term is the part new to performative prediction: it measures how perturbing \(\theta\) deforms the distribution \(\mathcal{D}(\theta)\) itself, and how much that deformation costs us in expected loss. At an interior optimum where \(\mathrm{PR}\) is differentiable the gradient vanishes, so the two terms cancel: the marginal gain from fitting the current data better is exactly offset by the marginal cost of the distribution shift that the move provokes.
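Under a toy model chosen only for illustration (an assumption: \(\mathcal{D}(\theta)=\mathcal{N}(\mu+\varepsilon\theta,\sigma^2)\) and \(\ell(z;\theta)=\theta^2/2-z\theta\)), both terms have closed forms: the fit term is \(\theta-(\mu+\varepsilon\theta)\), and the reshaping term is \(-\varepsilon\theta\), since \(\mathbb{E}_{Z\sim\mathcal{D}(\theta)}[\ell(Z;\theta')]=\theta'^2/2-(\mu+\varepsilon\theta)\theta'\). The sketch checks that their sum matches a finite-difference gradient of \(\mathrm{PR}\), and that at \(\theta_{\mathrm{PO}}=\mu/(1-2\varepsilon)\) (valid for \(\varepsilon<1/2\)) the two terms are equal and opposite rather than individually zero.

```python
# Hypothetical toy model, an assumption for illustration only:
#   D(theta) = Normal(MU + EPS * theta, sigma^2),  loss(z; theta) = theta^2 / 2 - z * theta
MU, EPS = 1.0, 0.3


def performative_risk(theta):
    return theta**2 / 2 - (MU + EPS * theta) * theta


def fit_term(theta):
    # E_{Z ~ D(theta)}[d loss / d theta] = theta - E[Z]
    return theta - (MU + EPS * theta)


def reshaping_term(theta):
    # d/d theta of E_{Z ~ D(theta)}[loss(Z; theta')] = -EPS * theta',
    # evaluated at theta' = theta
    return -EPS * theta


def central_difference(f, theta, h=1e-5):
    return (f(theta + h) - f(theta - h)) / (2 * h)


for theta in [-1.0, 0.5, 2.0]:
    print(theta, fit_term(theta) + reshaping_term(theta),
          central_difference(performative_risk, theta))

theta_po = MU / (1 - 2 * EPS)  # fit + reshaping = 0; needs EPS < 1/2 for a minimum
print(theta_po, fit_term(theta_po), reshaping_term(theta_po))
```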
A performatively stable point \(\theta_{\mathrm{PS}}\) instead satisfies a fixed-point condition, \[\theta_{\mathrm{PS}} = \arg\min_\theta \mathbb{E}_{Z\sim\mathcal{D}(\theta_{\mathrm{PS}})}\,\ell(Z;\theta),\] i.e. \(\theta_{\mathrm{PS}}\) is already optimal for the very distribution it induces, so refitting on that distribution recovers the same parameters.
In general the performative optimum and the stable point do not coincide: the stable model need not be the best one. Its appeal is that it can be found by hyperstitious iteration, at least when the world is not too sensitive to the model. Practitioners already know this as retraining: refit on whatever distribution the last model produced, \[\theta_{t+1} = \arg\min_\theta \mathbb{E}_{Z\sim\mathcal{D}(\theta_t)}\,\ell(Z;\theta).\]
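In a toy model where each retraining step has a closed form (an assumption for illustration: \(\mathcal{D}(\theta)=\mathcal{N}(\mu+\varepsilon\theta,\sigma^2)\) and \(\ell(z;\theta)=\theta^2/2-z\theta\), whose minimiser is the mean of \(Z\)), each refit is \(\theta_{t+1}=\mu+\varepsilon\theta_t\). The iteration settles on \(\theta_{\mathrm{PS}}=\mu/(1-\varepsilon)\), whereas the performative optimum is \(\mu/(1-2\varepsilon)\). The sketch uses exact population refits rather than sampled data, and compares the performative risk at the two points.

```python
# Hypothetical toy model, an assumption for illustration only:
#   D(theta) = Normal(MU + EPS * theta, sigma^2),  loss(z; theta) = theta^2 / 2 - z * theta
MU, EPS = 1.0, 0.3


def refit(theta_deployed):
    """Exact population refit: argmin_theta E_{Z ~ D(theta_deployed)}[loss(Z; theta)].

    For this loss the minimiser is simply the mean of the induced distribution.
    """
    return MU + EPS * theta_deployed


def performative_risk(theta):
    return theta**2 / 2 - (MU + EPS * theta) * theta


def repeated_risk_minimization(theta0, steps):
    thetas = [theta0]
    for _ in range(steps):
        thetas.append(refit(thetas[-1]))
    return thetas


thetas = repeated_risk_minimization(theta0=0.0, steps=50)
theta_ps = MU / (1 - EPS)      # fixed point of refit: theta = MU + EPS * theta
theta_po = MU / (1 - 2 * EPS)  # minimiser of theta^2 * (1/2 - EPS) - MU * theta
print("RRM after 50 steps:", thetas[-1])
print("stable point:", theta_ps, "risk", performative_risk(theta_ps))
print("optimum:     ", theta_po, "risk", performative_risk(theta_po))
```

Retraining converges, but to the point that is self-consistent rather than to the one with the lowest performative risk.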
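Perdomo and co-authors call this retraining loop repeated risk minimization and prove a contraction guarantee: if the loss is (jointly) \(\beta\)-smooth and \(\gamma\)-strongly convex, and the map \(\mathcal{D}(\cdot)\) is \(\varepsilon\)-sensitive in Wasserstein-1 distance, \(W_1(\mathcal{D}(\theta),\mathcal{D}(\theta'))\le\varepsilon\|\theta-\theta'\|\), with \(\varepsilon<\gamma/\beta\), then the iteration converges to a unique stable point at a linear rate, the distance to it shrinking by a factor of at most \(\varepsilon\beta/\gamma\) each step. Put another way, “does the hyperstition converge or explode?” turns on how sensitive the world is to the model, measured against how sharply curved the loss is.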
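A toy model shows both regimes. Assume, for illustration only, \(\ell(z;\theta)=\gamma\theta^2/2-z\theta\) with \(\gamma\le 1\) and \(\mathcal{D}(\theta)=\mathcal{N}(\mu+\varepsilon\theta,\sigma^2)\). The loss is \(\gamma\)-strongly convex, its gradient \(\gamma\theta-z\) is 1-Lipschitz in \(z\) and \(\gamma\)-Lipschitz in \(\theta\) (so \(\beta=1\)), and shifting a Gaussian's mean by \(\varepsilon(\theta-\theta')\) moves it exactly that far in \(W_1\), so the map is \(\varepsilon\)-sensitive. The refit is \(\theta_{t+1}=(\mu+\varepsilon\theta_t)/\gamma\), which contracts by exactly \(\varepsilon/\gamma\) per step. In this particular example, then, the condition \(\varepsilon<\gamma/\beta\) is also necessary: past it, retraining moves away from the stable point.

```python
# Hypothetical toy model, an assumption for illustration only:
#   loss(z; theta) = GAMMA * theta^2 / 2 - z * theta   (GAMMA-strongly convex, beta = 1)
#   D(theta) = Normal(MU + eps * theta, sigma^2)       (eps-sensitive in W1)
GAMMA, MU = 0.5, 1.0


def rrm(eps, theta0=0.0, steps=30):
    """Exact repeated risk minimisation: each refit is argmin of GAMMA*t^2/2 - E[Z]*t."""
    theta = theta0
    for _ in range(steps):
        theta = (MU + eps * theta) / GAMMA
    return theta


def distance_to_stable(eps, steps=30):
    theta_ps = MU / (GAMMA - eps)  # the fixed point exists whenever eps != GAMMA
    return abs(rrm(eps, steps=steps) - theta_ps)


for eps in [0.2, 0.45, 0.55, 0.8]:
    print(f"eps={eps}: factor per step {eps / GAMMA:.2f}, "
          f"distance after 30 steps {distance_to_stable(eps):.3g}")
```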
1 Strategic classification
Everything above is agnostic about why the distribution moves; \(\mathcal{D}(\theta)\) is just a map we are handed, and the contraction result cares only about how Lipschitz it is, not where it came from. Strategic classification (Hardt et al. 2016) supplies a specific generative story — historically one of the special cases that performative prediction was built to generalize (Hardt and Mendler-Dünner 2025). Here the distribution moves because the people being classified are gaming the classifier.
The setup is a Stackelberg game. The institution publishes a classifier \(\theta\) — it moves first and commits in the open — and each agent, seeing the rule, shifts their own features to land on the favourable side of it. Movement is not free: a cost function \(c(x, x')\) charges the agent for presenting features \(x'\) in place of their “true” \(x\). So each agent best-responds, trading the score \(s_\theta(x')\) the classifier would assign the presented features against what the manipulation costs them: \[\Delta_\theta(x) = \arg\max_{x'}\big[\, s_\theta(x') - c(x, x') \,\big].\] The induced distribution \(\mathcal{D}(\theta)\) is then the base distribution pushed forward through this best-response map. This is performativity with economic microfoundations: we can say exactly how perturbing \(\theta\) deforms the data, because we have written down the optimisation each agent solves against it.
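One concrete instance of the best-response map, assumed here for illustration (a common textbook-style choice, not the only one): a linear threshold classifier \(s_\theta(x')=\mathbb{1}\{w^\top x'\ge b\}\) with cost proportional to distance, \(c(x,x')=\kappa\|x'-x\|\). Acceptance is worth 1, so an agent below the line moves if and only if the cheapest route onto the boundary, straight along \(w\), costs less than 1, and then moves exactly onto it. Pushing a base sample through this map yields a sample from \(\mathcal{D}(\theta)\). The names `best_response` and `kappa`, and the Gaussian base population, are hypothetical.

```python
import numpy as np


def best_response(X, w, b, kappa):
    """Best response argmax_x' [s(x') - c(x, x')] for every row (agent) of X.

    Assumed instance: s(x') = 1{w.x' >= b}, c(x, x') = kappa * ||x' - x||.
    Agents move iff crossing costs strictly less than the gain of 1; ties stay put.
    """
    w = np.asarray(w, dtype=float)
    norm_w = np.linalg.norm(w)
    X_new = np.array(X, dtype=float)                    # copy of the true features
    margin = X_new @ w - b                               # >= 0 means already accepted
    dist = np.maximum(-margin, 0.0) / norm_w             # Euclidean distance to boundary
    move = (margin < 0) & (kappa * dist < 1.0)           # is crossing worth it?
    X_new[move] += dist[move][:, None] * (w / norm_w)    # project onto w.x = b
    return X_new


# Push a base population forward through the best response: a sample from D(theta).
rng = np.random.default_rng(0)
X_base = rng.normal(size=(10_000, 2))
w, b, kappa = np.array([1.0, 1.0]), 1.0, 2.0
X_gamed = best_response(X_base, w, b, kappa)
TOL = 1e-9  # projected points sit on the boundary only up to float rounding
print("accepted before:", np.mean(X_base @ w >= b))
print("accepted after: ", np.mean(X_gamed @ w >= b - TOL))
```

Only the features move; any labels attached to the rows would ride through unchanged, which matches the basic version in which labels stay put.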
In the basic version the features move but the labels do not. An agent dresses up their \(x\) to clear the credit-worthy bar without becoming any more creditworthy underneath — pure adversarial Goodhart, the measure prised loose from the thing it was meant to measure. There exist richer variants that relax this. If perturbing some feature actually moves the outcome — studying for the exam rather than buying the answers — then the manipulation is improvement rather than gaming, and we might design \(\theta\) precisely so that the cheap moves are the improving ones. Whether a given classifier rewards gaming or improvement is one of the things this formalism is designed to resolve.
Miller, Milli, and Hardt (2019) observe that drawing this distinction at all drags us out of pure prediction and into causal inference. Whether nudging a feature games the classifier or improves the outcome depends on whether that feature causes \(Y\) or is merely correlated with it — so an institution that wants to reward improvement has to know the causal graph relating features to the outcome, not just the predictive correlations a classifier happily eats. Read this way, a classifier that incentivizes improvement is a designed intervention on the population. Pinning down which levers a decision rule pulls — and so what behaviour it rewards — is a job for the causal-influence-diagram account of incentives. That is strictly harder than fitting \(\theta\) — it inherits every identifiability headache of causal inference — which is one reason the breezy “just retrain” story from a few paragraphs back might leave us uneasy: naïve retraining will happily converge to a classifier that everyone games and no one improves under.
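A small simulation shows why the causal graph matters. The structural model is an assumption for illustration: \(x_{\text{cause}}\to y\to x_{\text{proxy}}\), so both features predict \(y\) on the base data but only one causes it. With a linear score \(\theta^\top x'\) and quadratic cost \(\|x'-x\|^2/2\), each agent's best response is \(x'=x+\theta\). Recomputing \(y\) from its causes with the same exogenous noise, a classifier that rewards the proxy raises the proxy but leaves outcomes exactly unchanged (gaming), while one that rewards the cause raises the outcome rate (improvement). The function `respond` and all constants are hypothetical.

```python
import numpy as np

# Hypothetical structural model, an assumption for illustration only:
#   x_cause -> y -> x_proxy
# x_cause drives the outcome; x_proxy is a downstream symptom of the outcome.
rng = np.random.default_rng(1)
n = 50_000
x_cause = rng.normal(size=n)
u_y = rng.normal(size=n)          # exogenous outcome noise, held fixed across scenarios
u_p = 0.5 * rng.normal(size=n)    # exogenous proxy noise, held fixed across scenarios


def outcome(xc):
    return (xc + u_y > 0).astype(float)


def respond(theta):
    """Agents facing score theta.x' with cost ||x' - x||^2 / 2 shift by theta.

    Returns (outcome rate, mean proxy) after the response, recomputing y from its
    causes with the same noise, i.e. a counterfactual for the same population.
    """
    w_cause, w_proxy = np.asarray(theta, dtype=float)
    xc_new = x_cause + w_cause        # manipulation of the causal feature
    y_new = outcome(xc_new)           # the outcome responds only to x_cause
    xp_new = y_new + u_p + w_proxy    # the proxy follows y, plus any manipulation
    return y_new.mean(), xp_new.mean()


base = respond([0.0, 0.0])       # no incentive to move
gamed = respond([0.0, 1.0])      # classifier that rewards the proxy
improved = respond([1.0, 0.0])   # classifier that rewards the cause
print("outcome rate (base, gamed, improved):", base[0], gamed[0], improved[0])
print("proxy mean   (base, gamed, improved):", base[1], gamed[1], improved[1])
```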
Returning to performative prediction, what do we now have? The reshaping term of the performative gradient, abstract a few paragraphs ago, is now concretely the derivative of the agents’ best response \(\Delta_\theta\) with respect to \(\theta\), measuring how much harder everyone games when we move the boundary. A performatively stable classifier is one that stays optimal against the gamed distribution it provokes: we have already priced in how people will respond, so refitting buys us nothing. The catch is that \(\Delta_\theta\) contains an \(\arg\max\), which can jump discontinuously as \(\theta\) moves (an agent either crosses the boundary or does not), so the reshaping term need not be well-behaved or even exist; Levanon and Rosenfeld (2021) get the chain rule to run in practice by swapping the hard best response for a smooth surrogate, which makes the objective differentiable end-to-end. If we never get to see the cost function at all, only the gamed features that come back, we are in the revealed-preferences regime of Dong et al. (2018), learning the game while we are playing it.
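A one-dimensional sketch of the problem and of the kind of fix described. Assumed for illustration: an agent at \(x\) faces the rule "accept iff \(x'\ge b\)" with cost \(\kappa|x'-x|\). The hard best response jumps from \(b\) back to \(x\) as soon as the gap \(b-x\) exceeds \(1/\kappa\), so its derivative in \(b\) does not exist there. Replacing the 0/1 "is it worth moving?" decision with a sigmoid of temperature `tau` gives a surrogate that is continuous in \(b\) and, away from the jump, approaches the hard response as `tau` grows. This is a generic sigmoid relaxation in the spirit of Levanon and Rosenfeld's approach, not a reproduction of their exact construction; the function names are hypothetical.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def hard_response(x, b, kappa=1.0):
    """1-D best response to 'accept iff x' >= b' under cost kappa * |x' - x|."""
    gap = max(b - x, 0.0)  # zero if already accepted
    return x + gap if 0.0 < gap and kappa * gap < 1.0 else x


def soft_response(x, b, kappa=1.0, tau=10.0):
    """Surrogate: move by the gap, weighted by a sigmoid of the net gain 1 - cost."""
    gap = max(b - x, 0.0)
    return x + gap * sigmoid(tau * (1.0 - kappa * gap))


# Around b - x = 1/kappa the hard response jumps; the surrogate does not.
for b in [0.98, 0.999, 1.001, 1.02]:
    print(b, hard_response(0.0, b), round(soft_response(0.0, b), 4))
```

The price of smoothness is bias: near the threshold the surrogate agent only "half moves", which is why the temperature is a tuning knob rather than a free lunch.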
There are some interesting extensions. When the classified agents do not act in isolation but rather coordinate, we are in algorithmic collective action (Hardt et al. 2024). In this regime, even a small collective can steer what the model learns — the mirror image of the platform’s own performative power (Hardt, Jagadeesan, and Mendler-Dünner 2022), the reach it has to steer them back. When we want to quantify whether a principal can instead keep them from coordinating, that is divide-and-rule — the coalition game read from the principal’s side.
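A back-of-envelope calculation shows how small "small" can be. It is a hypothetical simplification in the spirit of the signal-planting setting of Hardt et al. (2024), not their analysis. Assume the learner predicts, at each feature value, the majority label it was trained on; a signal value \(s\) occurs naturally on a fraction \(q\) of examples, of which a fraction \(r\) already carry the target label \(y^*\); and a collective comprising a fraction \(\alpha\) of the population sets its features to \(s\) and its labels to \(y^*\). The learner then outputs \(y^*\) at \(s\) iff \(\alpha+(1-\alpha)qr>(1-\alpha)q(1-r)\), i.e. iff \(\alpha>k/(1+k)\) with \(k=q(1-2r)\).

```python
def mass_at_signal(alpha, q, r):
    """Training mass at the signal value s, split by label, after the collective acts.

    Hypothetical setup: fraction alpha joins the collective and rewrites itself to
    (s, y_star); of the remaining (1 - alpha), a fraction q naturally has feature s,
    and of those a fraction r already has label y_star.
    """
    target = alpha + (1 - alpha) * q * r
    other = (1 - alpha) * q * (1 - r)
    return target, other


def collective_succeeds(alpha, q, r):
    # A majority-vote learner (Bayes-optimal on its training data) predicts y_star at s
    target, other = mass_at_signal(alpha, q, r)
    return target > other


def critical_alpha(q, r):
    # Solve alpha + (1 - alpha) q r = (1 - alpha) q (1 - r)  =>  alpha = k / (1 + k)
    k = max(q * (1 - 2 * r), 0.0)
    return k / (1 + k)


# A signal that appears naturally on 1% of examples, mostly with the other label
print(critical_alpha(q=0.01, r=0.1))
```

Rarer signals push the threshold towards zero, which is the sense in which a small, coordinated group can steer a model trained on everyone.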
2 Seeing like a model
Step back far enough and the whole performative setup is what happens when a measuring instrument is also an instrument of government. This is the territory James C. Scott maps in Seeing Like a State, which I pick over in legibility and automation and metis and modernity: to run a forest, a city, or a workforce at scale, the state first has to make it legible, flattening its teeming specificities into gridded categories a central office can read at a glance. Today that act of flattening might take the form of a deployed classifier \(\theta\). A governed population does not sit unchanged beneath the state’s classifiers, so we should expect both performative power and strategic classification to be in play.
Scott’s photogenic disaster is the Normalbaum of scientific forestry: a forest replanted as tidy rows of legible timber, a triumph for one rotation and an ecological collapse by the second. Board-feet and the forest settle into a self-consistent fixed point that is performatively stable and also a catastrophe: the gap between stable and best from earlier, played out over a century. The collapse is, in Scott’s terms, metis reasserting itself — the embodied know-how the grid threw away turning out to have been load-bearing — and it comes back two ways. Where the classified can act, as the credit applicant or the exam-sitter can, it returns as strategic gaming: the dropped knowledge routing around the metric, Goodhart again. Where they cannot, as a forest cannot, it returns as omitted causal structure: the discarded variables reasserting themselves through the outcome, with no agent in the loop. The Normalbaum is the second kind; a human under a bureaucracy gets both at once.
Strategic classification lets agents move within the categories while holding the categories themselves fixed. Loosen that and the fight migrates to the boundary itself — who draws the line, and where — because, once resources, status and power flow through a category, its definition becomes a contested thing. Gender, obscenity, creditworthiness, refugee status: the deployed model is one more party trying to pin a boundary that everyone else is leaning on, and where the boundary lands is itself performative.
