Goodhart’s Law

2019-12-22 — 2025-01-22

economics

game theory

incentive mechanisms

institutions

machine learning

optimization

statistics

utility


A concept that recurs in many places: the replication crisis, benchmarking, evolutionary hyperselection, and overfitting in statistics.

Goodhart’s law is an adage named after economist Charles Goodhart, which has been phrased by Marilyn Strathern as “When a measure becomes a target, it ceases to be a good measure.”

Goodhart first advanced the idea in a 1975 article, which later became used popularly to criticise the United Kingdom government of Margaret Thatcher for trying to conduct monetary policy on the basis of targets for broad and narrow money. His original formulation was:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
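Goodhart's formulation can be illustrated with a toy simulation. This is a minimal sketch under assumptions of our own, not Goodhart's monetary model. We assume a proxy and a goal are correlated only because a hidden common cause drives both. When observed passively, the proxy looks like an excellent indicator. Once a controller sets the proxy directly, the correlation vanishes, because the goal never depended on the proxy in the first place. The function names and noise levels are illustrative choices.

```python
import random
import statistics


def corr(xs, ys):
    """Pearson correlation, written out so it runs on Python < 3.10."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=10_000, control=False, seed=0):
    """Toy world (hypothetical): a hidden cause z drives proxy and goal.

    If control is True, the controller sets the proxy directly, which
    cuts its link to z; the goal still depends only on z.
    Returns the proxy-goal correlation in the sampled population.
    """
    rng = random.Random(seed)
    proxies, goals = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                    # hidden common cause
        if control:
            proxy = rng.gauss(0, 1)            # set by policy, independent of z
        else:
            proxy = z + rng.gauss(0, 0.5)      # merely reflects z
        goal = z + rng.gauss(0, 0.5)           # depends on z, never on the proxy
        proxies.append(proxy)
        goals.append(goal)
    return corr(proxies, goals)


observed = simulate(control=False)
controlled = simulate(control=True)
print(f"correlation when observed: {observed:.2f}")
print(f"correlation under control: {controlled:.2f}")
```

With these noise levels the observed correlation should come out near 0.8 and the controlled one near zero. The regularity was real, but it was never a lever.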

The verb form is fun, as in “don’t goodhart yourself”.

Possibly one could equate goodharting with “hyperselection”.

If you are worried about choosing a metric that does what you want, perhaps you are trying to solve an alignment problem.

cf. Campbell’s law.

cf. fake production.

1 General mechanisms

Manheim and Garrabrant (2019):

There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart’s Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are “(at least) four different mechanisms” that relate to Goodhart’s Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimising the proxy, and so the increased optimisation power offered by artificial intelligence makes it especially critical for that field.

They mention various flavours of Goodhart’s law, including:

Regressional Goodhart: When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal. This is also known as “the tails come apart”.
Extremal Goodhart: Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the relationship between the proxy and the goal was observed. A form of this occurs in statistics and machine learning as “out of sample prediction.”
Causal Goodhart: When the causal path between the proxy and the goal is indirect, intervening on the proxy can change the relationship between the proxy and the goal.
Adversarial Misalignment Goodhart: The agent applies selection pressure knowing the regulator will apply different selection pressure on the basis of the metric.
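Regressional Goodhart can be reproduced in a few lines. This sketch assumes the simplest case: the goal and the proxy's error are independent standard normals, so the proxy is the goal plus noise of equal variance. Under that assumption, the expected goal given the proxy is half the proxy value. The selected candidates really are better, but only by about half of what the proxy reports, and the absolute shortfall grows as selection gets stricter. `select_on_proxy` is a hypothetical helper, not something from Manheim and Garrabrant.

```python
import random
import statistics


def select_on_proxy(n=100_000, top_fraction=0.01, noise_sd=1.0, seed=0):
    """Select the top fraction of a population by a noisy proxy.

    goal  ~ N(0, 1)                (what we care about)
    proxy = goal + N(0, noise_sd)  (what we can measure and select on)
    Returns (mean proxy, mean goal) among the selected candidates.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        goal = rng.gauss(0, 1)
        proxy = goal + rng.gauss(0, noise_sd)
        population.append((proxy, goal))
    population.sort(reverse=True)              # highest proxy first
    chosen = population[: max(1, int(n * top_fraction))]
    return (statistics.fmean(p for p, _ in chosen),
            statistics.fmean(g for _, g in chosen))


for frac in (0.5, 0.1, 0.01, 0.001):
    mean_proxy, mean_goal = select_on_proxy(top_fraction=frac)
    print(f"top {frac:>6.1%}: proxy {mean_proxy:5.2f}  goal {mean_goal:5.2f}  "
          f"gap {mean_proxy - mean_goal:5.2f}")
```

Lowering `noise_sd` shrinks the gap. Selection pressure alone cannot remove it, because the harder we select, the more we are selecting on the noise term.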

Connection: Distribution shift and external validity.
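Extremal Goodhart is a distribution-shift problem, and a toy regression shows it. The sketch assumes a hypothetical goal function, x - x^3/27. This function tracks the proxy almost perfectly over the range we happened to observe ([-1, 1]), but it peaks at x = 3 and falls after that. A straight line fitted to the observed data is an excellent model in-sample. It is a terrible guide for an optimiser that pushes the proxy out to 6.

```python
import random
import statistics


def true_goal(x):
    """Hypothetical goal: tracks the proxy near 0, peaks at x = 3, then falls."""
    return x - x ** 3 / 27


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


rng = random.Random(0)
# "Ordinary worlds": the proxy only ever varied in [-1, 1] while we studied it.
xs = [rng.uniform(-1, 1) for _ in range(1_000)]
ys = [true_goal(x) + rng.gauss(0, 0.1) for x in xs]
a, b = fit_line(xs, ys)

# An optimiser now pushes the proxy far outside the observed range.
for x in (1, 3, 6):
    print(f"proxy={x}: predicted goal {a + b * x:5.2f}, actual goal {true_goal(x):5.2f}")
```

At proxy = 6 the fitted line predicts a goal near 6, while the true goal is -2. Nothing in the in-sample residuals warns of this.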

2 People hack targets

One facet of Goodhart’s law is a reminder that many learning problems are adversarial, so we might want targets more robust than a single loss function, such as a game-theoretic equilibrium. TBC.
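Here is a toy model of target hacking, under deliberately artificial assumptions. Each agent has a true quality. Once the metric is rewarded, each agent also adds some "gaming" to its score, drawn independently of its quality. We do not model why agents game (for instance, because gaming is cheaper than improving); we simply assume that they do. Under these assumptions, selecting the top decile on the metric buys much less true quality than it did before the metric became a target. The function name and parameters are hypothetical.

```python
import random
import statistics


def top_decile_quality(gaming_scale, n=20_000, seed=0):
    """Mean true quality of agents ranked in the top 10% by the metric.

    quality ~ N(0, 1); metric = quality + measurement noise + gaming,
    where gaming ~ Uniform(0, gaming_scale) is independent of quality.
    gaming_scale = 0 models the metric before anyone is rewarded on it.
    """
    rng = random.Random(seed)
    agents = []
    for _ in range(n):
        quality = rng.gauss(0, 1)
        gaming = rng.uniform(0, gaming_scale)  # effort spent inflating the number
        metric = quality + rng.gauss(0, 0.3) + gaming
        agents.append((metric, quality))
    agents.sort(reverse=True)                  # highest metric first
    top = agents[: n // 10]
    return statistics.fmean(q for _, q in top)


before = top_decile_quality(gaming_scale=0.0)
after = top_decile_quality(gaming_scale=6.0)
print(f"top-decile true quality before gaming: {before:.2f}")
print(f"top-decile true quality after gaming:  {after:.2f}")
```

Before gaming, the top decile's mean quality is about 1.7 standard deviations. Once gaming is in play it drops substantially, even though nobody's true quality changed.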

3 Goodhart-Moloch

The Overfit Theory of Everything
Too much efficiency makes everything worse: overfitting and the strong version of Goodhart’s law

This same counterintuitive relationship between efficiency and outcome occurs in machine learning, where it is called overfitting. Overfitting is heavily studied, somewhat theoretically understood, and has well known mitigations. This connection between the strong version of Goodhart’s law in general, and overfitting in machine learning, provides a new lens for understanding bad outcomes, and new ideas for fixing them.
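The overfitting analogy can be made concrete. This sketch uses k-nearest-neighbour regression as a stand-in for any model whose fitting power we can dial up. The target sin(x), the noise level, and the sample sizes are arbitrary choices. Training error plays the role of the proxy and held-out error the role of the goal. Shrinking k drives the training error to exactly zero at k = 1, while the held-out error at k = 1 is markedly worse than at moderate k.

```python
import math
import random
import statistics


def make_data(n, rng, noise_sd=0.5):
    """Noisy samples of a smooth target function on [0, 6]."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return xs, [math.sin(x) + rng.gauss(0, noise_sd) for x in xs]


def knn_predict(x, train_x, train_y, k):
    """One-dimensional k-nearest-neighbour regression (mean of k nearest y)."""
    nearest = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))[:k]
    return statistics.fmean(train_y[j] for j in nearest)


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    return statistics.fmean((knn_predict(x, train_x, train_y, k) - y) ** 2
                            for x, y in zip(xs, ys))


rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(1_000, rng)

# Smaller k fits the training set (the proxy) ever more "efficiently".
for k in (60, 15, 3, 1):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(test_x, test_y, train_x, train_y, k)
    print(f"k={k:>2}: train MSE {train_err:.3f}  test MSE {test_err:.3f}")
```

The mitigations for overfitting map onto this picture: holding out data measures the goal separately from the proxy, and regularisation (here, a larger k) deliberately stops short of optimising the proxy all the way.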

4 Connection to Benchmarks

See Benchmarks for a discussion of benchmarks in ML, wherein Goodhart constantly lurks.

5 Coming apart

Christiano argues:

We will try to harness this power by constructing proxies for what we care about, but over time those proxies will come apart:

Corporations will deliver value to consumers as measured by profit. Eventually this mostly means manipulating consumers, capturing regulators, extortion and theft.

Investors will “own” shares of increasingly profitable corporations, and will sometimes try to use their profits to affect the world. Eventually instead of actually having an impact they will be surrounded by advisors who manipulate them into thinking they’ve had an impact.

Law enforcement will drive down complaints and increase reported sense of security. Eventually this will be driven by creating a false sense of security, hiding information about law enforcement failures, suppressing complaints, and coercing and manipulating citizens.

Legislation may be optimised to seem like it is addressing real problems and helping constituents. Eventually that will be achieved by undermining our ability to actually perceive problems and constructing increasingly convincing narratives about where the world is going and what’s important.

cf. Sarah Constantin’s manifesto on similar themes in What Goes Without Saying.

6 Incoming

Filip Piekniewski on the tendency to select bad target losses for convenience, which he analyses as a flavour of Goodhart’s law.
See also Measuring Goodhart’s Law at OpenAI.
Cedric Chin, Goodhart’s Law Isn’t as Useful as You Might Think
Joe Edelman, Is Anything Worth Maximising? How metrics shape markets, how we’re doing them wrong

Metrics are how an algorithm or an organization listens to you. If you want to listen to one person, you can just sit with them and see how they’re doing. If you want to listen to a whole city — a million people — you have to use metrics and analytics

and

What would it be like, if we could actually incentivize what we want out of life? If we incentivized lives well-lived.
Dan Luu, How do cars fare in crash tests they’re not specifically optimized for?

7 References

Hoel. 2021. “The Overfitted Brain: Dreams Evolved to Assist Generalization.” Patterns.

Koch, and Peterson. 2024. “From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution.”

Manheim, and Garrabrant. 2019. “Categorizing Variants of Goodhart’s Law.”

Solzhenitsyn. 2003. The Gulag Archipelago, 1918-56: An Experiment in Literary Investigation.

Welsh. 2024. “Tukhta: Labour and Resistance in the Audit Regime of the Soviet Gulag.” Labor History.