Goodhart’s Law

December 22, 2019 — April 30, 2024

economics

game theory

incentive mechanisms

institutions

machine learning

optimization

statistics

utility

A concept that recurs in many places: the replication crisis, benchmarking, evolutionary hyperselection, and overfitting in statistics.

Goodhart’s law is an adage named after economist Charles Goodhart, which has been phrased by Marilyn Strathern as “When a measure becomes a target, it ceases to be a good measure.”

Goodhart first advanced the idea in a 1975 article. It was later used popularly to criticise the United Kingdom government of Margaret Thatcher for trying to conduct monetary policy on the basis of targets for broad and narrow money. His original formulation was:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

The verb form is fun, as in “don’t goodhart yourself.”

One could perhaps call goodharting “hyperselection” and mean the same thing.

If you are worried about choosing a metric that does what you want, perhaps you are trying to solve an alignment problem.

cf. Campbell’s law.

cf. туфта (tʊfˈta), Gulag slang for padding work reports to fake the fulfilment of production norms (Solzhenit︠s︡yn 2003; Welsh 2024).

1 General mechanisms

Manheim and Garrabrant (2019):

There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart’s Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are “(at least) four different mechanisms” that relate to Goodhart’s Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimising the proxy, and so the increased optimisation power offered by artificial intelligence makes it especially critical for that field.
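The abstract’s claim that Goodhart effects grow with the optimisation power pushed onto the proxy can be seen in a toy best-of-n selection. This is a minimal sketch under my own assumptions, not a model from the paper: the goal is standard normal, the proxy is the goal plus independent standard normal noise, and “optimisation power” is just the number of candidates `n` we screen before keeping the one with the highest proxy score. The function name `best_of_n_gap` and all parameter values are hypothetical.

```python
import random
import statistics


def best_of_n_gap(n, trials=500, noise_sd=1.0, seed=0):
    """Keep the best of n candidates by a noisy proxy.

    Assumed toy model: goal ~ N(0, 1), proxy = goal + N(0, noise_sd**2).
    Returns (mean proxy of the winner, mean goal of the winner) over trials.
    """
    rng = random.Random(seed)
    proxies, goals = [], []
    for _ in range(trials):
        candidates = []
        for _ in range(n):
            goal = rng.gauss(0.0, 1.0)
            proxy = goal + rng.gauss(0.0, noise_sd)
            candidates.append((proxy, goal))
        # Tuples compare on proxy first, so this keeps the highest proxy score.
        proxy, goal = max(candidates)
        proxies.append(proxy)
        goals.append(goal)
    return statistics.fmean(proxies), statistics.fmean(goals)


for n in (1, 10, 100, 1000):
    proxy, goal = best_of_n_gap(n)
    print(f"n={n:5d}  proxy={proxy:5.2f}  goal={goal:5.2f}  gap={proxy - goal:5.2f}")
```

With these Gaussian assumptions the winner’s true goal still improves as `n` grows, but the gap between what the proxy promises and what we actually get widens too: the harder we select, the more we are selecting on noise.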

They mention various flavours of Goodhart’s law, including:

Regressional Goodhart: When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal. This is also known as “tails come apart.”
Extremal Goodhart: Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the relationship between the proxy and the goal was observed. A form of this occurs in statistics and machine learning as “out of sample prediction.”
Causal Goodhart: When the causal path between the proxy and the goal is indirect, intervening can change the relationship between the proxy and the goal.
Adversarial Misalignment Goodhart: The agent applies selection pressure knowing the regulator will apply different selection pressure on the basis of the metric.
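Causal Goodhart is easiest to see when the proxy and the goal share a hidden common cause, so that the proxy has no effect on the goal at all. That is only one instance of the category (the paper’s version also covers indirect causal paths), and the model below is my own assumption: a latent `z` drives both variables with unit-variance noise, and `force_proxy` plays the role of an intervention that sets the proxy directly. All names and numbers are hypothetical.

```python
import random
import statistics


def sample_world(rng, force_proxy=None):
    """One draw from an assumed toy causal model with a hidden common cause z."""
    z = rng.gauss(0.0, 1.0)
    goal = z + rng.gauss(0.0, 1.0)
    if force_proxy is None:
        proxy = z + rng.gauss(0.0, 1.0)  # proxy is just another noisy readout of z
    else:
        proxy = force_proxy  # do(proxy = force_proxy): z no longer sets it
    return proxy, goal


rng = random.Random(1)

# Observational regime: proxy and goal correlate because both track z.
observed = [sample_world(rng) for _ in range(20_000)]
xs = [p for p, _ in observed]
ys = [g for _, g in observed]
mx, my = statistics.fmean(xs), statistics.fmean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# Control regime: push the proxy to a high value for everyone.
pushed = [sample_world(rng, force_proxy=3.0) for _ in range(20_000)]
goal_after_push = statistics.fmean([g for _, g in pushed])

print(f"observational slope of goal on proxy: {slope:.2f}")
print(f"mean goal after forcing proxy = 3:   {goal_after_push:.2f}")
```

Under these assumptions the observational slope is about 0.5, so a regulator reading it naively would expect forcing the proxy to 3 to raise the goal to about 1.5; in this model the goal stays near 0, because the regularity only ever ran through `z`.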

Connection: Distribution shift and external validity.
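Extremal Goodhart can be sketched as extrapolation failure under distribution shift. In this toy version (my assumption, not from the paper) the true goal is concave, rising until `x = 1` and then falling; we only ever observed “ordinary” worlds with `x` in `[0, 0.5]`, fitted a straight line there, and then handed that line to an optimiser allowed to pick any `x` in `[0, 5]`. The function `true_goal` and both ranges are hypothetical.

```python
def true_goal(x):
    # Assumed ground truth: more x helps at first, then hurts (peak at x = 1).
    return x - 0.5 * x * x


# The proxy relationship is learned only from ordinary worlds, x in [0, 0.5].
train_x = [i / 100 for i in range(51)]
train_y = [true_goal(x) for x in train_x]

# Ordinary least-squares line through the observed points.
n = len(train_x)
mx = sum(train_x) / n
my = sum(train_y) / n
slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sum(
    (x - mx) ** 2 for x in train_x
)
intercept = my - slope * mx


def proxy(x):
    return intercept + slope * x


# An optimiser free to choose any x in [0, 5] maximises the fitted proxy.
candidates = [i / 10 for i in range(51)]
chosen = max(candidates, key=proxy)
print(f"fitted slope {slope:.2f}; optimiser picks x = {chosen}")
print(f"proxy predicts {proxy(chosen):.2f}; true goal is {true_goal(chosen):.2f}")
print(f"the true optimum x = 1 gives {true_goal(1.0):.2f}")
```

The fitted line is a perfectly good summary of the observed range; it fails only because optimisation drives us into a region where the observed relationship never held.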

2 People hack targets

One facet of Goodhart’s law is a warning not to forget that many learning problems are adversarial, and that we might want more robust targets than a single loss function, such as a game-theoretic equilibrium. TBC.
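Here is a deliberately crude sketch of people hacking a target. Everything in it is a hypothetical assumption of mine: each agent has one unit of effort; real work yields `ability` units of true output (which also shows up in the metric); gaming yields a fixed `gaming_yield` of metric and no output. While nobody is rewarded on the metric, agents just work. Once the metric becomes the target, each agent picks whichever use of effort scores higher.

```python
def choose_effort(ability, gaming_yield, rewarded_on_metric):
    """Fraction of effort spent on real work; the rest goes to gaming the metric."""
    if not rewarded_on_metric:
        return 1.0  # the metric is only observed, so just do the work
    # The metric is linear in effort, so the best response is a corner solution.
    return 1.0 if ability >= gaming_yield else 0.0


def run(abilities, gaming_yield, rewarded_on_metric):
    """Return (total metric, total true output) across the population."""
    metric_total = output_total = 0.0
    for a in abilities:
        work = choose_effort(a, gaming_yield, rewarded_on_metric)
        output = a * work
        metric = output + gaming_yield * (1.0 - work)
        metric_total += metric
        output_total += output
    return metric_total, output_total


abilities = [i / 10 for i in range(1, 21)]  # hypothetical population, 0.1 .. 2.0
before = run(abilities, gaming_yield=1.0, rewarded_on_metric=False)
after = run(abilities, gaming_yield=1.0, rewarded_on_metric=True)
print(f"before targeting: metric={before[0]:.1f}, true output={before[1]:.1f}")
print(f"after targeting:  metric={after[0]:.1f}, true output={after[1]:.1f}")
```

Before targeting, the metric equals true output exactly; after targeting, the reported metric goes up while true output goes down, because the less able agents find gaming the better deal.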

3 Goodhart-Moloch

The Overfit Theory of Everything
Too much efficiency makes everything worse: overfitting and the strong version of Goodhart’s law

This same counterintuitive relationship between efficiency and outcome occurs in machine learning, where it is called overfitting. Overfitting is heavily studied, somewhat theoretically understood, and has well known mitigations. This connection between the strong version of Goodhart’s law in general, and overfitting in machine learning, provides a new lens for understanding bad outcomes, and new ideas for fixing them.
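To make the overfitting analogy concrete, here is a small stdlib-only sketch. It is my own illustration, not taken from the linked essay: k-nearest-neighbour regression stands in for “efficiency”, with smaller `k` fitting the training set (the proxy) ever more tightly, and held-out error standing in for the true goal. The data-generating process (`sin(x)` plus Gaussian noise), the sample sizes and the helper names are all assumptions.

```python
import math
import random


def make_data(n, rng, noise_sd=0.5):
    # Assumed data-generating process: y = sin(x) + Gaussian noise.
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    # Average the targets of the k training points closest to x.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(2)
train_x, train_y = make_data(300, rng)
test_x, test_y = make_data(300, rng)

# Smaller k optimises the training fit harder; k = 1 memorises the training set.
for k in (50, 15, 5, 1):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

At `k = 1` the training error is exactly zero, since each training point is its own nearest neighbour, yet the held-out error is worse than at a moderate `k`: squeezing the proxy to perfection has made the thing we actually care about worse.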

4 Incoming

Filip Piekniewski on the tendency to select bad target losses for convenience, which he analyses as a flavour of Goodhart’s law. See also Measuring Goodhart’s Law at OpenAI.

Cedric Chin, Goodhart’s Law Isn’t as Useful as You Might Think
Joe Edelman, Is Anything Worth Maximising? How metrics shape markets, how we’re doing them wrong

Metrics are how an algorithm or an organization listens to you. If you want to listen to one person, you can just sit with them and see how they’re doing. If you want to listen to a whole city — a million people — you have to use metrics and analytics

and

What would it be like, if we could actually incentivize what we want out of life? If we incentivized lives well lived.
Dan Luu, How do cars fare in crash tests they’re not specifically optimized for?

5 References

Hoel. 2021. “The Overfitted Brain: Dreams Evolved to Assist Generalization.” Patterns.

Koch, and Peterson. 2024. “From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution.”

Manheim, and Garrabrant. 2019. “Categorizing Variants of Goodhart’s Law.”

Solzhenit︠s︡yn. 2003. The Gulag Archipelago, 1918-56: An Experiment in Literary Investigation.

Welsh. 2024. “Tukhta: Labour and Resistance in the Audit Regime of the Soviet Gulag.” Labor History.