Goodhart’s Law

December 22, 2019 — April 30, 2024

game theory
incentive mechanisms
machine learning
Figure 1

A concept that recurs in a lot of places: in the replication crisis, benchmarking, evolutionary hyperselection, and overfitting in statistics.

Goodhart’s law is an adage named after economist Charles Goodhart, which has been phrased by Marilyn Strathern as “When a measure becomes a target, it ceases to be a good measure.”

Goodhart first advanced the idea in a 1975 article. It was later popularly used to criticize the United Kingdom government of Margaret Thatcher for trying to conduct monetary policy on the basis of targets for broad and narrow money. His original formulation was:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

The verb form is fun, as in don’t goodhart yourself.

Possibly one could say “hyperselect” and that would mean the same thing.

cf. Campbell’s law.

1 General mechanisms

Manheim and Garrabrant (2019):

There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart’s Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are “(at least) four different mechanisms” that relate to Goodhart’s Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.

They mention various flavours of Goodhart’s law, including:

Regressional Goodhart
When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal. This is also known as Tails come apart.
Extremal Goodhart
Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the relationship between the proxy and the goal was observed. A form of this occurs in statistics and machine learning as “out of sample prediction.”
Causal Goodhart
When the causal path between the proxy and the goal is indirect, intervening on the proxy can change the relationship between the proxy and the goal.
Adversarial Misalignment Goodhart
The agent applies selection pressure knowing the regulator will apply different selection pressure on the basis of the metric.
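
The regressional flavour is easy to see in a toy simulation. The sketch below is an illustration, not something taken from Manheim and Garrabrant: it assumes a hypothetical model where the proxy is the true goal plus independent Gaussian noise, and the function name and parameters are made up for this example. Selecting hard on the proxy also selects for lucky noise, so the selected candidates look better on the proxy than they are on the goal.

```python
import random
import statistics


def simulate_regressional_goodhart(n_candidates=10_000, top_k=100,
                                   noise_sd=1.0, seed=0):
    """Pick the top_k candidates by proxy; return (mean proxy, mean goal)."""
    rng = random.Random(seed)
    # Assumed toy model: proxy = goal + independent noise.
    goals = [rng.gauss(0.0, 1.0) for _ in range(n_candidates)]
    proxies = [g + rng.gauss(0.0, noise_sd) for g in goals]
    # Rank candidates by proxy value, best first, and keep the top_k.
    ranked = sorted(range(n_candidates), key=lambda i: proxies[i], reverse=True)
    chosen = ranked[:top_k]
    mean_proxy = statistics.fmean(proxies[i] for i in chosen)
    mean_goal = statistics.fmean(goals[i] for i in chosen)
    return mean_proxy, mean_goal


mean_proxy, mean_goal = simulate_regressional_goodhart()
print(f"mean proxy of selected: {mean_proxy:.2f}")
print(f"mean goal of selected:  {mean_goal:.2f}")
```

With equal variance for goal and noise, the expected goal given the proxy is half the proxy value, so in this toy setting the selected group should score roughly half as well on the goal as the proxy suggests.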

Connection: Distribution shift and external validity.
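
The extremal flavour can be sketched the same way, as a distribution shift problem. The example below is a toy under stated assumptions: the ground-truth function `true_goal`, the sampling range, and the linear proxy are all hypothetical. A line fitted on ordinary values of `x` tracks the goal well there, but pushing `x` far outside that range makes the proxy keep rising while the goal collapses.

```python
import random


def true_goal(x):
    # Hypothetical ground truth: more x helps at first, then hurts.
    return x - 0.1 * x * x


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(0)
# Observe the goal only in the "ordinary world", x in [0, 2].
xs = [rng.uniform(0.0, 2.0) for _ in range(200)]
ys = [true_goal(x) + rng.gauss(0.0, 0.05) for x in xs]
a, b = fit_line(xs, ys)


def proxy(x):
    # Linear proxy learned from the ordinary range.
    return a + b * x


# Compare proxy and goal inside and far outside the observed range.
for x in (1.0, 5.0, 20.0):
    print(f"x={x:5.1f}  proxy={proxy(x):7.2f}  goal={true_goal(x):7.2f}")
```

An optimizer that maximizes `proxy` would choose the largest `x` available, which is exactly where the fitted relationship was never checked.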

2 People hack targets

One facet of Goodhart’s law is a warning not to forget that many learning problems are adversarial, so we might want more robust targets than a single loss function, such as a game-theoretic equilibrium. TBC.

3 Goodhart-Moloch

4 Incoming

Filip Piekniewski on the tendency to select bad target losses for convenience, which he analyses as a flavour of Goodhart’s law. See also Measuring Goodhart’s Law at OpenAI.

5 References