A concept that recurs in a lot of places: the replication crisis, gender in sport, benchmarking, evolutionary hyperselection, overfitting in statistics.
Goodhart’s law is an adage named after economist Charles Goodhart, which has been phrased by Marilyn Strathern as “When a measure becomes a target, it ceases to be a good measure.”
Goodhart first advanced the idea in a 1975 article. It was later popularly used to criticize the United Kingdom government of Margaret Thatcher for trying to conduct monetary policy on the basis of targets for broad and narrow money. His original formulation was:
Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
The verb form is fun, as in don’t goodhart yourself. Or possibly one could say to hyperselect.
cf. Campbell’s law.
Manheim and Garrabrant (2019):
There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart’s Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are “(at least) four different mechanisms” that relate to Goodhart’s Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.
They mention various flavours of Goodhart’s law, including:
- Regressional Goodhart
  - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal. This is also known as “tails come apart.”
- Extremal Goodhart
  - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the relationship between the proxy and the goal was observed. A form of this occurs in statistics and machine learning as “out of sample prediction.”
- Causal Goodhart
  - When the causal path between the proxy and the goal is indirect, intervening can change the relationship between the proxy and the goal.
- Adversarial Misalignment Goodhart
  - The agent applies selection pressure knowing the regulator will apply different selection pressure on the basis of the metric.
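The regressional flavour is easy to see in a toy simulation. The following is a minimal sketch, not anything from the paper: it assumes the proxy is the true goal plus independent Gaussian noise, and the function name, sample sizes and noise level are all made up for illustration. Selecting the top candidates by proxy gives a group whose proxy score overstates its true goal value, and whose goal value is lower than what selecting on the goal directly would have achieved.

```python
import random
import statistics


def simulate_regressional_goodhart(n=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Select the top_k candidates by proxy and compare proxy and goal means.

    Assumption for illustration: proxy = goal + independent Gaussian noise.
    """
    rng = random.Random(seed)
    goals = [rng.gauss(0.0, 1.0) for _ in range(n)]
    proxies = [g + rng.gauss(0.0, noise_sd) for g in goals]

    # Indices of the top_k candidates when ranked by the proxy.
    chosen = sorted(range(n), key=lambda i: proxies[i], reverse=True)[:top_k]

    mean_proxy = statistics.fmean(proxies[i] for i in chosen)
    mean_goal = statistics.fmean(goals[i] for i in chosen)

    # What we would have got by selecting on the (unobservable) goal itself.
    best_goal = statistics.fmean(sorted(goals, reverse=True)[:top_k])
    return mean_proxy, mean_goal, best_goal


mean_proxy, mean_goal, best_goal = simulate_regressional_goodhart()
print(f"mean proxy of selected:      {mean_proxy:.2f}")
print(f"mean goal of selected:       {mean_goal:.2f}")
print(f"mean goal if selected on it: {best_goal:.2f}")
```

The selected group is still better than average on the goal, so the proxy is not useless. The gap between the first two numbers is the part of the selection pressure that went into the noise.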
People hack targets
One facet of Goodhart’s law is a warning not to forget that many learning problems are adversarial, and that we might want more robust targets than a single loss function, such as a game-theoretic equilibrium.
Filip Piekniewski writes on the tendency to select bad target losses for convenience, which he analyses as a flavour of Goodhart’s law. See also Measuring Goodhart’s Law at OpenAI. For more on that, see Too much efficiency makes everything worse: overfitting and the strong version of Goodhart’s law:
This same counterintuitive relationship between efficiency and outcome occurs in machine learning, where it is called overfitting. Overfitting is heavily studied, somewhat theoretically understood, and has well known mitigations. This connection between the strong version of Goodhart's law in general, and overfitting in machine learning, provides a new lens for understanding bad outcomes, and new ideas for fixing them.
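The overfitting analogy can be illustrated with a small experiment. This is a hypothetical sketch under assumptions of my own choosing (a noisy sine curve as the data, a piecewise-constant regressor whose capacity is the number of bins), not the setup used in the linked essay. Training error plays the role of the proxy and held-out error the role of the goal: pushing the proxy down keeps "working" as capacity grows, while the goal improves at first and then gets worse.

```python
import math
import random
import statistics


def make_data(n, rng, noise_sd=0.3):
    """Noisy samples of sin(2*pi*x) on [0, 1); an arbitrary toy target."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_histogram(xs, ys, n_bins):
    """Piecewise-constant fit: one mean per bin, global mean for empty bins."""
    global_mean = statistics.fmean(ys)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = int(x * n_bins)
        sums[b] += y
        counts[b] += 1
    return [s / c if c else global_mean for s, c in zip(sums, counts)]


def mse(model, xs, ys):
    """Mean squared error of the per-bin predictions."""
    n_bins = len(model)
    return statistics.fmean(
        (model[int(x * n_bins)] - y) ** 2 for x, y in zip(xs, ys)
    )


rng = random.Random(0)
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(5000, rng)

bin_counts = [1, 2, 4, 8, 16, 32, 64]
train_errors, test_errors = [], []
for n_bins in bin_counts:
    model = fit_histogram(train_x, train_y, n_bins)
    train_errors.append(mse(model, train_x, train_y))
    test_errors.append(mse(model, test_x, test_y))
    print(f"bins={n_bins:3d}  train MSE={train_errors[-1]:.3f}  "
          f"test MSE={test_errors[-1]:.3f}")
```

Because each bin count doubles the previous one, the partitions are nested and the training error can only go down. The held-out error is what we actually care about, and it is the one that turns around.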
- Cedric Chin, Goodhart’s Law Isn’t as Useful as You Might Think