Utility and evolutionary fitness
Wants versus needs, selection theorems
2025-06-05 — 2026-01-13
Wherein the relation between utility and evolutionary fitness is examined, and Malthusian (log) fitness is shown to function as an ‘as‑if’ utility, governing long‑run multiplicative lineage growth.
Fitness, in evolutionary biology, measures an organism’s expected reproductive success. Utility, in economics and decision theory, measures an agent’s preferences, i.e. it’s what agents seek.
We often blur the lines between what an organism wants and what it evolutionarily needs. Why do we love sugar? The standard explanation is that in ancestral environments, sweetness signalled calorie density, which aided survival and reproduction. Our preferences (our “utility” for sweetness) were shaped by the “fitness” benefit of calories.
This intuition runs deep. Evolutionary biologists often describe organisms as if they are maximizing fitness. Similarly, economists argue that competition forces firms to act as if they are maximizing profits, regardless of the managers’ actual intentions (Friedman 1953). There’s a whole field of evolutionary psychology that attempts to explain human desires as adaptations to ancestral environments. In genetic programming, we attempt to evolve programs that maximize a fitness function, effectively treating the search process as an optimization problem.
This “as-if” optimization is a neat heuristic, but it’s clearly not exact. My evolved taste for sugar, once adaptive, is now easily hijacked by modern junk food, leading to outcomes misaligned with my long-term health. The utility function that evolution built into me is no longer a perfect proxy for my fitness. So these notions can come apart.
So, how exactly do utility and fitness relate? When are they the same, and when do they diverge?
One answer could be mesa-optimization (when an optimization process produces a system that is itself an optimizer) (Hubinger et al. 2021). I don’t go for that framing here, but I might come back to it.
1 Prelude: Natural selection as inference
Natural selection can be viewed as a kind of inference process (Harper 2010; Shalizi 2009). TBC.
2 Replicator equations and evolutionary processes in inference
Can we think of statistical inference as an evolutionary process, as in biology? See also evolution, game theory, utility-vs-fitness (./utility_fitness.qmd).
Gentle intro lecture by John Baez, Biology as Information Dynamics.
See (Baez 2011; Harper 2010; Shalizi 2009; Sinervo and Lively 1996).
3 Definitions
3.1 Utility: What an Agent Wants
In economics and decision theory, utility is a mathematical representation of preferences. If I prefer A to B, my utility function \(u\) assigns a higher number to A: \(u(A) > u(B)\).
The von Neumann-Morgenstern (VNM) framework deals with preferences under uncertainty (von Neumann and Morgenstern 1944). If an agent’s preferences follow certain axioms of rationality (like transitivity—if I prefer A to B and B to C, I must prefer A to C), then that agent acts as if they are maximizing their expected utility.
Utility is a measure of what an agent wants, as evinced by their choices.
3.2 Fitness: What Evolution Needs
In evolutionary biology, fitness measures an organism’s expected reproductive success. It determines which traits are likely to become more common over generations.
Let’s consider a vector of phenotypic traits, \(\mathbf{z}\) (e.g., beak size, running speed).
- Absolute Fitness (\(W(\mathbf{z})\)): The expected number of offspring an organism with traits \(\mathbf{z}\) will produce.
- Malthusian Fitness (\(m(\mathbf{z})\)): The natural logarithm of absolute fitness. \(m(\mathbf{z}) = \ln W(\mathbf{z})\).
This last definition, Malthusian fitness (or log-fitness), is the key to bridging the gap with utility.
4 The Local Alignment: Evolution as Gradient Ascent
Evolution favours traits that increase fitness. This suggests that populations are climbing a slope in a “fitness landscape”. When does this climbing process look like optimization?
4.1 The Machinery of Selection
Biologists measure how strongly selection acts on traits using the selection gradient (\(\boldsymbol{\beta}\)). This gradient is essentially the slope of the regression of relative fitness on the traits (Lande and Arnold 1983). It points in the direction where fitness increases most steeply within the current population.
The response to selection—how the average trait changes in the next generation—is given by the Multivariate Breeder’s Equation (Lande 1979):
\[ \Delta \bar{\mathbf{z}} = \mathbf{G}\boldsymbol{\beta} \]
Here, \(\Delta \bar{\mathbf{z}}\) is the change in the average phenotype. \(\mathbf{G}\) is the additive genetic covariance matrix. This matrix shows how traits are inherited together and constrains which directions evolution can take. The equation says evolution moves the average phenotype in the direction of \(\boldsymbol{\beta}\), but genetic constraints encoded in \(\mathbf{G}\) shape the path.
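As a toy numeric check, the Breeder’s Equation can be applied directly. The matrix entries and the trait labels below are invented for illustration, not real estimates:

```python
import numpy as np

# Hypothetical additive genetic covariance matrix G for two traits
# (beak size, running speed). The off-diagonal term means the traits
# are genetically correlated.
G = np.array([[0.5, -0.2],
              [-0.2, 0.3]])

# Hypothetical selection gradient beta: selection favours larger beaks
# and is neutral on speed.
beta = np.array([0.4, 0.0])

# Multivariate Breeder's Equation: response of the mean phenotype.
delta_z_bar = G @ beta
print(delta_z_bar)  # -> [0.2, -0.08]
```

Note the correlated response: mean speed changes even though selection acts only on beak size, because \(\mathbf{G}\) couples the two traits. That is the sense in which genetic constraints shape the path.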
4.2 The “As-If” Equivalence
Now we can connect the selection gradient \(\boldsymbol{\beta}\) to the idea of a fitness landscape. Let’s look at Malthusian fitness \(m(\mathbf{z})\). We want to know the gradient of this landscape, \(\nabla m\), which points towards the steepest increase in log-fitness.
It turns out there’s a connection. In many real-world scenarios, trait variation within a population is small and the fitness landscape is smooth. We can approximate the landscape locally by a first-order Taylor expansion.
In this “log-linear local regime,” an interesting identity emerges (Orr 2007; Morrissey and Goudie 2022):
\[ \boldsymbol{\beta} \approx \nabla m \]
The selection gradient we estimate by regression is approximately equal to the gradient of the Malthusian fitness landscape. Substituting that back into the Breeder’s Equation gives:
\[ \Delta \bar{\mathbf{z}} \approx \mathbf{G}\nabla m \]
Cool. It turns out that the evolutionary response is a constrained gradient ascent on Malthusian fitness.
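We can check the identity \(\boldsymbol{\beta} \approx \nabla m\) numerically. The sketch below invents a log-linear fitness landscape \(m(\mathbf{z}) = \mathbf{a} \cdot \mathbf{z}\) (so \(\nabla m = \mathbf{a}\) exactly), simulates a population with small trait variation, and estimates \(\boldsymbol{\beta}\) the Lande-Arnold way, by regressing relative fitness on traits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented log-linear landscape: m(z) = a . z, so grad m = a exactly.
a = np.array([0.3, -0.1])
z_bar = np.array([1.0, 2.0])

# Small trait variation around the population mean.
z = z_bar + 0.05 * rng.standard_normal((100_000, 2))
W = np.exp(z @ a)    # absolute fitness W = exp(m)
w = W / W.mean()     # relative fitness

# Lande-Arnold selection gradient: regression of relative fitness on traits.
X = z - z.mean(axis=0)
beta, *_ = np.linalg.lstsq(X, w - w.mean(), rcond=None)

print(beta)  # close to a, i.e. close to grad m
```

The regression estimate recovers the analytic gradient, as the local log-linear argument predicts.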
4.3 Local Equivalence
In this local regime, if we define an “as-if utility” \(u \equiv m\) (utility equals log-fitness), evolution behaves precisely as if it’s maximizing that utility function, subject to the constraints imposed by \(\mathbf{G}\) (A. Grafen 2007; Gardner 2009). The selection gradient (\(\boldsymbol{\beta}\)) corresponds exactly to the marginal utility of traits (\(\nabla u\)) to the organism.
This is why the analogy between fitness and utility feels so strong: locally, under common conditions, they’re mathematically equivalent.
Note, however, that we’ve been talking about traits here, not behaviours. We’ll come back to the latter soon.
5 The Long Game
Evolution operates over time in uncertain environments. This gives another, perhaps deeper, reason why Malthusian fitness (log-fitness) acts as the utility function evolution maximizes.
Evolution is a multiplicative process. If our lineage has two offspring in the first generation, and each of them has three offspring in the second, we have \(2 \times 3 = 6\) descendants.
If fitness varies because of environmental randomness, the long-term outcome depends on the geometric mean fitness, not the arithmetic mean. Because of compounding, a strategy with high variance is risky: a few bad generations can severely hamper long-term growth or even cause extinction.
Mathematically, maximizing the long-run growth rate of a lineage is equivalent to maximizing the expected value of the logarithm of fitness (Cohen 1966):
\[ \text{Maximize } \mathbb{E}[\ln W] \]
This is mathematically identical to maximizing expected utility where \(u = \ln W\). This concept aligns with the Kelly Criterion in finance and gambling (Kelly 1956; Breiman 1961). If we reinvest our winnings in a multiplicative gamble, the strategy that maximizes long-run wealth is the one that maximizes the expected logarithm of the return.
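A minimal simulation makes the point concrete. The fitness numbers below are invented: strategy A has the higher arithmetic mean fitness, but strategy B wins in the long run because it has the higher geometric mean (expected log):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 10_000  # generations

# Strategy A: boom-or-bust. Arithmetic mean = 0.5*2.5 + 0.5*0.3 = 1.4.
# Strategy B: steady.       Arithmetic mean = 1.2.
W_A = rng.choice([2.5, 0.3], size=T)
W_B = np.full(T, 1.2)

# Long-run growth rate is mean log-fitness (the geometric mean),
# not the arithmetic mean.
growth_A = np.mean(np.log(W_A))  # about 0.5*ln(2.5) + 0.5*ln(0.3), which is negative
growth_B = np.mean(np.log(W_B))  # ln(1.2) > 0

print(growth_A, growth_B)
```

Despite its higher arithmetic mean, lineage A shrinks over time: a few bad generations compound multiplicatively and cannot be averaged away.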
5.1 Evolutionary Risk Aversion
Because the logarithm function is concave (it curves downward), maximizing \(\mathbb{E}[\ln W]\) penalizes variance. This is called “bet-hedging” (Philippi and Seger 1989). Evolution is risk-averse not because of a psychological preference but because multiplicative growth demands it.
From this long-term perspective, utility is log-fitness — not merely a local approximation, but the fundamental objective function dictated by the dynamics of compounding growth under uncertainty.
6 This is somewhat fake because fitness landscapes aren’t real
People in machine learning theory are quick to believe in fitness landscapes because we spend all day thinking about loss landscapes. After due consideration, I must confess fitness landscapes seem even more fake than loss landscapes. Or at least, there isn’t a single fitness landscape shared by a genotype across most selection processes in the wild.
6.1 Frequency Dependence and Game Theory
The most significant breakdown occurs when fitness is frequency-dependent—that is, the success of our strategy depends on what everyone else is doing.
In this case, the fitness landscape is not static. As the population evolves, the landscape itself shifts. There is no single, static utility function that evolution maximizes. Instead, we must use the tools of Evolutionary Game Theory (Maynard Smith 1982). Evolution may cycle (like Rock-Paper-Scissors) or reach complex equilibria instead of climbing to a peak.
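A discrete-time replicator-dynamics sketch illustrates why there is no fixed landscape to climb under frequency dependence. The payoff matrix is the standard Rock-Paper-Scissors one; the step size and iteration count are arbitrary choices for illustration:

```python
import numpy as np

# Rock-Paper-Scissors payoffs: winner +1, loser -1, tie 0.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

x = np.array([0.5, 0.25, 0.25])  # initial strategy frequencies
dt = 0.01

for _ in range(10_000):
    f = A @ x                    # each strategy's fitness depends on the others' frequencies
    phi = x @ f                  # population mean fitness
    x = x + dt * x * (f - phi)   # replicator equation (Euler step)

print(x)  # still orbiting; never settles at the interior point (1/3, 1/3, 1/3)
```

The frequencies cycle around the interior equilibrium rather than climbing to a peak: as the population moves, the "landscape" each strategy faces moves with it.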
6.2 Evolutionary Mismatch and Proxy Goals
Evolution is slow. The utility functions encoded in our brains were shaped by ancestral environments. When the environment changes rapidly (as human environments have), these evolved utilities can become misaligned with current fitness. Our preference for sugar is a prime example of this mismatch.
Utility often tracks proxies for fitness—pleasure, status, comfort. These proxies can be hijacked by superstimuli, leading to behaviours that satisfy utility but decrease fitness.
6.3 Niche Construction and Feedback
If organisms actively modify their environment (niche construction), the fitness landscape becomes dynamic and path-dependent (Odling-Smee, Laland, and Feldman 2003). The environment we face today depends on the actions of our ancestors. This feedback loop complicates the idea of simple optimization, as the optimization target is constantly moving (Schulz 2014).
6.4 Multi-Level Selection
When selection acts at multiple levels (e.g., individuals within groups, and groups competing with other groups), things get weird. What maximizes individual fitness might decrease group fitness (e.g., tragedy of the commons). Defining a coherent “group utility” becomes highly problematic (Okasha and Okasha 2008; Okasha 2009) and is an ongoing battle in evolutionary biology.
But the biggest distinction, I think, is that we still haven’t thought enough about the fact that agents don’t just bear traits; they also choose behaviours, and behaviours are the domain of utility theory.
7 “Selection Theorems”
The connection we’ve drawn between fitness and utility, where evolutionary pressure forces behaviour to align with maximizing log-fitness, is not a unique phenomenon. It’s one example of a more general principle: selection forces structure.
When a system—whether a biological organism, a firm in a market, or an AI algorithm—is strongly optimized to perform well according to some criterion, the winning systems tend to share certain internal organizations. If a specific architecture is required for optimal performance, only systems possessing that architecture will survive selection.
This looks more like an argument about utilities; it’s about how selection pressures might force systems to behave as if they are maximizing a utility function that’s usefully related to their fitness. This idea is formalized in what John Wentworth calls Selection Theorems.
I’m not a huge fan of the term “theorem” here, because these results are often not rigorous theorems but heuristic arguments or empirical observations. We’re stuck with it for now. These results follow a general template:
If a system is selected to perform well under a criterion \(\mathcal{C}\), then near-optimal elements must possess a certain internal structure or “type signature” \(\mathcal{T}\).
Let’s look at a classic example to see how this might work.
7.1 Example: The Necessity of Consistency (Coherence Theorems)
Now consider a different kind of pressure: surviving in a competitive environment where resources are at stake, such as a market or a betting scenario. The selection criterion here is simple: avoid being exploited into a guaranteed loss.
Agents can fail this criterion in two main ways: inconsistent preferences or inconsistent beliefs.
Inconsistent Preferences (The Money Pump): Imagine an agent who prefers Apples to Bananas (A>B), Bananas to Carrots (B>C), but also prefers Carrots to Apples (C>A). This agent can become a “money pump”. A trader could convince the agent to pay a small fee to trade a Carrot for a Banana (since B>C), then to pay a fee to trade the Banana for an Apple (A>B), and finally to pay a fee to trade the Apple back for a Carrot (C>A). The agent ends up back where they started, but poorer.
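A tiny simulation of the money pump (the fee, item labels, and trade schedule are all invented for illustration):

```python
# Agent with cyclic preferences: A > B, B > C, C > A.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is preferred to y

holding, money, fee = "C", 10.0, 0.1

# The trader always offers the item the agent prefers to its current holding,
# exploiting the cycle: C -> B -> A -> C -> ...
next_offer = {"C": "B", "B": "A", "A": "C"}
for _ in range(30):
    offer = next_offer[holding]
    if (offer, holding) in prefers:  # agent accepts any preferred trade, paying the fee
        holding, money = offer, money - fee

print(holding, money)  # back to "C", but 3 dollars poorer
```

Every trade is individually acceptable to the agent, yet after each full cycle the agent holds exactly what it started with, minus three fees.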
Inconsistent Beliefs (The Dutch Book): Suppose an agent has beliefs that violate the laws of probability. For example, the agent believes there’s a 60% chance Team X will win a game, and a 60% chance that Team X will lose that same game.
A sharp adversary can construct a Dutch Book against this agent. The adversary offers two bets, priced according to the agent’s beliefs:
- Bet 1: Pays $1 if Team X wins. Believing the win probability is 60%, the agent is willing to pay up to $0.60 for this bet.
- Bet 2: Pays $1 if Team X loses. The agent is also willing to pay up to $0.60 for this bet.
The adversary sells both bets to the agent for $0.60 each. The agent has spent a total of $1.20. However, only one outcome can occur (win or loss), so the agent can only win back $1.00. The agent has accepted a combination of bets that guarantees a net loss of $0.20, no matter the outcome.
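The arithmetic of the Dutch book, as a sketch:

```python
# Agent's incoherent beliefs: P(win) + P(lose) = 1.2 > 1.
p_win, p_lose = 0.60, 0.60
payout = 1.00

# The adversary prices each bet at the agent's subjective fair value.
cost = p_win * payout + p_lose * payout  # agent pays $1.20 in total

# Exactly one outcome occurs, so exactly one bet pays out $1.00.
worst_case = payout - cost
print(worst_case)  # guaranteed net loss of about $0.20, whatever happens
```

Because the agent's probabilities sum to more than one, the prices it accepts sum to more than any possible payout.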
7.2 Coherence result
If an agent wants to avoid guaranteed losses, its beliefs and preferences must be internally consistent (von Neumann and Morgenstern 1944).
- Selection Criterion (\(\mathcal{C}\)): Avoid being exploited for guaranteed losses (non-domination).
- Resulting Structure (\(\mathcal{T}\)): The agent must act as if it has a consistent probability distribution (its beliefs obey the laws of probability) and a consistent utility function (its preferences are transitive). In other words, it must behave as an Expected Utility Maximizer.
Such results don’t argue that agents “should” maximize utility for philosophical reasons. They argue that agents who don’t will be exploited, outcompeted, and ultimately eliminated.
Note, however, that these results say much less about how long we can get away with being irrational in this sense. TBC
