Game theory

2016-10-13 — 2026-04-29

Wherein mutual defection is shown to constitute the unique Nash equilibrium of the Prisoner’s Dilemma, despite cooperation being Pareto-superior for all parties concerned.

bounded compute
cooperation
economics
game theory
incentive mechanisms
mind

I have nothing to say about foundational game theory itself, except to note that J. D. Williams’ book, The Compleat Strategyst (Williams 1966), is online for free, so you should grab it.

How long until we approach Nash equilibrium also includes a note on Aumann’s correlated equilibrium (Aumann 1974), which I’d like to learn more about.

Figure 1

The classical formalisms below are mostly complete-information games — players know who the others are, what their payoffs are, what the rules are, and they all maximise utility rationally. Real strategic interaction involves persuasion, signalling, deception, framing effects, learning over time, fluid coalition boundaries, and players who often do not know the rules of the game they are playing. The classical theory captures one slice — which arrangements are stable, given common-knowledge payoffs — but is not, on its own, a complete model of any non-trivial real situation. See beyond classical game theory below for some directions people have taken to fix this.

1 Non-cooperative game theory

The non-cooperative branch studies what happens when players cannot make binding agreements: each chooses a strategy independently, and the question is which equilibria arise. The label is a misnomer in one direction — much of the most interesting content here is about how cooperative behaviour emerges from selfish play.

1.1 Prisoner’s dilemma

When people talk about game theory they usually mean the class of mathematically formulated two-player “games,” which are typically not the fun type of games, being much shorter and more brutal than Carcassonne or whatever.

The most famous one is the Prisoner’s dilemma; you’ve probably run into this one. Alice and Bob, co-conspirators, have been arrested by the cops for a crime they did commit, and they are interviewed separately. The cops offer them each the same choice: “Inform on your buddy and we will let you off lightly.” Obviously, they want to spend as little time in prison as possible; what should they each do?

There are four possible outcomes:

  1. Both defect — Alice and Bob both turn informant: they each go to prison for 5 years
  2. Bob defects — Bob informs and Alice stays schtum: Alice goes to prison for 10 years, Bob serves only 1 year
  3. Alice defects — Alice informs and Bob stays schtum: Bob goes to prison for 10 years, Alice serves only 1 year
  4. Both cooperate — Alice and Bob both stay schtum: they each go to prison for 2 years on a lesser charge

Label each player’s two actions \(C\) (cooperate) and \(D\) (defect), and let payoffs be:

             Bob: C    Bob: D
Alice: C     (R, R)    (S, T)
Alice: D     (T, S)    (P, P)

where the usual ordering is \[ T > R > P > S,\quad 2R > T + S. \] Commonly we set \[ T=5,\;R=3,\;P=1,\;S=0. \]

A player’s best response to a fixed opponent strategy is the action that maximises her payoff given what the other does.

  • If Bob plays \(C\), Alice’s payoffs are \[ \begin{cases} \text{play }C\mapsto R=3,\\ \text{play }D\mapsto T=5, \end{cases} \] so Alice’s best response is \(D\).

  • If Bob plays \(D\), Alice’s payoffs are \[ \begin{cases} \text{play }C\mapsto S=0,\\ \text{play }D\mapsto P=1, \end{cases} \] so Alice’s best response is again \(D\).

Because \(D\) strictly dominates \(C\) (it’s best regardless of the other’s move), defecting is each player’s unique best response.

A Nash equilibrium is a profile of strategies where each player is playing a best response to the others.

  • Here, since both players’ best response is always \(D\), \((D,D)\) is the unique Nash equilibrium.

An outcome is Pareto‐efficient (PE) if there’s no alternative outcome that makes at least one player strictly better off without making anyone worse off.

  • Compare the four outcomes of PD:

    1. \((C,C)\) yields \((3,3)\).
    2. \((D,D)\) yields \((1,1)\).
    3. \((D,C)\) yields \((5,0)\) (asymmetric).
    4. \((C,D)\) yields \((0,5)\).
  • \((C,C)\) is Pareto‐efficient: we can’t move to another outcome that raises one player’s payoff without dropping the other’s below 3.

  • \((D,D)\) is not Pareto‐efficient: both players could jointly switch to \((C,C)\) and each move from 1→3, so \((C,C)\) Pareto‐dominates \((D,D)\).

  • Dominance ⇒ Defection: Because \(D\) is each player’s best response to anything, rational play leads to \((D,D)\).

  • Pareto efficiency ⇒ Cooperation: In terms of group welfare, \((C,C)\) is strictly better for everybody than \((D,D)\).

This gap—individual incentives pushing towards \((D,D)\) despite a better mutual outcome at \((C,C)\)—is the essence of what we often call a social dilemma.
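The whole argument above is small enough to verify mechanically. A sketch in Python, using the standard payoffs \(T=5, R=3, P=1, S=0\) (the helper names are my own):

```python
from itertools import product

T, R, P, S = 5, 3, 1, 0
ACTIONS = ("C", "D")
# payoff[(alice_action, bob_action)] = (Alice's payoff, Bob's payoff)
payoff = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def is_nash(a, b):
    """Neither player can gain by unilaterally deviating."""
    alice_ok = all(payoff[(a, b)][0] >= payoff[(a2, b)][0] for a2 in ACTIONS)
    bob_ok = all(payoff[(a, b)][1] >= payoff[(a, b2)][1] for b2 in ACTIONS)
    return alice_ok and bob_ok

def pareto_dominates(p, q):
    """p is at least as good for both players and strictly better for one."""
    return (all(payoff[p][i] >= payoff[q][i] for i in (0, 1))
            and any(payoff[p][i] > payoff[q][i] for i in (0, 1)))

nash = [pr for pr in product(ACTIONS, repeat=2) if is_nash(*pr)]
print(nash)                                      # [('D', 'D')]
print(pareto_dominates(("C", "C"), ("D", "D")))  # True
```

Brute force over the four profiles confirms that \((D,D)\) is the unique pure-strategy Nash equilibrium, and that \((C,C)\) Pareto-dominates it.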

This is a normal-form game, where the players choose their actions simultaneously and independently. Sequential and partially-observed games are more complicated, and we handle them in extensive form. Those pop up in e.g. causal agents.

1.2 Iterated games

Iterated games are a class of games where the players play the same game multiple times, and can use the results of previous rounds to inform their decisions in later rounds. These parallel many interesting dynamics in the real world. See Iterated and evolutionary game theory for a more detailed discussion of iterated games.

1.3 Stochastic games

In standard game theory, mixed strategies involve probabilistic choices over pure strategies. Stochastic games extend this by incorporating state transitions that evolve over time.

Stochastic games combine game theory with Markov decision processes. Players make decisions in a sequence of states, with each action affecting both immediate payoffs and the probability distribution of future states. Unlike standard mixed strategies where randomisation occurs only over action choices, stochastic games include:

  1. Multiple states that change over time
  2. Transition probabilities between states
  3. State-dependent payoff structures
  4. Potentially infinite horizons

In this framework, optimal strategies must account not only for immediate rewards but also for how actions influence the game’s future trajectory. This makes stochastic games particularly suitable for modelling economic competition, resource management, and multi-agent reinforcement learning scenarios where the environment changes in response to players’ actions.
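For the two-player zero-sum case, Shapley’s classic scheme makes this concrete: value iteration where each Bellman backup solves a small matrix game whose entries fold the discounted continuation value into the immediate payoff. The sketch below uses a two-state, two-action game invented purely for illustration (the closed-form 2×2 solver and all numbers are my choices):

```python
def matrix_game_value(M):
    """Value of a 2x2 zero-sum game for the row (maximising) player."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:                        # pure saddle point exists
        return maximin
    return (a * d - b * c) / (a + d - b - c)      # unique fully-mixed equilibrium

def solve_stochastic_game(states, reward, trans, gamma=0.9, tol=1e-9):
    """Shapley-style value iteration for a zero-sum stochastic game."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: matrix_game_value(
                [[reward[s][i][j]
                  + gamma * sum(trans[s][i][j][t] * V[t] for t in states)
                  for j in (0, 1)]
                 for i in (0, 1)])
            for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

# A made-up two-state game: the second action pays differently now and also
# tends to push the system into the other state.
states = ("A", "B")
reward = {"A": [[3, 1], [0, 2]], "B": [[-1, 2], [4, 0]]}
stay = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
shift = {"A": {"A": 0.4, "B": 0.6}, "B": {"A": 0.6, "B": 0.4}}
trans = {"A": [[stay["A"], shift["A"]], [shift["A"], stay["A"]]],
         "B": [[stay["B"], shift["B"]], [shift["B"], stay["B"]]]}

V = solve_stochastic_game(states, reward, trans)
```

Because the backup operator is a \(\gamma\)-contraction, the iteration converges to the unique discounted value of the game; a strategy that optimises against immediate payoffs alone would ignore the state-shifting effect entirely.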

2 Cooperative game theory

The cooperative branch sets aside the strategic detail of who plays what and works one level up: it asks “when and how should players agree to cooperate”. We take the players’ ability to make binding agreements as given, and ask which coalitions will form and how the value they create will be divided among them.

A (transferable-utility, “TU”) coalitional game is a finite player set \(N=\{1,\dots,n\}\) together with a characteristic function \[ v : 2^{N} \to \mathbb{R},\qquad v(\emptyset)=0, \] where \(v(S)\) is the worth that coalition \(S \subseteq N\) can secure on its own, regardless of what the players in \(N\setminus S\) do. The non-cooperative game from which \(v\) might be derived is left implicit; the modelling commitment is that we already know \(v\) and want to reason about outcomes, not strategies. A common further assumption is superadditivity — \(v(S \cup T) \geq v(S) + v(T)\) for disjoint \(S,T\) — which says merging coalitions never hurts, and so the grand coalition \(N\) has at least as much to allocate as any partition of it.

The question shifts from “which strategy profiles are equilibria?” to “which payoff vectors \(x \in \mathbb{R}^{N}\) are stable allocations of \(v(N)\) amongst the members of \(N\)?” An imputation is a payoff vector that is efficient, \(\sum_{i\in N} x_i = v(N)\), and individually rational, \(x_i \geq v(\{i\})\) for each \(i\).

2.1 The core

The core is the set of imputations no coalition can profitably block: \[ \mathrm{core}(v) = \Bigl\{\, x \in \mathbb{R}^{N} \;\Big|\; \sum_{i \in N} x_i = v(N),\ \sum_{i \in S} x_i \geq v(S)\ \text{for all } S \subseteq N \,\Bigr\}. \]

A core allocation is one in which no subgroup could secede and do strictly better on its own. The core can be empty. The canonical example is three-player majority voting: each player alone is worth nothing, \(v(\{i\}) = 0\); any pair is worth one, \(v(\{i,j\}) = 1\); and the grand coalition is also worth only one, \(v(N) = 1\). Any allocation \((x_1, x_2, x_3)\) summing to \(1\) must give some pair a total less than \(1\), and that pair could profitably break off. No allocation is stable.

When does this not happen? A simple sufficient condition: the game is convex (Lloyd S. Shapley 1971) when synergies accumulate — adding a player to a larger coalition is at least as valuable as adding her to a smaller one. Concretely, the marginal contribution \(v(S \cup \{i\}) - v(S)\) is non-decreasing in \(S\). Compare: in the majority-voting game above, any pair already locks in the prize, so the third player contributes \(0\) on top of a pair but \(1\) on top of a single player — marginals shrink with coalition size, and the core is empty. In a wall-building game where each player alone produces nothing, any pair builds \(1\) wall, and all three together build \(3\), player 3 contributes \(0\) to nobody, \(1\) to a single partner, and \(2\) to a pair: marginals grow, the game is convex, and the egalitarian allocation \((1,1,1)\) is in the core.
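Both three-player examples are small enough to check by enumeration. A sketch of a core-membership test (the representation of \(v\) as a dict from sorted player tuples to worths, and the helper names, are mine):

```python
from itertools import chain, combinations

def coalitions(n):
    """All non-empty coalitions of players 0..n-1, as sorted tuples."""
    return chain.from_iterable(combinations(range(n), k) for k in range(1, n + 1))

def in_core(x, v, tol=1e-9):
    """Is allocation x in the core of the TU game v (dict: coalition -> worth)?"""
    n = len(x)
    efficient = abs(sum(x) - v[tuple(range(n))]) <= tol
    unblocked = all(sum(x[i] for i in S) >= v[S] - tol for S in coalitions(n))
    return efficient and unblocked

# Majority voting: singletons worth 0, any pair (and the grand coalition) worth 1.
majority = {S: (0.0 if len(S) == 1 else 1.0) for S in coalitions(3)}
# Wall-building: singletons worth 0, pairs worth 1, all three worth 3.
wall = {S: {1: 0.0, 2: 1.0, 3: 3.0}[len(S)] for S in coalitions(3)}

print(in_core((1.0, 1.0, 1.0), wall))        # True
print(in_core((1/3, 1/3, 1/3), majority))    # False

# Sweep a grid of allocations of v(N)=1 in the majority game: none survives,
# since the three pair constraints jointly demand a total of at least 3/2.
grid = [k / 20 for k in range(21)]
print(any(in_core((a, b, 1 - a - b), majority)
          for a in grid for b in grid if a + b <= 1))  # False
```

The grid sweep is of course not a proof of emptiness, but the summed pair constraints (\(2\sum_i x_i \geq 3\) versus \(v(N)=1\)) are.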

Convex games always have non-empty cores, and the Shapley value of a convex game lies in the core. (The “convex” name is by analogy with real-valued convex functions, whose first differences are also non-decreasing in the same way.)

The nucleolus (Schmeidler 1969) is another single-valued solution. For any allocation \(x\), each coalition \(S\) has an excess \(v(S) - \sum_{i \in S} x_i\): how much \(S\) falls short of what it could secure on its own. The nucleolus picks \(x\) to minimise the largest excess; among allocations that achieve this, to minimise the next-largest; and so on. It exists even when the core is empty, and sits inside the core when the core is non-empty.

2.2 Shapley value

The platonic ideal of a “fair” allocation rule, \(\varphi(v) \in \mathbb{R}^{N}\) — each player gets their average marginal contribution across all orderings of \(N\) (Lloyd S. Shapley 1953). This is in some sense a natural notion that recurs across cooperative games proper, voting power, feature attribution in ML, and cost allocation. See Shapley value.
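The definition translates directly into code for small games. A sketch (exact enumeration over all \(n!\) orderings, so only viable for small \(n\)), applied to the convex wall-building game from the core discussion above:

```python
from itertools import chain, combinations, permutations
from math import factorial

def shapley(v, n):
    """Exact Shapley value: average marginal contribution over all n! orderings.
    v maps sorted player tuples to coalition worths."""
    phi = [0.0] * n
    for order in permutations(range(n)):
        seen = []
        for i in order:
            before = v[tuple(sorted(seen))] if seen else 0.0
            seen.append(i)
            phi[i] += v[tuple(sorted(seen))] - before
    return [p / factorial(n) for p in phi]

# Wall-building game: singletons worth 0, pairs worth 1, all three worth 3.
all_coalitions = chain.from_iterable(combinations(range(3), k) for k in range(1, 4))
wall = {S: {1: 0.0, 2: 1.0, 3: 3.0}[len(S)] for S in all_coalitions}

print(shapley(wall, 3))   # [1.0, 1.0, 1.0]
```

The three players are symmetric, so the Shapley value splits \(v(N)=3\) evenly, and (the game being convex) this allocation lies in the core.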

2.3 Beyond TU

The TU setup assumes every coalition’s worth is a single number — that there is a common currency (money, points, utility every player values at par) and players can freely shift it amongst themselves. That is a strong assumption.

When utilities are not freely transferable, a coalition no longer has a single worth; it has a menu of possible outcomes for its members. Two friends planning a joint holiday can go to Paris, Tokyo, or the beach — each option gives a different utility pair, and there is no way to “transfer Paris-utility” from one friend to the other. So instead of a single number \(v(S)\), we attach to each coalition \(S\) a set \(V(S)\) of feasible utility vectors: the joint outcomes the members of \(S\) can secure together.

The core and the Shapley value have analogues in this setting, but the definitions now have to talk about sets rather than single numbers, existence guarantees get weaker, and my head starts to hurt.

See also coalition games and collective action.

3 Bargaining and commitment

Bargaining is a bridge between the cooperative and non-cooperative branches. The Nash bargaining solution is a cooperative-theory concept (axiomatic, in the spirit of Shapley); Rubinstein’s alternating-offers model recovers a similar outcome as the equilibrium of a non-cooperative game. The “Nash program” — deriving cooperative solutions as equilibria of non-cooperative games — sounds like something I would like to know about but I do not yet. See commitment for related discussion.

4 Beyond classical game theory

Two ways people have tried to fix the gap between the classical theory and real strategic interaction. Plus one further question, internal to classical theory: are the equilibria it promises actually computable?

Computational complexity / algorithmic game theory (T. Roughgarden 2018): in general, no. Finding a Nash equilibrium of a general finite game is PPAD-complete (Daskalakis–Goldberg–Papadimitriou 2009), a complexity class for which we have no known efficient algorithms. Even when Nash’s existence theorem guarantees an equilibrium, no rational agent could plausibly compute it in reasonable time on a non-trivial game. This motivates several of the fixes that follow — bounded rationality (since real agents cannot compute equilibria) and learning-based approaches like MARL (which find approximate equilibria by interaction rather than analysis). See game complexity for the dedicated treatment.

4.1 Generalisations within game theory

Bayesian games (Harsanyi 1967) drop common knowledge of payoffs. Each player has a private type drawn from a distribution that everyone knows, and the analysis works at the level of types. Captures e.g. negotiations where one side doesn’t know the other’s reservation price.

Signalling games (Spence 1973) formalise costly communication that reveals private information. Spence’s job-market example: education does not directly create productivity, but bears a different cost for high-versus-low-productivity workers, so completing it credibly signals productivity. The interesting feature is the separating equilibrium, where the signal becomes informative because only one type can afford to send it.

Cheap talk (Crawford and Sobel 1982) is signalling without the cost: pre-play messages that are free to send. Crawford and Sobel show that such messages can convey information, but only when the sender’s and receiver’s interests are aligned closely enough.

Bayesian persuasion (Kamenica and Gentzkow 2011) reverses the direction of the persuasion problem. The sender commits to a signal structure — an experiment whose outcome will become public — before observing the underlying state. By choosing the experiment cleverly, the sender can shape the receiver’s posterior beliefs to her advantage, even though she cannot lie about the outcome. This is the cleanest formal model of persuasion-as-information-design that I know of.

Causal multi-agent models add an explicit causal structure to extensive-form games: who observes what, who can intervene on what, and how payoffs are computed from the underlying graph. The modern unification is Multi-Agent Influence Diagrams — MAIDs, and their mechanised generalisation MMAIDs (Hammond et al. 2023; Fox et al. 2023) — which let one reason about agent incentives in counterfactual terms: what would each agent choose if the causal structure of the game were slightly different? Applications include AI safety analysis. See multi-agent causality for the dedicated treatment.

Behavioural game theory (Camerer 2003) drops the rational-maximiser assumption. Real players think a finite number of steps ahead (level-\(k\) models), best-respond noisily (quantal response), update via simple learning rules, and are sensitive to framing etc. The book is empirically anchored in lab experiments where the classical theory makes sharp predictions and real players regularly violate them.

Evolutionary game theory drops rational players entirely, working with population-level replicator dynamics. See iterated and evolutionary game theory.

4.2 Things not usually called “game theory” but solve the same problems

These approaches solve problems game theory tries to solve, but with methods game theorists do not traditionally claim.

Multi-Agent Reinforcement Learning (MARL) (Zhang, Yang, and Başar 2021): agents learn through repeated interaction with the environment and with each other, often without ever computing the payoff structure analytically. The headline results in StarCraft (AlphaStar, 2019) and Dota 2 (OpenAI Five, 2019) are MARL; cooperative MARL handles team play in increasingly complex settings, and there is a growing literature on MARL for autonomous driving and robotic coordination. Under particular limits — vanishing exploration, infinite play, perfect recall — MARL converges to game-theoretic equilibria; in practice it sidesteps the analytical solution concept and simply finds what works.
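A toy instance of the idea: two independent, stateless \(\varepsilon\)-greedy Q-learners repeatedly playing the Prisoner’s Dilemma from above. Neither agent ever sees the payoff matrix, yet both typically learn to defect, rediscovering the Nash equilibrium by trial and error. This is only a sketch; the hyperparameters are my own arbitrary choices:

```python
import random

T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}
ACTIONS = ("C", "D")

def epsilon_greedy(q, eps, rng):
    """Explore uniformly with probability eps, otherwise exploit."""
    return rng.choice(ACTIONS) if rng.random() < eps else max(q, key=q.get)

def train(episodes=20000, alpha=0.05, eps=0.1, seed=0):
    rng = random.Random(seed)
    q1 = {a: 0.0 for a in ACTIONS}   # Alice's action-value estimates
    q2 = {a: 0.0 for a in ACTIONS}   # Bob's
    for _ in range(episodes):
        a1 = epsilon_greedy(q1, eps, rng)
        a2 = epsilon_greedy(q2, eps, rng)
        r1, r2 = PAYOFF[(a1, a2)]
        q1[a1] += alpha * (r1 - q1[a1])   # incremental average of recent payoffs
        q2[a2] += alpha * (r2 - q2[a2])
    return q1, q2

q1, q2 = train()
print(max(q1, key=q1.get), max(q2, key=q2.get))   # typically: D D
```

Because \(D\) strictly dominates, each agent’s estimate for \(D\) eventually overtakes its estimate for \(C\) against any opponent mixture, so the learners drift to mutual defection without anyone solving the game analytically.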

Agent-based simulation (Schelling 1971; Bonabeau 2002) builds populations of heterogeneous bounded agents with simple rules and watches what emerges. Schelling’s segregation model is the canonical instance: each agent prefers some fraction of similar neighbours, and even mild preferences for similarity drive the population to sharp segregation in simulation. Equilibrium is not assumed; if the dynamics converge, that is an output, not an input. Useful for systems where the game is too complex, too non-stationary, or too poorly specified for closed-form analysis, but there are so many degrees of freedom here that I don’t want to say much about what this shows.
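A minimal sketch of the segregation model (the grid size, threshold, and random-relocation rule are my choices; many variants exist):

```python
import random

def schelling(width=20, height=20, n_agents=300, threshold=0.3,
              max_moves=4000, seed=1):
    """Unhappy agents (too few like-coloured neighbours) jump to a random
    empty cell until everyone is content or the move budget runs out."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(width) for y in range(height)]
    grid = {cell: i % 2 for i, cell in enumerate(rng.sample(cells, n_agents))}

    def occupied_neighbours(x, y):
        return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0) and (x + dx, y + dy) in grid]

    def similarity(cell):
        nbrs = occupied_neighbours(*cell)
        if not nbrs:
            return 1.0   # isolated agents are content
        return sum(grid[c] == grid[cell] for c in nbrs) / len(nbrs)

    for _ in range(max_moves):
        unhappy = [c for c in grid if similarity(c) < threshold]
        if not unhappy:
            break
        mover = rng.choice(unhappy)
        empty = [c for c in cells if c not in grid]
        grid[rng.choice(empty)] = grid.pop(mover)

    mean_sim = sum(similarity(c) for c in grid) / len(grid)
    return grid, mean_sim

grid, mean_sim = schelling()
print(round(mean_sim, 2))  # mild preferences, yet typically well above one half
```

Note that equilibrium (everyone content) is checked, not assumed: the loop simply terminates when no agent wants to move, and the resulting mean like-neighbour fraction is an output of the dynamics.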

Large language model agents (Park et al. 2023; Akata et al. 2023; Meta Fundamental AI Research Diplomacy Team (FAIR) et al. 2022) are a recent and weird corner: language models running multi-agent simulations of strategic interaction, with the strategies emerging from the model’s text-generation behaviour rather than from explicit utility maximisation. Park et al.’s “Smallville” — 25 LLM-driven characters in a simulated town, who coordinated a Valentine’s Day party without any explicit “throw-a-party” objective — is one well-known instance. Meta’s CICERO went the other direction, combining a language model with explicit strategic reasoning to play Diplomacy at human-expert level, negotiating alliances and betraying them in natural language. Whether any of this counts as game theory, behavioural simulation, or something else entirely is unsettled.

What these approaches have in common is that they handle the messy parts the classical theory has to assume away: incomplete specification of the game, non-stationary preferences, learning over time, ambiguous coalition boundaries. The price they pay is that closed-form characterisation is mostly out of reach, and most of what we know comes from empirical results on specific games.

5 References

Akata, Schulz, Coda-Forno, et al. 2023. “Playing Repeated Games with Large Language Models.”
Alchian. 1950. “Uncertainty, Evolution, and Economic Theory.” The Journal of Political Economy.
Arthur. 1994. “Inductive Reasoning and Bounded Rationality: The El Farol Problem.” American Economic Review.
Aumann, Robert J. 1974. “Subjectivity and Correlation in Randomized Strategies.” Journal of Mathematical Economics.
Aumann, Robert J. 1985. “An Axiomatization of the Non-Transferable Utility Value.” Econometrica.
Axelrod. 1984. The Evolution of Cooperation.
Bednar, and Page. 2000. “Can Game(s) Theory Explain Culture? The Emergence of Cultural Behavior Within Multiple Games.”
Blume. 1993. “The Statistical Mechanics of Strategic Interaction.” Games and Economic Behavior.
Bonabeau. 2002. “Agent-Based Modeling: Methods and Techniques for Simulating Human Systems.” Proceedings of the National Academy of Sciences.
Brockhurst, Buckling, and Gardner. 2007. “Cooperation Peaks at Intermediate Disturbance.” Current Biology.
Cai, Daskalakis, and Weinberg. 2013. “Understanding Incentives: Mechanism Design Becomes Algorithm Design.” arXiv:1305.4002 [Cs].
Camerer. 2003. Behavioral Game Theory: Experiments in Strategic Interaction.
Castellano, Fortunato, and Loreto. 2009. “Statistical Physics of Social Dynamics.” Reviews of Modern Physics.
Cesa-Bianchi, and Lugosi. 2006. Prediction, Learning, and Games.
Chaitin. 1977. “Algorithmic Information Theory.” IBM Journal of Research and Development.
Crawford, and Sobel. 1982. “Strategic Information Transmission.” Econometrica: Journal of the Econometric Society.
Daskalakis, Deckelbaum, and Tzamos. 2012a. “Optimal Pricing Is Hard.” In Internet and Network Economics.
———. 2012b. “The Complexity of Optimal Mechanism Design.” arXiv:1211.1703 [Cs].
———. 2013. “Mechanism Design via Optimal Transport.” In.
Durlauf. 1996. “Statistical Mechanics Approaches to Socioeconomic Behavior.”
Fosco, and Mengel. 2010. “Cooperation Through Imitation and Exclusion in Networks.” Journal of Economic Dynamics and Control.
Foster, and Young. 2006. “Regret Testing: Learning to Play Nash Equilibrium Without Knowing You Have an Opponent.” Theoretical Economics.
Fox, MacDermott, Hammond, et al. 2023. “On Imperfect Recall in Multi-Agent Influence Diagrams.” Electronic Proceedings in Theoretical Computer Science.
Galla, and Farmer. 2011. “Complex Dynamics in Learning Complicated Games.”
Gammerman. 2004. Algorithmic Learning in a Random World.
Gammerman, and Vovk. 2007. “Hedging Predictions in Machine Learning.” The Computer Journal.
Greenblatt, Shlegeris, Sachan, et al. 2024. “AI Control: Improving Safety Despite Intentional Subversion.”
Hammond, Fox, Everitt, et al. 2023. “Reasoning about Causality in Games.” Artificial Intelligence.
Harsanyi. 1963. “A Simplified Bargaining Model for the n-Person Cooperative Game.” International Economic Review.
———. 1967. “Games with Incomplete Information Played by ‘Bayesian’ Players, I-III. Part I. The Basic Model.” Management Science.
Hetzer, and Sornette. 2013. “An Evolutionary Model of Cooperation, Fairness and Altruistic Punishment in Public Good Games.” PLoS ONE.
Hirshleifer. 1991. “The Technology of Conflict as an Economic Activity.” The American Economic Review.
———. 1995. “Anarchy and Its Breakdown.” Journal of Political Economy.
Insua, Rios, and Banks. 2009. “Adversarial Risk Analysis.” Journal of the American Statistical Association.
Jackson, Matthew O. 2008. Social and Economic Networks.
Jackson, Matthew O. 2011. “A Brief Introduction to the Basics of Game Theory.” SSRN Electronic Journal.
Kamenica, and Gentzkow. 2011. “Bayesian Persuasion.” American Economic Review.
Koller, and Milch. 2003. “Multi-Agent Influence Diagrams for Representing and Solving Games.” Games and Economic Behavior, First World Congress of the Game Theory Society.
Latek, Axtell, and Kaminski. 2009. “Bounded Rationality via Recursion.” In.
Lazaric, and Raybaut. 2004. “Knowledge Creation Facing Hierarchy: The Dynamics of Groups Inside the Firm.” Journal of Artificial Societies and Social Simulation.
Le, and Boyd. 2007. “Evolutionary Dynamics of the Continuous Iterated Prisoner’s Dilemma.” Journal of Theoretical Biology.
Linial. 1994. “Game-Theoretic Aspects of Computing.” In Handbook of Game Theory with Economic Applications.
McElreath, and Boyd. 2007. Mathematical Models of Social Evolution: A Guide for the Perplexed.
Mesquita. 2010. The Predictioneer’s Game: Using the Logic of Brazen Self-Interest to See and Shape the Future.
Meta Fundamental AI Research Diplomacy Team (FAIR), Bakhtin, Brown, et al. 2022. “Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning.” Science.
Moral Sentiments and Material Interests: The Foundations of Cooperation in Economic Life. 2006.
Nowak, and Krakauer. 1999. “The Evolution of Language.” Proceedings of the National Academy of Sciences of the United States of America.
Nowak, Plotkin, and Krakauer. 1999. “The Evolutionary Language Game.” Journal of Theoretical Biology.
Ohsawa. 2021. “Unbiased Self-Play.” arXiv:2106.03007 [Cs, Econ, Stat].
Ostrom. 1990. Governing the Commons: The Evolution of Institutions for Collective Action (Political Economy of Institutions and Decisions).
Park, O’Brien, Cai, et al. 2023. “Generative Agents: Interactive Simulacra of Human Behavior.” In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. UIST ’23.
Pluchino, Rapisarda, and Garofalo. 2010. “The Peter Principle Revisited: A Computational Study.” Physica A: Statistical Mechanics and Its Applications.
Rennard. 2006. Handbook of Research on Nature-Inspired Computing for Economics and Management.
Richards. 2001. “Coordination and Shared Mental Models.” American Journal of Political Science.
Roca, Cuesta, and Sánchez. 2006. “Time Scales in Evolutionary Dynamics.” Physical Review Letters.
Roughgarden, Tim. 2018. “Complexity Theory, Game Theory, and Economics.” arXiv:1801.00734 [Cs, Econ].
Roughgarden, Joan, Oishi, and Akçay. 2006. “Reproductive Social Behavior: Cooperative Games to Replace Sexual Selection.” Science.
Rubinstein. 2000. Economics and Language.
Sadrieh. 1998. The Alternating Double Auction Market: A Game Theoretic and Experimental Investigation (Lecture Notes in Economics and Mathematical Systems).
Sanders, Galla, and Shapiro. 2011. “Effects of Noise on Convergent Game Learning Dynamics.” arXiv:1109.4853.
Sato, Akiyama, and Farmer. 2002. “Chaos in Learning a Simple Two-Person Game.” Proceedings of the National Academy of Sciences.
Sato, and Crutchfield. 2003. “Coupled Replicator Equations for the Dynamics of Learning in Multiagent Systems.” Physical Review E.
Schelling. 1971. “Dynamic Models of Segregation.” The Journal of Mathematical Sociology.
Schmeidler. 1969. “The Nucleolus of a Characteristic Function Game.” SIAM J. Appl. Math.
Schotter. 2008. The Economic Theory of Social Institutions.
Sethi, and Somanathan. 1996. “The Evolution of Social Norms in Common Property Resource Use.” The American Economic Review.
Shafer, and Vovk. 2001. “Introduction: Probability and Finance as a Game.” In Probability and Finance: It’s Only a Game!
———. 2008. “A Tutorial on Conformal Prediction.” Journal of Machine Learning Research.
Shapley, Lloyd S. 1953. “A Value for n-Person Games.” In Contributions to the Theory of Games, Volume II. Annals of Mathematics Studies.
———. 1967. “On Balanced Sets and Cores.” Naval Research Logistics Quarterly.
———. 1971. “Cores of Convex Games.” International Journal of Game Theory.
Shapley, L. S., and Shubik. 1954. “A Method for Evaluating the Distribution of Power in a Committee System.” American Political Science Review.
Slobodkin, and Rapoport. 1974. “An Optimal Strategy of Evolution.” The Quarterly Review of Biology.
Spence. 1973. “Job Market Signaling.” The Quarterly Journal of Economics.
———. 2002. “Signaling in Retrospect and the Informational Structure of Markets.” American Economic Review.
Tooby, Cosmides, and Price. 2006. “Cognitive Adaptations for n-Person Exchange: The Evolutionary Roots of Organizational Behavior.” Managerial and Decision Economics.
Vincent. 2006. “Carcinogenesis As an Evolutionary Game.” Advances in Complex Systems.
Vovk, Nouretdinov, and Gammerman. 2009. “On-Line Predictive Linear Regression.” The Annals of Statistics.
Williams. 1966. The Compleat Strategyst: Being a Primer on the Theory of Games of Strategy.
Wolpert, Harré, Olbrich, et al. 2010. “Hysteresis Effects of Changing Parameters of Noncooperative Games.” SSRN eLibrary.
Wu, Altrock, Wang, et al. 2010. “Universality of Weak Selection.”
Yanagita, and Onozaki. 2008. “Dynamics of a Market with Heterogeneous Learning Agents.” Journal of Economic Interaction and Coordination.
Yang, Lin, Wu, et al. 2011. “Topological Conditions of Scale-Free Networks for Cooperation to Evolve.” arXiv:1106.5386.
Young. 1996. “The Economics of Convention.” The Journal of Economic Perspectives.
———. 1998a. “Conventional Contracts.” The Review of Economic Studies.
———. 1998b. Individual Strategy and Social Structure: An Evolutionary Theory of Institutions.
———. 1998c. “Social Norms and Economic Welfare.” European Economic Review.
———. 2002. “The Diffusion of Innovations in Social Networks.”
———. 2005. “The Spread of Innovations Through Social Learning.”
———. 2006. “Social Dynamics: Theory and Applications.” Handbook of Computational Economics.
Zhang, Yang, and Başar. 2021. “Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms.” In Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control.