Learning with theory of mind

What collective learning looks like from the individual agent’s perspective

May 3, 2025 — May 5, 2025

Tags: adaptive, agents, bandit problems, bounded compute, collective knowledge, control, cooperation, distributed, economics, evolution, extended self, game theory, incentive mechanisms, learning, machine learning, mind, networks, swarm, utility

Figure 1: Learning agents in a multi-agent system which account for and/or exploit the fact that other agents are learning too. This is one way of formalising the idea of theory of mind.

Learning with theory of mind works out nicely in reinforcement learning, e.g. in opponent shaping, and may be an important tool for understanding AI agency and AI alignment, as well as for aligning more general human systems. Other interesting things might fall out of a good theory of other-aware learning: new angles on collective action problems, incentive mechanisms, iterated game theory, and even on what makes a “self” a meaningful unit of analysis.

I do not think this is likely to be a sufficient explanation of agentic cognition. It seems more useful for formalising the local dynamics of a system in a regular configuration, such as a market or a personal relationship. Does it help us formalise open-system, fuzzy-boundary dynamics?

1 Asymmetric: Learning to make your opponent learn

I was first switched on to this idea in its asymmetric form by Dezfouli, Nock, and Dayan (2020), which describes a way to learn to make your opponent learn.
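
To get a feel for the asymmetric setting, here is a toy sketch, emphatically not the method of that paper (which models human learners and trains an adversary against those models): a “shaper” with a fixed reward budget steers a naive Q-learning bandit learner toward a target arm. The greedy shaping rule and all numbers are illustrative assumptions.

```python
# Toy asymmetric shaping: a budgeted "shaper" exploits the fact that the
# learner is running a simple Q-learning rule on a two-armed bandit.
# Everything here is made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_rounds, budget, target = 500, 150, 0   # shaper may pay out at most `budget` rewards
Q, lr, eps = np.zeros(2), 0.1, 0.1       # learner's value estimates and parameters
paid = 0

for t in range(n_rounds):
    # Learner: epsilon-greedy choice over its current value estimates.
    arm = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q))
    # Shaper: greedily reward target-arm pulls while the budget lasts,
    # and never reward the other arm.
    r = 1.0 if (arm == target and paid < budget) else 0.0
    paid += int(r)
    # Learner's update; it does not model the shaper at all.
    Q[arm] += lr * (r - Q[arm])

print(Q)  # the value estimates end up favouring the target arm
```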

The symmetric form, where we are in the same learning loop, is also interesting.

2 Opponent shaping

Opponent shaping is a formalism at the intersection of reinforcement learning and iterated game theory, in which agents influence each other’s learning by using models of the other agents.
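
To fix intuition, here is a minimal sketch of the core trick in LOLA-style methods (Foerster et al. 2018): rather than ascending your own value naively, differentiate through the opponent’s anticipated learning step. The quadratic toy payoffs and step sizes are assumptions for illustration, not anyone’s published code.

```python
# Minimal LOLA-style update: agent 1 ascends V1 evaluated at the opponent's
# parameters *after* the opponent's anticipated gradient step.
import jax
import jax.numpy as jnp

def V1(th1, th2):  # toy differentiable payoff for agent 1
    return -(th1 - th2) ** 2 - 0.1 * th1 ** 2

def V2(th1, th2):  # toy differentiable payoff for agent 2
    return -(th1 + th2) ** 2 - 0.1 * th2 ** 2

alpha = 0.1   # opponent's assumed learning rate
eta = 0.05    # our learning rate

def lola_step(th1, th2):
    def shaped_value(t1):
        # Opponent's anticipated (naive) gradient step on its own value...
        dth2 = alpha * jax.grad(V2, argnums=1)(t1, th2)
        # ...and our value once that step has been taken.
        return V1(t1, th2 + dth2)
    # The gradient flows through dth2's dependence on t1; that extra term is
    # what makes this "shaping" rather than naive learning.
    return th1 + eta * jax.grad(shaped_value)(th1)

th1, th2 = jnp.array(1.0), jnp.array(-1.0)
th1 = lola_step(th1, th2)
```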

I’m particularly interested in this and have made it its own notebook.

3 Assistance games

a.k.a. Cooperative inverse reinforcement learning (Hadfield-Menell et al. 2016). This is another asymmetric one. I just learned about these from AssistanceZero (Laidlaw et al. 2025):

Assistance games are a promising alternative to reinforcement learning from human feedback (RLHF) for training AI assistants. Assistance games resolve key drawbacks of RLHF, such as incentives for deceptive behaviour, by explicitly modelling the interaction between assistant and user as a two-player game where the assistant cannot observe their shared goal.

It sounds like a parametric prediction of human goals on the manifold of coherent ones.
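
One way to picture the asymmetry, under strong simplifying assumptions (a finite set of candidate goals, a Boltzmann-rational human, toy utilities I invented): the assistant never observes the goal, so it keeps a posterior over goals conditioned on the human’s actions and plans against that belief. This is only the belief-update half of the story, not a full solution of the game.

```python
# Toy belief update for an assistance-game-style assistant: infer the hidden
# goal from observed human actions, assuming a Boltzmann-rational human.
# All names and numbers are illustrative.
import numpy as np

goals = np.arange(3)           # candidate goals theta
prior = np.ones(3) / 3         # assistant's prior belief over theta
beta = 2.0                     # assumed human rationality

def human_action_likelihood(action, goal):
    # Toy human model: utility 1 for the action matching the goal, 0 otherwise,
    # turned into a softmax choice distribution.
    utils = (np.arange(3) == goal).astype(float)
    p = np.exp(beta * utils) / np.exp(beta * utils).sum()
    return p[action]

def update_belief(belief, observed_action):
    lik = np.array([human_action_likelihood(observed_action, g) for g in goals])
    post = belief * lik
    return post / post.sum()

belief = update_belief(prior, observed_action=2)
print(belief)  # mass shifts toward theta = 2; the assistant then acts on this belief
```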

See also the explicitly multiplayer version (Fickinger et al. 2020).

4 Basic

With a theory of opponents’ beliefs, but without a theory of their learning:

Today we are unveiling Recursive Belief-based Learning (ReBeL), a general RL+Search algorithm that can work in all two-player zero-sum games, including imperfect-information games. ReBeL builds on the RL+Search algorithms like AlphaZero that have proved successful in perfect-information games. Unlike those previous AIs, however, ReBeL makes decisions by factoring in the probability distribution of different beliefs each player might have about the current state of the game, which we call a public belief state (PBS). In other words, ReBeL can assess the chances that its poker opponent thinks it has, for example, a pair of aces.

By accounting for the beliefs of each player, ReBeL is able to treat imperfect-information games akin to perfect-information games. ReBeL can then leverage a modified RL+Search algorithm that we developed to work with the more complex (higher-dimensional) state and action space of imperfect-information games.
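
The public-belief-state idea in miniature, with made-up numbers: condition a public distribution over the opponent’s private hand on the action everyone just observed, given an assumed policy. This is only the Bayesian bookkeeping; the substance of ReBeL is the modified RL+Search run on top of such belief states.

```python
# Toy public-belief-state update: Bayes rule over private hands given a public
# action and an assumed policy. Hands, probabilities, and the policy are invented.
import numpy as np

hands = ["weak", "medium", "strong"]
belief = np.array([1/3, 1/3, 1/3])        # public belief over the opponent's hand

# Assumed policy: P(raise | hand); stronger hands raise more often.
p_raise_given_hand = np.array([0.1, 0.4, 0.9])

def pbs_update(belief, observed_raise):
    lik = p_raise_given_hand if observed_raise else 1.0 - p_raise_given_hand
    post = belief * lik
    return post / post.sum()

print(pbs_update(belief, observed_raise=True))
# After a raise, the public belief shifts toward "strong": the "chances that its
# opponent thinks it has a pair of aces" flavour of reasoning.
```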

5 Incoming

6 References

Aghajohari, Duque, Cooijmans, et al. 2023. “LOQA: Learning with Opponent Q-Learning Awareness.” In.
Balaguer, Koster, Summerfield, et al. 2022. “The Good Shepherd: An Oracle Agent for Mechanism Design.” In.
Conitzer, and Oesterheld. 2023. “Foundations of Cooperative AI.” Proceedings of the AAAI Conference on Artificial Intelligence.
Cooijmans, Aghajohari, and Courville. 2023. “Meta-Value Learning: A General Framework for Learning with Learning Awareness.”
Critch, Dennis, and Russell. 2022. “Cooperative and Uncooperative Institution Designs: Surprises and Problems in Open-Source Game Theory.”
Dafoe, Hughes, Bachrach, et al. 2020. “Open Problems in Cooperative AI.”
Deng, Papadimitriou, and Safra. 2002. “On the Complexity of Equilibria.” In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing. STOC ’02.
Dezfouli, Nock, and Dayan. 2020. “Adversarial Vulnerabilities of Human Decision-Making.” Proceedings of the National Academy of Sciences.
Dong, Li, Yang, et al. 2024. “Egoism, Utilitarianism and Egalitarianism in Multi-Agent Reinforcement Learning.” Neural Networks.
Duque, Aghajohari, Cooijmans, et al. 2024. “Advantage Alignment Algorithms.” In.
Fickinger, Zhuang, Hadfield-Menell, et al. 2020. “Multi-Principal Assistance Games.”
Foerster, Chen, Al-Shedivat, et al. 2018. “Learning with Opponent-Learning Awareness.” In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. AAMAS ’18.
Foerster, Farquhar, Al-Shedivat, et al. 2018. “DiCE: The Infinitely Differentiable Monte-Carlo Estimator.”
Hadfield-Menell, Dragan, Abbeel, et al. 2016. “Cooperative Inverse Reinforcement Learning.” In Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS ’16.
Hadfield-Menell, and Hadfield. 2018. “Incomplete Contracting and AI Alignment.”
Khan, Willi, Kwan, et al. 2024. “Scaling Opponent Shaping to High Dimensional Games.” In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems. AAMAS ’24.
Laidlaw, Bronstein, Guo, et al. 2025. “AssistanceZero: Scalably Solving Assistance Games.” In Workshop on Bidirectional Human↔AI Alignment.
Leibo, Zambaldi, Lanctot, et al. 2017. “Multi-Agent Reinforcement Learning in Sequential Social Dilemmas.”
Levin. 2019. “The Computational Boundary of a ‘Self’: Developmental Bioelectricity Drives Multicellularity and Scale-Free Cognition.” Frontiers in Psychology.
Lu, Willi, Witt, et al. 2022. “Model-Free Opponent Shaping.” In Proceedings of the 39th International Conference on Machine Learning.
Lyons, and Levin. 2024. “Cognitive Glues Are Shared Models of Relative Scarcities: The Economics of Collective Intelligence.”
Meulemans, Kobayashi, Oswald, et al. 2024. “Multi-Agent Cooperation Through Learning-Aware Policy Gradients.” In.
Sharma, Davidson, Khetarpal, et al. 2024. “Toward Human-AI Alignment in Large-Scale Multi-Player Games.”
Smith, and Krishnamurthy. 2011. Symmetry and Collective Fluctuations in Evolutionary Games.
———. 2015. Symmetry and Collective Fluctuations in Evolutionary Games.
Tarai, and Bit, eds. 2021. Neurocognitive Perspectives of Prosocial and Positive Emotional Behaviours: Theory to Application.
Willi, Letcher, Treutlein, et al. 2022. “COLA: Consistent Learning with Opponent-Learning Awareness.” In Proceedings of the 39th International Conference on Machine Learning.
Xie, Losey, Tolsma, et al. 2021. “Learning Latent Representations to Influence Multi-Agent Interaction.” In Proceedings of the 2020 Conference on Robot Learning.