Learning with theory of mind
What collective learning looks like from the individual agent’s perspective
May 3, 2025 — May 5, 2025
Learning agents in a multi-agent system which account for and/or exploit the fact that other agents are learning too. This is one way of formalising the idea of theory of mind.
Learning with theory of mind works out nicely in reinforcement learning, e.g. in opponent shaping, and may be an important tool for understanding AI agency and AI alignment, as well as for aligning more general human systems. Other interesting things might fall out of a good theory of other-aware learning, such as fresh ideas about solving collective action problems, incentive mechanisms, iterated game theory, and even what makes a “self” a meaningful unit of analysis.
I do not think this is likely to be a sufficient explanation of agentic cognition. It seems more useful for formalising the local dynamics of a system in a regular configuration, such as a market or a personal relationship. Does it help us formalise open-system, fuzzy-boundary dynamics?
1 Asymmetric: Learning to make your opponent learn
I was first switched on to this idea, in its asymmetric form, by Dezfouli, Nock, and Dayan (2020), which describes a way to learn to make your opponent learn.
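To give a flavour of the asymmetric setup, here is a minimal toy sketch; all the specifics (the bandit, the payoffs, the hand-coded shaping policy) are my own illustrative assumptions, not the paper’s setting, where the shaping policy is itself learned. A “shaper” that knows its opponent is a vanilla Q-learner exploits that update rule to steer it toward a target action.

```python
# Toy asymmetric shaping sketch (illustrative only): a "shaper" exploits a
# Q-learning opponent's known update rule to steer it toward a target action.
import numpy as np

rng = np.random.default_rng(0)
n_actions, target = 3, 2      # the shaper wants the learner to prefer action 2
Q = np.zeros(n_actions)       # the opponent's action values
alpha, eps = 0.2, 0.1         # the opponent's learning rate and exploration rate

for t in range(500):
    # The opponent acts epsilon-greedily on its current Q-values.
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q))
    # The shaper chooses the reward signal, rewarding only the target action.
    r = 1.0 if a == target else 0.0
    # The opponent's learning rule, which the shaper anticipates and exploits.
    Q[a] += alpha * (r - Q[a])

print(Q, "-> learner prefers action", int(np.argmax(Q)))
```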
The symmetric form, where we are in the same learning loop, is also interesting.
2 Opponent shaping
Opponent shaping is a reinforcement learning-meets-iterated game theory formalism in which agents influence each other by using models of the other agents.
I’m particularly interested in this and have made it its own notebook.
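As a taste of the symmetric, gradient-based flavour, here is a LOLA-style look-ahead sketch. The one-shot Prisoner’s Dilemma payoffs, learning rates and naive opponent are illustrative assumptions of mine, not any particular paper’s setup; the point is only the structural trick of differentiating your own return through the opponent’s anticipated learning step.

```python
# LOLA-style opponent-shaping sketch (illustrative assumptions throughout).
import jax
import jax.numpy as jnp

# Prisoner's-dilemma payoffs for the row player over (cooperate, defect) x (cooperate, defect).
R1 = jnp.array([[-1., -3.], [0., -2.]])
R2 = R1.T  # symmetric game: column player's payoffs

def value(theta1, theta2, R):
    """Expected payoff given independent Bernoulli cooperation probabilities."""
    p1, p2 = jax.nn.sigmoid(theta1), jax.nn.sigmoid(theta2)
    probs1 = jnp.array([p1, 1. - p1])
    probs2 = jnp.array([p2, 1. - p2])
    return probs1 @ R @ probs2

def naive_step(theta1, theta2, lr=1.0):
    """Opponent's naive gradient-ascent step on its own value."""
    return theta2 + lr * jax.grad(value, argnums=1)(theta1, theta2, R2)

def shaped_value(theta1, theta2):
    """Agent 1's value *after* the opponent's anticipated learning step."""
    return value(theta1, naive_step(theta1, theta2), R1)

theta1, theta2 = 0.0, 0.0
for _ in range(200):
    # Agent 1 differentiates through the opponent's update; the opponent learns naively.
    g1 = jax.grad(shaped_value, argnums=0)(theta1, theta2)
    theta2 = naive_step(theta1, theta2)
    theta1 = theta1 + 0.5 * g1

print(jax.nn.sigmoid(theta1), jax.nn.sigmoid(theta2))  # final cooperation probabilities
```

(In this sketch only agent 1 shapes while agent 2 learns naively; the full LOLA setting has both agents shaping each other, on iterated rather than one-shot games.)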
3 Assistance games
a.k.a. Cooperative inverse reinforcement learning (Hadfield-Menell et al. 2016). This is another asymmetric one. I just learned about these from AssistanceZero (Laidlaw et al. 2025):
Assistance games are a promising alternative to reinforcement learning from human feedback (RLHF) for training AI assistants. Assistance games resolve key drawbacks of RLHF, such as incentives for deceptive behaviour, by explicitly modelling the interaction between assistant and user as a two-player game where the assistant cannot observe their shared goal.
It sounds like a parametric prediction of human goals on the manifold of coherent ones.
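A minimal sketch of the underlying structure, with all specifics (the goals, the Boltzmann human model, the numbers) hypothetical: the human knows the goal, the assistant does not, so the assistant infers it Bayesianly from the human’s behaviour before choosing how to help.

```python
# Toy assistance-game sketch (hypothetical specifics): the assistant cannot
# observe the shared goal and must infer it from the human's actions.
import numpy as np

goals = ["A", "B"]
actions = ("go_A", "go_B")
prior = np.array([0.5, 0.5])  # assistant's prior belief over the hidden shared goal

def human_policy(goal, beta=2.0):
    """Boltzmann-rational human who prefers the action matching the true goal."""
    utilities = np.array([1.0 if a.endswith(goal) else 0.0 for a in actions])
    p = np.exp(beta * utilities)
    return p / p.sum()

def update_belief(belief, observed_action):
    """Assistant's Bayesian update after observing the human act."""
    idx = actions.index(observed_action)
    likelihood = np.array([human_policy(g)[idx] for g in goals])
    posterior = likelihood * belief
    return posterior / posterior.sum()

# The human, whose true goal is "B", takes an action; the assistant updates and helps.
belief = update_belief(prior, "go_B")
print(dict(zip(goals, belief)), "-> assistant helps with goal", goals[int(np.argmax(belief))])
```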
See also the explicitly multiplayer version (Fickinger et al. 2020).
4 Basic
With a theory of opponents’ beliefs, but without a theory of their learning:
Today we are unveiling Recursive Belief-based Learning (ReBeL), a general RL+Search algorithm that can work in all two-player zero-sum games, including imperfect-information games. ReBeL builds on the RL+Search algorithms like AlphaZero that have proved successful in perfect-information games. Unlike those previous AIs, however, ReBeL makes decisions by factoring in the probability distribution of different beliefs each player might have about the current state of the game, which we call a public belief state (PBS). In other words, ReBeL can assess the chances that its poker opponent thinks it has, for example, a pair of aces.
By accounting for the beliefs of each player, ReBeL is able to treat imperfect-information games akin to perfect-information games. ReBeL can then leverage a modified RL+Search algorithm that we developed to work with the more complex (higher-dimensional) state and action space of imperfect-information games.
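A toy illustration of the belief-state idea (this is not ReBeL itself; the cards, the assumed betting policy and the numbers are mine): in a Kuhn-poker-like deal, the posterior over the opponent’s hidden card conditioned on their public action.

```python
# Toy belief update over an opponent's hidden card (illustrative, not ReBeL).
import numpy as np

cards = ["J", "Q", "K"]
bet_prob = {"J": 0.1, "Q": 0.4, "K": 0.9}  # assumed P(bet | card); numbers made up

def opponent_belief(my_card, observed_action):
    """Posterior over the opponent's hidden card, given my card and their public action."""
    candidates = [c for c in cards if c != my_card]   # the opponent can't hold my card
    prior = np.ones(len(candidates)) / len(candidates)
    likelihood = np.array([
        bet_prob[c] if observed_action == "bet" else 1.0 - bet_prob[c]
        for c in candidates
    ])
    posterior = likelihood * prior
    return dict(zip(candidates, posterior / posterior.sum()))

# Holding a Queen and seeing the opponent bet: how likely is it they hold the King?
print(opponent_belief("Q", "bet"))
```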
5 Incoming
- 大トロ, Collective Intelligence for Deep Learning: A Survey of Recent Developments
- Yoav Shoham and Kevin Leyton-Brown’s textbook, Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, is downloadable
- Distributed information processing in biological and computational systems
- Yujian, Digest: Consensus Filters is a quick intro to signal analysis using consensus filters
- Artificial Communication: How Algorithms Produce Social Intelligence