Multi-Agent Reinforcement Learning

Distributed sensing, swarm sensing, adaptive social learning, multi-agent adaptation, iterated game theory with learning, etc.

2014-10-13 — 2026-05-27

Wherein the Formation of Agent Coalitions Is Treated as a Learning Problem, Classical Exponential Complexity Is Noted, and Cooperative Game Theory Concepts Including the Nucleolus Are Adopted.

agents

AI safety

bounded compute

collective knowledge

computers are awful together

distributed

economics

edge computing

extended self

game theory

incentive mechanisms

machine learning

networks

Placeholder for notes on multi-agent reinforcement learning (MARL).

MARL is a big topic, which I cannot hope to introduce here, but there are some sub-topics I might get around to. At the moment, that is mostly coalitional MARL.

1 Classic Collectives

COINs (collective intelligence, in the sense of Wolpert and Tumer) etc. TBD.

2 Coalitional MARL

See algorithmic collective action.

3 Opponent shaping

Opponent shaping works on pairwise coalitions, which is still pretty cool. See opponent shaping.

4 Multi-agent IRL

Inferring reward functions from demonstrations, generalised to many interacting agents. Yu, Song, and Ermon (2019) (MA-AIRL) extend adversarial IRL to Markov games, recovering per-agent rewards from expert play.
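
To make the adversarial part concrete, here is a minimal sketch of the AIRL-style discriminator, which, as I understand it, MA-AIRL keeps one copy of per agent. Everything here is illustrative: the names `discriminator` and `reward_signal` are mine, and the scalar `f_value` stands in for whatever learned reward estimator scores a state-action pair. The discriminator has the form D = exp(f) / (exp(f) + pi), and the reward signal fed back to the policy is log D - log(1 - D), which simplifies to f - log pi.

```python
import math


def discriminator(f_value, log_pi):
    """AIRL-style discriminator D = exp(f) / (exp(f) + pi).

    Computed as a sigmoid of (f - log pi), which is algebraically the same.
    f_value: learned reward estimate for one state-action pair (hypothetical scalar).
    log_pi:  log-probability the current policy assigns to that action.
    """
    return 1.0 / (1.0 + math.exp(log_pi - f_value))


def reward_signal(f_value, log_pi):
    """Policy reward log D - log(1 - D); equals f - log pi up to rounding."""
    d = discriminator(f_value, log_pi)
    return math.log(d) - math.log(1.0 - d)


# One (f, log pi) pair per agent; the numbers are made up for illustration.
per_agent = {
    "agent_0": (0.5, math.log(0.25)),
    "agent_1": (-1.0, math.log(0.5)),
}
rewards = {name: reward_signal(f, lp) for name, (f, lp) in per_agent.items()}
```

The point of the sketch is only the identity between the discriminator logit and f - log pi. In the actual method the estimators are neural networks trained against expert trajectories, which is not shown here.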

5 Open-source game theory

MARL is one possible operationalisation of open-source game theory, in which agents exchange policy source code (or some interpretable proxy) before acting and cooperate by mutual verification.
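
A toy sketch of the idea, assuming the crudest possible verification: each program receives the other's source text and cooperates only if it is an exact copy of its own. The payoff table, the bot names and the `act(my_source, their_source)` convention are all hypothetical choices for illustration. Approaches such as the provability-logic one in Bárász et al. (2014) aim at something more robust than syntactic equality.

```python
# Hypothetical one-shot prisoner's dilemma payoffs: (row player, column player).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Each program is source text defining act(my_source, their_source) -> "C" or "D".
CLIQUE_BOT = '''
def act(my_source, their_source):
    # Cooperate only with an exact copy of myself.
    return "C" if their_source == my_source else "D"
'''

DEFECT_BOT = '''
def act(my_source, their_source):
    # Ignore the opponent's source and always defect.
    return "D"
'''

COOPERATE_BOT = '''
def act(my_source, their_source):
    # Ignore the opponent's source and always cooperate.
    return "C"
'''


def run(source, their_source):
    """Load a program from its source and ask it for a move."""
    namespace = {}
    exec(source, namespace)  # fine for a toy; never do this with untrusted code
    return namespace["act"](source, their_source)


def play(source_a, source_b):
    """Both programs see each other's source, then move simultaneously."""
    move_a = run(source_a, source_b)
    move_b = run(source_b, source_a)
    return (move_a, move_b), PAYOFFS[(move_a, move_b)]
```

Here `play(CLIQUE_BOT, CLIQUE_BOT)` gives mutual cooperation, while `play(CLIQUE_BOT, DEFECT_BOT)` gives mutual defection, so the clique bot cannot be exploited the way `COOPERATE_BOT` can. The obvious weakness is brittleness: any change to the source, even whitespace, breaks cooperation.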

6 References

Albrecht, Christianos, and Schäfer. 2024. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches.

Amato. 2024. “An Introduction to Centralized Training for Decentralized Execution in Cooperative Multi-Agent Reinforcement Learning.”

Bachrach, Everett, Hughes, et al. 2020. “Negotiating Team Formation Using Deep Reinforcement Learning.” Artificial Intelligence.

Baker. 2020. “Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences.”

Bárász, Christiano, Fallenstein, et al. 2014. “Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic.”

Bieniawski, and Wolpert. 2004. “Adaptive, Distributed Control of Constrained Multi-Agent Systems.” In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 3.

Cao, Lazaridou, Lanctot, et al. 2018. “Emergent Communication Through Negotiation.”

Chalkiadakis. 2007. “A Bayesian Approach to Multiagent Reinforcement Learning and Coalition Formation Under Uncertainty.”

Chalkiadakis, and Boutilier. 2003. “Coordination in Multiagent Reinforcement Learning.”

Cooper, Oesterheld, and Conitzer. 2024. “Characterising Simulation-Based Program Equilibria.”

Critch. 2016. “Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents.”

———. 2017. “Toward Negotiable Reinforcement Learning: Shifting Priorities in Pareto Optimal Sequential Decision-Making.”

Dong, Li, Yang, et al. 2024. “Egoism, Utilitarianism and Egalitarianism in Multi-Agent Reinforcement Learning.” Neural Networks.

Du, and Ding. 2021. “A Survey on Multi-Agent Deep Reinforcement Learning: From the Perspective of Challenges and Applications.” Artificial Intelligence Review.

Duque, Aghajohari, Cooijmans, et al. 2025. “Advantage Alignment Algorithms.”

Fickinger, Zhuang, Hadfield-Menell, et al. 2020. “Multi-Principal Assistance Games.”

Foerster, J. 2018. “Deep Multi-Agent Reinforcement Learning.”

Foerster, Jakob, Chen, Al-Shedivat, et al. 2018. “Learning with Opponent-Learning Awareness.” In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. AAMAS ’18.

Foerster, Jakob, Farquhar, Afouras, et al. 2018. “Counterfactual Multi-Agent Policy Gradients.” Proceedings of the AAAI Conference on Artificial Intelligence.

Franzmeyer, Malinowski, and Henriques. 2021. “Learning Altruistic Behaviours in Reinforcement Learning Without External Rewards.”

Gronauer, and Diepold. 2022. “Multi-Agent Deep Reinforcement Learning: A Survey.” Artificial Intelligence Review.

Hadfield-Menell, Dragan, Abbeel, et al. 2016. “Cooperative Inverse Reinforcement Learning.” In Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS’16.

Ha, and Tang. 2022. “Collective Intelligence for Deep Learning: A Survey of Recent Developments.” Collective Intelligence.

Havrylov, and Titov. 2017. “Emergence of Language with Multi-Agent Games: Learning to Communicate with Sequences of Symbols.”

Hernandez-Leal, Kartal, and Taylor. 2019. “A Survey and Critique of Multiagent Deep Reinforcement Learning.” Autonomous Agents and Multi-Agent Systems.

Jaques, Lazaridou, Hughes, et al. 2019. “Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning.” In Proceedings of the 36th International Conference on Machine Learning.

Jiang, Su, and Lu. 2024. “Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A Survey.”

Laidlaw, Bronstein, Guo, et al. 2025. “AssistanceZero: Scalably Solving Assistance Games.” In Workshop on Bidirectional Human↔︎AI Alignment.

Lee, Leibo, An, et al. 2022. “Importance of Prefrontal Meta Control in Human-Like Reinforcement Learning.” Frontiers in Computational Neuroscience.

Li, Cao, Qiao, et al. 2025. “Nucleolus Credit Assignment for Effective Coalitions in Multi-Agent Reinforcement Learning.”

Lin, Zhu, Li, et al. 2025. “Policy-Conditioned Policies for Multi-Agent Task Solving.”

Lowe, Foerster, Boureau, et al. 2019. “On the Pitfalls of Measuring Emergent Communication.”

Lowe, Wu, Tamar, et al. 2020. “Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.”

Mak, Xu, Pearce, et al. 2023. “Fair Collaborative Vehicle Routing: A Deep Multi-Agent Reinforcement Learning Approach.”

Meulemans, Kobayashi, Oswald, et al. 2024. “Multi-Agent Cooperation Through Learning-Aware Policy Gradients.”

Meulemans, Nasser, Wołczyk, et al. 2025. “Embedded Universal Predictive Intelligence: A Coherent Framework for Multi-Agent Learning.”

Oguntola, Campbell, Stepputtis, et al. 2023. “Theory of Mind as Intrinsic Motivation for Multi-Agent Reinforcement Learning.” arXiv.org.

Ohsawa. 2021. “Unbiased Self-Play.” arXiv:2106.03007 [Cs, Econ, Stat].

Oroojlooy, and Hajinezhad. 2023. “A Review of Cooperative Multi-Agent Deep Reinforcement Learning.” Applied Intelligence.

Pant, and Yu. 2026. “Coopetition-Gym V1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning Under Strategic Coopetition.”

Peysakhovich, and Lerer. 2017. “Prosocial Learning Agents Solve Generalized Stag Hunts Better Than Selfish Ones.”

Rădulescu. 2021. “Decision Making in Multi-Objective Multi-Agent Systems: A Utility-Based Perspective.”

Sharma, Fernandez, Zaroukian, et al. 2021. “Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training.” In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III.

Sistla, and Kleiman-Weiner. 2025. “Evaluating LLMs in Open-Source Games.”

Suarez. 2024. “Neural MMO: Massively Multiagent Simulation and Learning.”

Suárez, Isola, Choe, et al. 2023. “Neural MMO 2.0: A Massively Multi-Task Addition to Massively Multi-Agent Learning.”

Tennant, Hailes, and Musolesi. 2023. “Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning.” In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence.

Tumer, and Wolpert. 2012. Collectives and the Design of Complex Systems.

Weis, Wołczyk, Nasser, et al. 2026. “Multi-Agent Cooperation Through in-Context Co-Player Inference.”

Wolpert, David H. 2006a. “Advances in Distributed Optimization Using Probability Collectives.” Advances in Complex Systems.

———. 2006b. “Information Theory — The Bridge Connecting Bounded Rational Game Theory and Statistical Physics.” In Complex Engineered Systems. Understanding Complex Systems.

Wolpert, David H, Bieniawski, and Rajnarayan. 2011. “Probability Collectives in Optimization.”

Wolpert, David H, and Lawson. 2002. “Designing Agent Collectives for Systems with Markovian Dynamics.”

Wolpert, David H., and Tumer. 1999. “An Introduction to Collective Intelligence.” arXiv:cs/9908014.

Wolpert, David H, Wheeler, and Tumer. 1999. “General Principles of Learning-Based Multi-Agent Systems.”

———. 2000. “Collective Intelligence for Control of Distributed Dynamical Systems.” EPL (Europhysics Letters).

Wulfmeier, Ondruska, and Posner. 2016. “Maximum Entropy Deep Inverse Reinforcement Learning.”

Xiong, Zhang, Cui, et al. 2023. “Coalition Game of Radar Network for Multitarget Tracking via Model-Based Multiagent Reinforcement Learning.” IEEE Transactions on Aerospace and Electronic Systems.

Yang, Luo, Li, et al. 2018. “Mean Field Multi-Agent Reinforcement Learning.” In Proceedings of the 35th International Conference on Machine Learning.

Yu, Song, and Ermon. 2019. “Multi-Agent Adversarial Inverse Reinforcement Learning.”