Reinforcement learning

Here’s an intro to all of machine learning through a historical tale about one particular attempt to teach a machine (not a computer!) to play tic-tac-toe.

Without reward

See Ringstrom (2022).

Deep reinforcement learning

Of course, artificial neural networks are a thing in this domain too.

See Andrej Karpathy’s explanation.

Casual concrete example and intro by Mat Kelcey.

The trick is that you approximate the action-value table (the Q-table) of Q-learning with a neural net.
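Here is a minimal sketch of that trick in plain Python, so it runs without dependencies. Everything in it is an illustrative assumption rather than anyone's published method. It uses a hypothetical four-state chain where moving right from state 2 reaches a rewarding goal, a one-hidden-layer tanh network that takes a one-hot state and emits one Q-value per action, and a discount of 0.5. Each update nudges Q(s, a) toward the Q-learning target r + γ max_a' Q(s', a'), treating that target as a constant (a semi-gradient step). To keep it deterministic it sweeps every state-action pair instead of exploring with ε-greedy episodes. It also omits the replay buffer and target network that practical deep Q-learning (DQN) adds for stability.

```python
import math
import random

N_STATES = 4       # hypothetical chain: states 0..3, state 3 is the terminal goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.5        # discount factor (assumed)


def step(s, a):
    """Deterministic toy dynamics: reward 1 only on entering the goal state."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def one_hot(s):
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]


class QNet:
    """One-hidden-layer MLP standing in for the Q-table: state in, Q(s, .) out."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        q = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, q

    def q_values(self, x):
        return self.forward(x)[1]

    def sgd_step(self, x, a, target, lr):
        """One gradient step on 0.5 * (Q(x, a) - target)**2, target held fixed."""
        h, q = self.forward(x)
        err = q[a] - target
        for j, hj in enumerate(h):
            # Gradient w.r.t. hidden pre-activation j, computed before w2 changes.
            grad_pre = err * self.w2[a][j] * (1.0 - hj * hj)
            self.w2[a][j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
            self.b1[j] -= lr * grad_pre
        self.b2[a] -= lr * err


def train(sweeps=3000, lr=0.05, seed=0):
    net = QNet(N_STATES, 8, len(ACTIONS), random.Random(seed))
    for _ in range(sweeps):
        for s in range(N_STATES - 1):          # every non-terminal state...
            for a in ACTIONS:                  # ...and every action
                s2, r, done = step(s, a)
                # Q-learning target: reward plus discounted greedy value of s2.
                target = r if done else r + GAMMA * max(net.q_values(one_hot(s2)))
                net.sgd_step(one_hot(s), a, target, lr)
    return net


net = train()
for s in range(N_STATES - 1):
    print(s, [round(v, 3) for v in net.q_values(one_hot(s))])
```

For this chain the exact values are Q(s, right) = 0.25, 0.5, 1 and Q(s, left) = 0.125, 0.125, 0.25 for s = 0, 1, 2, so the printed estimates should approach those. With one-hot inputs the network is barely more than a table. The point is that the same update rule still applies when the state is a feature vector far too large to tabulate, which is where the neural net earns its keep.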

Multi agent

With theory of mind.

today we are unveiling Recursive Belief-based Learning (ReBeL), a general RL+Search algorithm that can work in all two-player zero-sum games, including imperfect-information games. ReBeL builds on the RL+Search algorithms like AlphaZero that have proved successful in perfect-information games. Unlike those previous AIs, however, ReBeL makes decisions by factoring in the probability distribution of different beliefs each player might have about the current state of the game, which we call a public belief state (PBS). In other words, ReBeL can assess the chances that its poker opponent thinks it has, for example, a pair of aces.

By accounting for the beliefs of each player, ReBeL is able to treat imperfect-information games akin to perfect-information games. ReBeL can then leverage a modified RL+Search algorithm that we developed to work with the more complex (higher-dimensional) state and action space of imperfect-information games.
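The Bayesian bookkeeping behind a belief state fits in a few lines. As we understand it, ReBeL's PBS covers the private information of both players and is far larger, and it is updated with Bayes' rule as public actions are observed under the players' policies. Treat this as a toy sketch of a single such step under stated assumptions, not ReBeL's code. In the sketch, one player holds a card from a hypothetical three-card deck (J, Q, K, as in Kuhn poker), each card starts equally likely, and the per-card betting probabilities are made up. After the player is seen to bet, Bayes' rule reweights each card by how likely that card was to produce a bet under the assumed policy.

```python
from fractions import Fraction


def update_belief(belief, policy, action):
    """Posterior over private cards after observing a public action (Bayes' rule).

    belief: {card: P(card)}; policy: {card: {action: P(action | card)}}.
    """
    weights = {card: p * policy[card][action] for card, p in belief.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observed action is impossible under this belief and policy")
    return {card: w / total for card, w in weights.items()}


# Hypothetical toy setup: uniform prior and a made-up betting policy.
prior = {card: Fraction(1, 3) for card in ("J", "Q", "K")}
policy = {
    "J": {"bet": Fraction(1, 3), "check": Fraction(2, 3)},  # occasional bluff
    "Q": {"bet": Fraction(0), "check": Fraction(1)},
    "K": {"bet": Fraction(1), "check": Fraction(0)},
}

posterior = update_belief(prior, policy, "bet")
print(posterior)  # {'J': Fraction(1, 4), 'Q': Fraction(0, 1), 'K': Fraction(3, 4)}
```

Because the assumed policy never bets with a Q, a bet rules that card out entirely, and the remaining probability splits 1:3 between J and K. Exact `Fraction` arithmetic keeps the posterior free of rounding error.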


Algorithms for Decision Making: a book on decision making, in the sense of reinforcement learning.

This book provides a broad introduction to algorithms for decision making under uncertainty. We cover a wide variety of topics related to decision making, introducing the underlying mathematical problem formulations and the algorithms for solving them.

It has much of interest, including multi-agent learning.


Ajay, Anurag, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, and Pulkit Agrawal. 2023. “Is Conditional Generative Modeling All You Need for Decision-Making?” arXiv.
Bensoussan, Alain, Yiqun Li, Dinh Phan Cao Nguyen, Minh-Binh Tran, Sheung Chi Phillip Yam, and Xiang Zhou. 2020. “Machine Learning and Control Theory.” arXiv:2006.05604 [Cs, Math, Stat], June.
Brockman, Greg, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. “OpenAI Gym.” arXiv:1606.01540 [Cs], June.
Clifton, Jesse, and Eric Laber. 2020. “Q-Learning: Theory and Applications.” Annual Review of Statistics and Its Application 7 (1): 279–301.
Dayan, Peter, and Christopher JCH Watkins. n.d. “Reinforcement Learning.” In Encyclopedia of Cognitive Science.
Drori, Iddo. 2022a. “Deep Reinforcement Learning.” In The Science of Deep Learning, by Iddo Drori. Cambridge University Press.
———. 2022b. “Reinforcement Learning.” In The Science of Deep Learning, by Iddo Drori. Cambridge University Press.
———. 2022c. The Science of Deep Learning. Cambridge University Press.
Jaakkola, Tommi, Satinder P. Singh, and Michael I. Jordan. 1995. “Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.” In Advances in Neural Information Processing Systems, 345–52.
Kaelbling, L. P., M. L. Littman, and A. W. Moore. 1996. “Reinforcement Learning: A Survey.” Journal of Artificial Intelligence Research 4 (April).
Krakovsky, Marina. 2016. “Reinforcement Renaissance.” Commun. ACM 59 (8): 12–14.
Krishnamurthy, Akshay, Alekh Agarwal, and John Langford. 2016. “Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations.” arXiv:1602.02722 [Cs, Stat], February.
Lehman, Joel, Jonathan Gordon, Shawn Jain, Kamal Ndousse, Cathy Yeh, and Kenneth O. Stanley. 2022. “Evolution Through Large Models.” arXiv.
Levine, Sergey. 2018. “Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review.” arXiv:1805.00909 [Cs, Stat], May.
Mania, Horia, Aurelia Guy, and Benjamin Recht. 2018. “Simple Random Search Provides a Competitive Approach to Reinforcement Learning.” arXiv:1803.07055 [Cs, Math, Stat], March.
Mukherjee, Amartya, and Jun Liu. 2023. “Bridging Physics-Informed Neural Networks with Reinforcement Learning: Hamilton-Jacobi-Bellman Proximal Policy Optimization (HJBPPO).” arXiv.
Parisotto, Emilio, and Ruslan Salakhutdinov. 2017. “Neural Map: Structured Memory for Deep Reinforcement Learning.” arXiv:1702.08360 [Cs], February.
Pfau, David, and Oriol Vinyals. 2016. “Connecting Generative Adversarial Networks and Actor-Critic Methods.” arXiv:1610.01945 [Cs, Stat], October.
Ren, Tongzheng, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, and Bo Dai. 2023. “Spectral Decomposition Representation for Reinforcement Learning.” arXiv.
Ringstrom, Thomas J. 2022. “Reward Is Not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning.” arXiv.
Salimans, Tim, Jonathan Ho, Xi Chen, and Ilya Sutskever. 2017. “Evolution Strategies as a Scalable Alternative to Reinforcement Learning.” arXiv:1703.03864 [Cs, Stat], March.
Shibata, Takeshi, Ryo Yoshinaka, and Takashi Chikayama. 2006. “Probabilistic Generalization of Simple Grammars and Its Application to Reinforcement Learning.” In Algorithmic Learning Theory, edited by José L. Balcázar, Philip M. Long, and Frank Stephan, 348–62. Lecture Notes in Computer Science 4264. Springer Berlin Heidelberg.
Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. 2021. “Reward Is Enough.” Artificial Intelligence 299 (October): 103535.
Sutton, Richard S, and Andrew G Barto. 1998. Reinforcement Learning. Cambridge, Mass.: MIT Press.
Sutton, Richard S., David A. McAllester, Satinder P. Singh, and Yishay Mansour. 2000. “Policy Gradient Methods for Reinforcement Learning with Function Approximation.” In Advances in Neural Information Processing Systems, 1057–63.
Thrun, Sebastian B. 1992. “Efficient Exploration In Reinforcement Learning.”
