# Reinforcement learning

November 28, 2014 — July 17, 2023

Here’s an intro to all of machine learning through a historical tale about one particular attempt to teach a machine (not a computer!) to play tic-tac-toe.

## 1 Theory

The internet loves David Silver’s course.

Sutton & Barto Book: Reinforcement Learning: An Introduction

The ageing but gentle intro resource, AI-depot’s Reinforcement learning page.

Ben Eysenbach, Aviral Kumar, and Abhishek Gupta, Reinforcement learning is supervised learning on optimized data

### 1.1 Practice

## 2 Without reward

Ringstrom (2022)

## 3 Via diffusion

Is Conditional Generative Modeling all you need for Decision-Making? (Ajay et al. 2023)

## 4 Deep reinforcement learning

Of course, artificial neural networks are a thing in this domain too.

See Andrej Karpathy’s explanation.

Casual concrete example and intro by Mat Kelcey.

The trick is that, instead of storing the Q-learning action-value table explicitly, you approximate it with a neural net that maps a state to one estimated value per action.
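
To make that concrete, here is a minimal sketch of Q-learning with a tiny hand-rolled network standing in for the table. Everything in it (the class `TinyQNetwork`, the function `q_learning_step`, the layer sizes, the learning rate and the one-hot toy states) is a hypothetical choice for illustration, not taken from any of the resources linked above. Practical deep RL systems add machinery such as replay buffers and target networks, which is left out here.

```python
import math
import random


class TinyQNetwork:
    """One-hidden-layer network mapping a state vector to one Q-value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def forward(self, state):
        """Return the hidden activations and the Q-value of every action."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q = [sum(w * h for w, h in zip(row, hidden)) + b
             for row, b in zip(self.w2, self.b2)]
        return hidden, q

    def q_values(self, state):
        return self.forward(state)[1]

    def update(self, state, action, target, lr=0.05):
        """One gradient step on 0.5 * (Q(state, action) - target) ** 2."""
        hidden, q = self.forward(state)
        err = q[action] - target
        # Gradient reaching each hidden unit, computed before w2 is changed.
        grad_hidden = [err * self.w2[action][j] * (1.0 - hidden[j] ** 2)
                       for j in range(len(hidden))]
        for j, h in enumerate(hidden):
            self.w2[action][j] -= lr * err * h
        self.b2[action] -= lr * err
        for j, g in enumerate(grad_hidden):
            for i, x in enumerate(state):
                self.w1[j][i] -= lr * g * x
            self.b1[j] -= lr * g
        return err


def q_learning_step(net, state, action, reward, next_state, done,
                    gamma=0.9, lr=0.05):
    """Semi-gradient Q-learning: the bootstrapped target is held constant."""
    bootstrap = 0.0 if done else gamma * max(net.q_values(next_state))
    return net.update(state, action, reward + bootstrap, lr)


# Toy usage: replay one terminal transition that pays reward 1.
net = TinyQNetwork(n_inputs=3, n_hidden=8, n_actions=2)
last_state, goal_state = [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
for _ in range(200):
    q_learning_step(net, last_state, action=1, reward=1.0,
                    next_state=goal_state, done=True)
```

The only difference from tabular Q-learning is where the estimate lives: the table lookup `Q[state][action]` becomes `net.q_values(state)[action]`, and the table assignment becomes a gradient step towards the same target, `reward + gamma * max(Q(next_state))`.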

## 5 Multi agent

With theory of mind.

> today we are unveiling Recursive Belief-based Learning (ReBeL), a general RL+Search algorithm that can work in all two-player zero-sum games, including imperfect-information games. ReBeL builds on the RL+Search algorithms like AlphaZero that have proved successful in perfect-information games. Unlike those previous AIs, however, ReBeL makes decisions by factoring in the probability distribution of different beliefs each player might have about the current state of the game, which we call a public belief state (PBS). In other words, ReBeL can assess the chances that its poker opponent thinks it has, for example, a pair of aces.
>
> By accounting for the beliefs of each player, ReBeL is able to treat imperfect-information games akin to perfect-information games. ReBeL can then leverage a modified RL+Search algorithm that we developed to work with the more complex (higher-dimensional) state and action space of imperfect-information games.
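
As a rough illustration of the belief-tracking idea in that announcement, the sketch below applies Bayes' rule to a distribution over an opponent's hidden hand after a public action is observed. It is not ReBeL, which also involves search and learned value functions. The function `update_belief`, the two-hand toy game and every probability in it are made-up assumptions for illustration.

```python
def update_belief(prior, action_likelihood, observed_action):
    """Bayes' rule: P(hand | action) is proportional to P(action | hand) * P(hand)."""
    unnormalised = {
        hand: prior[hand] * action_likelihood[hand].get(observed_action, 0.0)
        for hand in prior
    }
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observed action has zero probability under every hand")
    return {hand: p / total for hand, p in unnormalised.items()}


# Hypothetical prior over the opponent's private hand.
prior = {"pair of aces": 0.1, "weak hand": 0.9}

# Hypothetical policy: how likely each hand is to take each public action.
policy = {
    "pair of aces": {"raise": 0.8, "call": 0.2},
    "weak hand": {"raise": 0.2, "call": 0.8},
}

# Belief about the hidden hand after publicly observing a raise.
posterior = update_belief(prior, policy, "raise")
```

Under these assumed numbers, a raise shifts probability towards the strong hand, because that hand raises more often than the weak one. Only public information (the action) and an assumed policy go into the update, so every observer with the same assumptions arrives at the same belief.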

## 6 Incoming

Algorithms for Decision Making: Decision making, in the sense of reinforcement learning

> This book provides a broad introduction to algorithms for decision making under uncertainty. We cover a wide variety of topics related to decision making, introducing the underlying mathematical problem formulations and the algorithms for solving them.

Includes much of interest, including multi-agent learning.

## 7 References

*arXiv:2006.05604 [Cs, Math, Stat]*.

*arXiv:1606.01540 [Cs]*.

*Annual Review of Statistics and Its Application*.

*Encyclopedia of Cognitive Science*.

*The Science of Deep Learning*.

*The Science of Deep Learning*.

*The Science of Deep Learning*.

*Advances in Neural Information Processing Systems*.

*Journal of Artificial Intelligence Research*.

*Algorithms for decision making*.

*Commun. ACM*.

*arXiv:1602.02722 [Cs, Stat]*.

*arXiv:1805.00909 [Cs, Stat]*.

*arXiv:1803.07055 [Cs, Math, Stat]*.

*arXiv:1702.08360 [Cs]*.

*arXiv:1610.01945 [Cs, Stat]*.

*arXiv:1703.03864 [Cs, Stat]*.

*Algorithmic Learning Theory*. Lecture Notes in Computer Science 4264.

*Artificial Intelligence*.

*Reinforcement Learning*.

*Reinforcement Learning, second edition: An Introduction*.

*Advances in Neural Information Processing Systems*.