History-Based Reinforcement Learning

2026-01-16 — 2026-01-16

Wherein the Markovian supposition is relaxed, the lineage to AIXI is noted, and computable approximations to an otherwise intractable scheme are surveyed in recent Q-learning work.

adaptive
agents
bandit problems
control
incentive mechanisms
learning
networks
stochastic processes
time series
utility

An interesting alternative formulation that relaxes the Markovian assumption of mainstream RL: the agent conditions on the entire interaction history rather than on a state summary, though I don’t know much about it. It comes out of the AIXI lineage and is similarly intractable in its pure form, but it’s not purely theoretical; computable approximations do exist (Daswani et al., n.d.; Gao et al. 2023; Guez, Silver, and Dayan 2013; Hamilton, Fard, and Pineau, n.d.; Sunehag 2014; Tennenholtz et al. 2023).
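
To make the setting concrete, here is a minimal sketch of the optimal history-based action value, in my own notation rather than that of any one of the cited papers, with rewards folded into the observations as in the AIXI setting:

$$Q^*(h_t, a) = \mathbb{E}\!\left[\, r_{t+1} + \gamma \max_{a'} Q^*(h_{t+1}, a') \,\middle|\, h_t,\ a_t = a \right], \qquad h_{t+1} = h_t \, a \, o_{t+1}.$$

The history space grows exponentially in $t$, which is where the intractability comes from. The computable approximations replace $h_t$ with a compact feature map $\phi(h_t)$ and run ordinary Q-learning on that. Below is a toy Python sketch in that spirit, where $\phi$ simply truncates to the last few observation–action pairs; the gym-style environment interface, the window length `k`, and the truncation map are all my illustrative assumptions, a crude hand-picked stand-in for the learned history-to-state maps in, e.g., the feature-RL work:

```python
import random
from collections import defaultdict, deque


def phi(history, k=4):
    """Map a history to a feature: here, just the last k elements of the
    interleaved (obs, action, obs, ...) sequence. A fixed-window truncation
    is a crude stand-in for learned history-to-state maps."""
    return tuple(history)[-k:]


def history_q_learning(env, episodes=500, k=4, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning whose 'state' is a feature of the history.

    Assumes a classic gym-style env: reset() -> obs,
    step(a) -> (obs, reward, done, info), with discrete actions."""
    Q = defaultdict(float)  # Q[(phi(h), a)] -> value estimate
    n_actions = env.action_space.n
    for _ in range(episodes):
        # Bounded suffix of the history h_t (rewards omitted for simplicity).
        history = deque([env.reset()], maxlen=2 * k)
        done = False
        while not done:
            s = phi(history, k)
            # Epsilon-greedy over the history feature, not a Markov state.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda b: Q[(s, b)])
            obs, r, done, _ = env.step(a)
            history.extend([a, obs])  # h_{t+1} = h_t a_t o_{t+1}
            s2 = phi(history, k)
            target = r if done else r + gamma * max(
                Q[(s2, b)] for b in range(n_actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q
```

With a good $\phi$ this reduces to standard Q-learning on an induced MDP; the interesting problem, which the cited papers attack in various ways, is learning $\phi$ rather than fixing it by hand.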

TBD

1 References

Daswani, Sunehag, and Hutter. n.d. “Q-Learning for History-Based Reinforcement Learning.”
Eberhard, Muehlebach, and Vernade. 2025. “Partially Observable Reinforcement Learning with Memory Traces.”
Gao, Zhang, Yang, et al. 2023. “Fast Counterfactual Inference for History-Based Reinforcement Learning.” Proceedings of the AAAI Conference on Artificial Intelligence.
Guez, Silver, and Dayan. 2013. “Efficient Bayes-Adaptive Reinforcement Learning Using Sample-Based Search.”
Hamilton, Fard, and Pineau. n.d. “Efficient Learning and Planning with Compressed Predictive States.”
Leike. 2016. “Nonparametric General Reinforcement Learning.”
Sunehag. 2014. “Feature Reinforcement Learning: State of the Art.”
Tennenholtz, Merlis, Shani, et al. 2023. “Reinforcement Learning with History-Dependent Dynamic Contexts.”