History-Based Reinforcement Learning
2026-01-16 — 2026-01-16
Wherein Michie’s MENACE matchboxes are evoked to introduce policy‑gradient and value methods, exploration pathologies and Go‑Explore are outlined, and simple PyTorch recipes are provided.
adaptive
agents
bandit problems
control
incentive mechanisms
learning
networks
stochastic processes
time series
utility
An interesting alternative formulation that relaxes the Markov assumption of mainstream RL, about which I know little. It seems to descend from AIXI and to be similarly intractable, but it is not of purely theoretical interest; computable approximations exist (Daswani et al., n.d.; Gao et al. 2023; Guez, Silver, and Dayan 2013; Hamilton, Fard, and Pineau, n.d.; Sunehag 2014; Tennenholtz et al. 2023).
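A minimal sketch of the basic idea, using a toy example of my own devising rather than anything from the cited papers: tabular Q-learning where the "state" is a window of recent observations rather than the current observation alone. The environment is a tiny T-maze whose cue at the first step determines which action pays off at the end, so it is non-Markovian in the instantaneous observation; an agent conditioning on a long-enough history window solves it, while a reactive (window-of-one) agent cannot beat chance.

```python
import random
from collections import defaultdict

class TMaze:
    """Toy non-Markovian task: a cue shown at t=0 determines which
    action is rewarded at the final junction, several steps later."""
    def __init__(self, corridor_len=3, seed=0):
        self.corridor_len = corridor_len
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.cue = self.rng.choice(["L", "R"])
        return self.cue  # cue is visible only on the first step

    def step(self, action):
        self.t += 1
        if self.t < self.corridor_len:
            return "corridor", 0.0, False  # actions are ignored in the corridor
        reward = 1.0 if action == self.cue else 0.0
        return "end", reward, True

def run(history_len, episodes=3000, eps=0.1, alpha=0.5, seed=1):
    """Tabular eps-greedy Q-learning whose 'state' is the tuple of the
    last `history_len` observations -- the history-based relaxation of
    the Markov assumption, in its crudest truncated form."""
    env = TMaze(seed=seed)
    rng = random.Random(seed)
    actions = ["L", "R"]
    Q = defaultdict(float)
    returns = []
    for _ in range(episodes):
        hist = (env.reset(),)
        done, total = False, 0.0
        while not done:
            s = hist[-history_len:]
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            obs, r, done = env.step(a)
            total += r
            hist = hist + (obs,)
            s2 = hist[-history_len:]
            target = r if done else max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
        returns.append(total)
    return sum(returns[-500:]) / 500  # mean return over the final episodes
```

With `history_len=3` the cue is still inside the window at the junction and the agent learns the task; with `history_len=1` the junction looks identical for both cues and the mean return stays near 0.5. Of course, naive windowing blows up combinatorially, which is exactly why the cited work is about cleverer history representations (feature maps, compressed predictive states, learned contexts).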
TBD
1 References
Daswani, Sunehag, and Hutter. n.d. “Q-Learning for History-Based Reinforcement Learning.”
Eberhard, Muehlebach, and Vernade. 2025. “Partially Observable Reinforcement Learning with Memory Traces.”
Gao, Zhang, Yang, et al. 2023. “Fast Counterfactual Inference for History-Based Reinforcement Learning.” Proceedings of the AAAI Conference on Artificial Intelligence.
Guez, Silver, and Dayan. 2013. “Efficient Bayes-Adaptive Reinforcement Learning Using Sample-Based Search.”
Hamilton, Fard, and Pineau. n.d. “Efficient Learning and Planning with Compressed Predictive States.”
Leike. 2016. “Nonparametric General Reinforcement Learning.”
Sunehag. 2014. “Feature Reinforcement Learning: State of the Art.”
Tennenholtz, Merlis, Shani, et al. 2023. “Reinforcement Learning with History-Dependent Dynamic Contexts.”
