History-Based Reinforcement Learning
2026-01-16 — 2026-01-16
Wherein Michie’s MENACE matchboxes are evoked to introduce policy‑gradient and value methods, exploration pathologies and Go‑Explore are outlined, and simple PyTorch recipes are provided.
adaptive
agents
bandit problems
control
incentive mechanisms
learning
networks
stochastic processes
time series
utility
An interesting alternative formulation that relaxes the Markov assumption of mainstream RL, about which I know little. It seems to descend from AIXI and to be similarly intractable, but it is not of purely theoretical interest; computable approximations exist (Daswani et al., n.d.; Gao et al. 2023; Guez, Silver, and Dayan 2013; Hamilton, Fard, and Pineau, n.d.; Sunehag 2014; Tennenholtz et al. 2023).
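A minimal sketch of the basic idea, using a toy example of my own devising rather than anything from the cited papers: tabular Q-learning where the "state" is a window of recent observations rather than the current observation alone. The environment is a tiny T-maze whose cue at the first step determines which action pays off at the end, so it is non-Markovian in the instantaneous observation; an agent conditioning on a long-enough history window solves it, while a reactive (window-of-one) agent cannot beat chance.

```python
import random
from collections import defaultdict

class TMaze:
    """Toy non-Markovian task: a cue shown at t=0 determines which
    action is rewarded at the final junction, several steps later."""
    def __init__(self, corridor_len=3, seed=0):
        self.corridor_len = corridor_len
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.cue = self.rng.choice(["L", "R"])
        return self.cue  # cue is visible only on the first step

    def step(self, action):
        self.t += 1
        if self.t < self.corridor_len:
            return "corridor", 0.0, False  # actions are ignored in the corridor
        reward = 1.0 if action == self.cue else 0.0
        return "end", reward, True

def run(history_len, episodes=3000, eps=0.1, alpha=0.5, seed=1):
    """Tabular eps-greedy Q-learning whose 'state' is the tuple of the
    last `history_len` observations -- the history-based relaxation of
    the Markov assumption, in its crudest truncated form."""
    env = TMaze(seed=seed)
    rng = random.Random(seed)
    actions = ["L", "R"]
    Q = defaultdict(float)
    returns = []
    for _ in range(episodes):
        hist = (env.reset(),)
        done, total = False, 0.0
        while not done:
            s = hist[-history_len:]
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            obs, r, done = env.step(a)
            total += r
            hist = hist + (obs,)
            s2 = hist[-history_len:]
            target = r if done else max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
        returns.append(total)
    return sum(returns[-500:]) / 500  # mean return over the final episodes
```

With `history_len=3` the cue is still inside the window at the junction and the agent learns the task; with `history_len=1` the junction looks identical for both cues and the mean return stays near 0.5. Of course, naive windowing blows up combinatorially, which is exactly why the cited work is about cleverer history representations (feature maps, compressed predictive states, learned contexts).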
TBD
1 References
Daswani, Sunehag, and Hutter. n.d. “Q-Learning for History-Based Reinforcement Learning.”
Eberhard, Muehlebach, and Vernade. 2025. “Partially Observable Reinforcement Learning with Memory Traces.”
Gao, Zhang, Yang, et al. 2023. “Fast Counterfactual Inference for History-Based Reinforcement Learning.” Proceedings of the AAAI Conference on Artificial Intelligence.
Guez, Silver, and Dayan. 2013. “Efficient Bayes-Adaptive Reinforcement Learning Using Sample-Based Search.”
Hamilton, Fard, and Pineau. n.d. “Efficient Learning and Planning with Compressed Predictive States.”
Leike. 2016. “Nonparametric General Reinforcement Learning.”
Sunehag. 2014. “Feature Reinforcement Learning: State of the Art.”
Tennenholtz, Merlis, Shani, et al. 2023. “Reinforcement Learning with History-Dependent Dynamic Contexts.”
