TODO: connect to optimal control.
Classic
The classic discrete-time stochastic control problem of Bellman and Howard.
- Warren Powell’s Introduction to Markov decision processes
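As a reminder of what that classic fully-observed problem looks like, here is a minimal value-iteration sketch, i.e. iterating the Bellman optimality operator to its fixed point. The two-state, two-action MDP (`P`, `R`, `gamma`) is made up for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP.
# P[a, s, s'] = transition probability; R[a, s] = expected immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.7, 0.3]],   # action 1
])
R = np.array([
    [1.0, 0.0],                 # action 0
    [0.0, 2.0],                 # action 1
])
gamma = 0.95

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator to a fixed point."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

V_star, policy = value_iteration(P, R, gamma)
```

Because the Bellman operator is a gamma-contraction, this converges geometrically; the greedy policy read off the fixed point is optimal for this toy problem.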
POMDP
Figure: Partial observation of Mrs Brown’s
“A POMDP is a partially observable Markov decision process. It is a model, originating in the operations research (OR) literature, for describing planning tasks in which the decision maker does not have complete information as to its current state. The POMDP model provides a convenient way of reasoning about tradeoffs between actions to gain reward and actions to gain information.”
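The standard trick for reasoning about that incomplete information is to maintain a belief, a distribution over the hidden state, and update it by Bayes' rule after each observation; the POMDP then becomes an MDP over beliefs. A minimal sketch of that belief update, with made-up two-state dynamics `P` and observation likelihoods `O` for a single fixed action:

```python
import numpy as np

# Hypothetical two-state POMDP fragment (single action, for brevity).
P = np.array([[0.9, 0.1],       # P[s, s']: state transition probabilities
              [0.1, 0.9]])
O = np.array([[0.85, 0.15],     # O[s', o]: observation likelihoods
              [0.15, 0.85]])

def belief_update(b, obs, P, O):
    """Bayes filter: push the belief through the dynamics,
    then reweight by the likelihood of the observation."""
    predicted = b @ P                  # sum_s b(s) P(s' | s)
    unnorm = predicted * O[:, obs]     # times P(obs | s')
    return unnorm / unnorm.sum()

b = np.array([0.5, 0.5])               # uniform prior over states
b = belief_update(b, 0, P, O)          # observe o = 0
```

Starting from the uniform prior, observing `o = 0` shifts the belief toward state 0; a POMDP policy maps such beliefs (not states) to actions, which is exactly where the information-gathering/reward trade-off in the quote above lives.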
References
Ayed, Ibrahim, and Emmanuel de Bézenac. 2019. “Learning Dynamical Systems from Partial Observations.” In Advances in Neural Information Processing Systems, 12.
Collins, Nicholas, and Hanna Kurniawati. 2019. “Partially Observable Planning and Learning for Systems with Non-Uniform Dynamics.” arXiv.
———. 2021. “Locally-Connected Interrelated Network: A Forward Propagation Primitive.” In Algorithmic Foundations of Robotics XIV, edited by Steven M. LaValle, Ming Lin, Timo Ojala, Dylan Shell, and Jingjin Yu, 124–42. Springer Proceedings in Advanced Robotics. Cham: Springer International Publishing.
Jaakkola, Tommi, Satinder P. Singh, and Michael I. Jordan. 1995. “Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.” In Advances in Neural Information Processing Systems, 345–52.
Kurniawati, Hanna. 2022. “Partially Observable Markov Decision Processes and Robotics.” Annual Review of Control, Robotics, and Autonomous Systems 5 (1): 253–77.
Ohsawa, Shohei. 2021. “Unbiased Self-Play.” arXiv:2106.03007 [Cs, Econ, Stat], June.
Parisotto, Emilio, and Ruslan Salakhutdinov. 2017. “Neural Map: Structured Memory for Deep Reinforcement Learning.” arXiv:1702.08360 [Cs], February.
Roy, Nicholas, Geoffrey Gordon, and Sebastian Thrun. 2005. “Finding Approximate POMDP Solutions Through Belief Compression.” Journal of Artificial Intelligence Research 23 (1): 1–40.
Thrun, Sebastian, John Langford, and Dieter Fox. 1999. “Monte Carlo Hidden Markov Models: Learning Non-Parametric Models of Partially Observable Stochastic Processes.” In Proceedings of the International Conference on Machine Learning. Bled, Slovenia.