Markov decision problems



TODO: connect to optimal control.

Classic

Bellman and Howard’s classic discrete-time stochastic control problem: an agent chooses actions in a Markov chain so as to maximise expected cumulative (discounted) reward.
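
The optimal value function satisfies the Bellman optimality equation, $V^*(s) = \max_a \big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \big]$, which value iteration solves by fixed-point iteration. A minimal sketch on a made-up two-state MDP (all transition probabilities and rewards below are invented purely for illustration):

```python
import numpy as np

# Toy MDP: 2 states, 2 actions. All numbers are invented for illustration.
# P[a, s, s'] = probability of moving to s' from s under action a.
P = np.array([
    [[0.9, 0.1],    # action 0
     [0.2, 0.8]],
    [[0.5, 0.5],    # action 1
     [0.1, 0.9]],
])
# r[a, s] = expected immediate reward for taking action a in state s.
r = np.array([
    [0.0, 1.0],     # action 0
    [0.5, 0.0],     # action 1
])
gamma = 0.95        # discount factor

# Value iteration: repeatedly apply the Bellman optimality operator.
V = np.zeros(2)
for _ in range(1000):
    Q = r + gamma * P @ V      # Q[a, s] = r(s,a) + gamma * E[V(s')]
    V_new = Q.max(axis=0)      # greedy maximisation over actions
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=0)      # optimal action in each state
print("V* =", V, "policy =", policy)
```

Because the Bellman operator is a $\gamma$-contraction in the sup norm, this iteration converges geometrically to $V^*$ from any initialisation.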

POMDP

Figure: Partial observation of Mrs Brown’s

“A POMDP is a partially observable Markov decision process. It is a model, originating in the operations research (OR) literature, for describing planning tasks in which the decision maker does not have complete information as to its current state. The POMDP model provides a convenient way of reasoning about tradeoffs between actions to gain reward and actions to gain information.”
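
The “actions to gain information” part runs through the belief state: since the true state is hidden, the agent maintains a posterior over states and updates it by Bayes’ rule after each action and observation, $b'(s') \propto O(o \mid s', a) \sum_s T(s' \mid s, a)\, b(s)$. A minimal sketch of that update on the classic two-door tiger problem (the 0.85 hearing accuracy is the standard toy value; the rest of the setup is simplified):

```python
import numpy as np

# Classic tiger problem: the tiger is behind the left or right door
# (hidden state), and 'listen' yields a noisy observation of its side.
T_listen = np.eye(2)            # listening does not move the tiger
# O[s', o] = P(observation o | next state s', action = listen).
O_listen = np.array([
    [0.85, 0.15],   # tiger left:  hear left with prob 0.85
    [0.15, 0.85],   # tiger right: hear right with prob 0.85
])

def belief_update(b, obs):
    """Bayes-rule belief update after a 'listen' action."""
    predicted = T_listen.T @ b              # predict step (identity here)
    unnorm = O_listen[:, obs] * predicted   # weight by observation likelihood
    return unnorm / unnorm.sum()            # normalise to a distribution

b = np.array([0.5, 0.5])    # uniform prior over the tiger's position
for obs in [0, 0, 1]:       # hear left, left, right
    b = belief_update(b, obs)
    print(b)
# Repeated consistent observations sharpen the belief, which is exactly
# the informational value that 'listen' trades off against opening a door.
```

The belief is a sufficient statistic for the history, so a POMDP can be recast as a fully observable MDP over beliefs; that belief-MDP view is what belief-compression methods such as Roy, Gordon, and Thrun (2005) approximate.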

POMDP while learning the forward propagator

Keen to investigate Collins and Kurniawati (2021), which I believe is also summarised in Kurniawati (2022).

References

Ayed, Ibrahim, and Emmanuel de Bézenac. 2019. “Learning Dynamical Systems from Partial Observations.” In Advances in Neural Information Processing Systems, 12.
Collins, Nicholas, and Hanna Kurniawati. 2019. “Partially Observable Planning and Learning for Systems with Non-Uniform Dynamics.” arXiv.
———. 2021. “Locally-Connected Interrelated Network: A Forward Propagation Primitive.” In Algorithmic Foundations of Robotics XIV, edited by Steven M. LaValle, Ming Lin, Timo Ojala, Dylan Shell, and Jingjin Yu, 124–42. Springer Proceedings in Advanced Robotics. Cham: Springer International Publishing.
Jaakkola, Tommi, Satinder P. Singh, and Michael I. Jordan. 1995. “Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.” In Advances in Neural Information Processing Systems, 345–52.
Kurniawati, Hanna. 2022. “Partially Observable Markov Decision Processes and Robotics.” Annual Review of Control, Robotics, and Autonomous Systems 5 (1): 253–77.
Ohsawa, Shohei. 2021. “Unbiased Self-Play.” arXiv:2106.03007 [cs, econ, stat], June.
Parisotto, Emilio, and Ruslan Salakhutdinov. 2017. “Neural Map: Structured Memory for Deep Reinforcement Learning.” arXiv:1702.08360 [cs], February.
Roy, Nicholas, Geoffrey Gordon, and Sebastian Thrun. 2005. “Finding Approximate POMDP Solutions Through Belief Compression.” Journal of Artificial Intelligence Research 23 (1): 1–40.
Thrun, Sebastian, John Langford, and Dieter Fox. 1999. “Monte Carlo Hidden Markov Models: Learning Non-Parametric Models of Partially Observable Stochastic Processes.” In Proceedings of the International Conference on Machine Learning. Bled, Slovenia.
