Markov decision problems

2014-11-27 — 2022-06-07

Wherein discrete-time stochastic control problems are presented, and partially observable cases (POMDPs) are examined, while connections to optimal control and learning of forward propagators are outlined.

bandit problems

control

dynamical systems

linear algebra

optimization

probability

signal processing

statistics

stochastic processes

stringology

time series

TODO: connect to optimal control.

1 Classic

Bellman and Howard’s classic discrete-time stochastic control problem.

Warren Powell’s Introduction to Markov decision processes
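To make the classic problem concrete, here is a minimal sketch of value iteration on a toy finite MDP. Everything in it is an assumption made for illustration: the two states, the two actions, the transition table `P`, the reward table `R`, and the discount factor `GAMMA` are invented numbers, and the function names are hypothetical. It repeatedly applies the Bellman optimality backup (value of a state = best over actions of immediate reward plus discounted expected next value) until the values stop changing.

```python
# Toy MDP with 2 states and 2 actions. All numbers are made up for illustration.
# P[a][s][s2] = probability of moving from state s to state s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
]
# R[a][s] = expected immediate reward for taking action a in state s.
R = [
    [1.0, 0.0],  # action 0
    [0.0, 2.0],  # action 1
]
GAMMA = 0.9  # discount factor (assumed)


def q_value(V, s, a, P, R, gamma):
    """One-step lookahead: immediate reward plus discounted expected next value."""
    return R[a][s] + gamma * sum(p * v for p, v in zip(P[a][s], V))


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman optimality backup until the largest change is below tol."""
    n_actions = len(P)
    n_states = len(P[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [
            max(q_value(V, s, a, P, R, gamma) for a in range(n_actions))
            for s in range(n_states)
        ]
        converged = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the (approximately) converged values.
    policy = [
        max(range(n_actions), key=lambda a: q_value(V, s, a, P, R, gamma))
        for s in range(n_states)
    ]
    return V, policy


V, policy = value_iteration(P, R, GAMMA)
print("values:", V)
print("greedy policy:", policy)
```

Because the backup is a contraction for a discount factor below 1, the loop converges from any starting guess; the stopping tolerance and iteration cap here are arbitrary choices.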

2 POMDP

Figure 1: Partial observation of Mrs Brown’s

“A POMDP is a partially observable Markov decision process. It is a model, originating in the operations research (OR) literature, for describing planning tasks in which the decision maker does not have complete information as to its current state. The POMDP model provides a convenient way of reasoning about tradeoffs between actions to gain reward and actions to gain information.”
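Since the decision maker cannot see the state, the usual move is to track a belief, i.e. a probability distribution over states, and update it by Bayes' rule after each action and observation. The following is a minimal sketch of that update for a finite POMDP, not a planner. The two-state model, the single "listen" action, the 0.85 observation accuracy, and the names `T`, `O`, and `belief_update` are all assumptions invented for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter step for a finite POMDP.

    belief[s]          = current probability of being in state s
    T[action][s][s2]   = probability of moving s -> s2 under the action
    O[action][s2][o]   = probability of observing o after landing in s2
    Returns the posterior belief over next states.
    """
    n = len(belief)
    # Predict: push the belief through the transition model.
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    # Correct: weight each next state by the likelihood of the observation.
    unnormalised = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalised)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnormalised]


# Hypothetical model: 2 hidden states, 1 action ("listen", index 0) that leaves
# the state unchanged and reports it correctly 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]

b0 = [0.5, 0.5]                      # start maximally uncertain
b1 = belief_update(b0, 0, 0, T, O)   # hear evidence for state 0 once
b2 = belief_update(b1, 0, 0, T, O)   # hear the same evidence again
print(b1, b2)
```

Each repeated, consistent observation sharpens the belief, which is the sense in which an action can be worth taking purely to gain information rather than reward.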

2.1 POMDP while learning forward propagator

Keen to investigate Collins and Kurniawati (2021), which I believe is also summarised in Kurniawati (2022).

3 References

Ayed, and de Bézenac. 2019. “Learning Dynamical Systems from Partial Observations.” In Advances In Neural Information Processing Systems.

Collins, and Kurniawati. 2019. “Partially Observable Planning and Learning for Systems with Non-Uniform Dynamics.”

———. 2021. “Locally-Connected Interrelated Network: A Forward Propagation Primitive.” In Algorithmic Foundations of Robotics XIV. Springer Proceedings in Advanced Robotics.

Jaakkola, Singh, and Jordan. 1995. “Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.” In Advances in Neural Information Processing Systems.

Kurniawati. 2022. “Partially Observable Markov Decision Processes and Robotics.” Annual Review of Control, Robotics, and Autonomous Systems.

Ohsawa. 2021. “Unbiased Self-Play.” arXiv:2106.03007 [Cs, Econ, Stat].

Parisotto, and Salakhutdinov. 2017. “Neural Map: Structured Memory for Deep Reinforcement Learning.” arXiv:1702.08360 [Cs].

Roy, Gordon, and Thrun. 2005. “Finding Approximate POMDP Solutions Through Belief Compression.” Journal of Artificial Intelligence Research.

Thrun, Langford, and Fox. 1999. “Monte Carlo Hidden Markov Models: Learning Non-Parametric Models of Partially Observable Stochastic Processes.” In Proceedings of the International Conference on Machine Learning.