Ayed, Ibrahim, and Emmanuel de Bézenac. 2019. “Learning Dynamical Systems from Partial Observations.” In Advances in Neural Information Processing Systems, 12.
———. 2021. “Locally-Connected Interrelated Network: A Forward Propagation Primitive.” In Algorithmic Foundations of Robotics XIV, edited by Steven M. LaValle, Ming Lin, Timo Ojala, Dylan Shell, and Jingjin Yu, 124–42. Springer Proceedings in Advanced Robotics. Cham: Springer International Publishing.
Jaakkola, Tommi, Satinder P. Singh, and Michael I. Jordan. 1995. “Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.” In Advances in Neural Information Processing Systems.
Kurniawati, Hanna. 2022. “Partially Observable Markov Decision Processes and Robotics.” Annual Review of Control, Robotics, and Autonomous Systems 5 (1): 253–77.
Ohsawa, Shohei. 2021. “Unbiased Self-Play.” arXiv:2106.03007 [Cs, Econ, Stat].
Parisotto, Emilio, and Ruslan Salakhutdinov. 2017. “Neural Map: Structured Memory for Deep Reinforcement Learning.” arXiv:1702.08360 [Cs].
Roy, Nicholas, Geoffrey Gordon, and Sebastian Thrun. 2005. “Finding Approximate POMDP Solutions Through Belief Compression.” Journal of Artificial Intelligence Research 23 (1): 1–40.
Thrun, Sebastian, John Langford, and Dieter Fox. 1999. “Monte Carlo Hidden Markov Models: Learning Non-Parametric Models of Partially Observable Stochastic Processes.” In Proceedings of the International Conference on Machine Learning. Bled, Slovenia.