Non-stationary bandit problems

Restless bandits

May 27, 2024 — June 19, 2024

bandit problems
control
decision theory
signal processing
stochastic processes
stringology
Figure 1

Bandit problems where the reward is nonstationary.

Wikipedia recommends we consider Whittle (1988) as a foundational article and that hip recent results include Garivier and Moulines (2008).

1 References

Bedi, Koppel, Rajawat, et al. 2019. Nonstationary Nonparametric Online Learning: Balancing Dynamic Regret and Model Parsimony.”
D’Orazio, Loizou, Laradji, et al. 2023. Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize.” Transactions on Machine Learning Research.
Dearden, Friedman, and Russell. 1998. Bayesian Q-Learning.” In AAAI/IAAI 1998.
Dick, Gyorgy, and Szepesvari. 2014. Online Learning in Markov Decision Processes with Changing Cost Sequences.” In Proceedings of the 31st International Conference on Machine Learning.
Duchi, Agarwal, Johansson, et al. 2012. Ergodic Mirror Descent.” SIAM Journal on Optimization.
Even. 2023. Stochastic Gradient Descent Under Markovian Sampling Schemes.”
Fei, Yang, Wang, et al. 2020. Dynamic Regret of Policy Optimization in Non-Stationary Environments.” In Advances in Neural Information Processing Systems.
Garivier, and Moulines. 2008. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems.”
Granmo, and Berg. 2010. Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters.” In Trends in Applied Intelligent Systems.
Jones, Brennan, Chen, et al. n.d. “Multi-Armed Bandits with Non-Stationary Means.”
Karampatziakis, Langford, and Mineiro. 2020. Empirical Likelihood for Contextual Bandits.” In Advances in Neural Information Processing Systems.
Li, Zhao, and Lan. n.d. “Robust Policy Mirror Descent for Controlling Uncertain Markov Decision Process.”
Robinson, and Hartemink. n.d. “Non-Stationary Dynamic Bayesian Networks.”
Whittle. 1988. Restless Bandits: Activity Allocation in a Changing World.” Journal of Applied Probability.
Yan, Zhao, and Zhou. 2023. Online Non-Stochastic Control with Partial Feedback.” Journal of Machine Learning Research.