Mirror descent
2019-12-29 — 2023-08-29
Beck and Teboulle (2003):
The mirror descent algorithm (MDA) was introduced by Nemirovsky and Yudin for solving convex optimization problems. This method exhibits an efficiency estimate that is mildly dependent on the decision variables dimension, and thus suitable for solving very large-scale optimization problems. We present a new derivation and analysis of this algorithm. We show that the MDA can be viewed as a nonlinear projected-subgradient type method, derived from using a general distance-like function instead of the usual Euclidean squared distance. Within this interpretation, we derive in a simple way convergence and efficiency estimates. We then propose an Entropic mirror descent algorithm for convex minimization over the unit simplex, with a global efficiency estimate proven to be mildly dependent on the dimension of the problem.
Bubeck’s lectures are good: Bubeck (2019).
Mirror Descent generalizes gradient descent to settings where the feasible set
A mirror map
, a strictly convex, differentiable function whose gradient maps the primal space into the dual space .The associated Bregman divergence
Starting from
where
As to why we’d bother: many reasons.
- Mirror descend corresponds to what we want to do anyway, in situation where we have two natural representations of an estimand. So it’s easy.
- It is theoretically very tractable, so it’s easy to prove stuff about it
- It is fast, in that you can provably require few iterations to get very close to the optimum, and it’s a first order method (kinda) so you might hope that these iterations will be tractable.
- It easily generalises to online/SGD settings where we observe data sequentially
Rarely is the obvious thing the best thing, in practical mathematics. AFAICT Mirror descent is a nearly obvious thing that is nearly the best.
1 Incoming
- T Lienart, Mirror descent algorithm.
- Xinhua Zhang notes
- Nicholas Harvey notes
- Sebastian Pokutta, Cheat Sheet: Subgradient Descent, Mirror Descent, and Online Learning
- Ch 17 of Gupta (2020) is very clear