
Beck and Teboulle (2003):

The mirror descent algorithm (MDA) was introduced by Nemirovsky and Yudin for solving convex optimization problems. This method exhibits an efficiency estimate that is mildly dependent on the decision variables dimension, and thus suitable for solving very large-scale optimization problems. We present a new derivation and analysis of this algorithm. We show that the MDA can be viewed as a nonlinear projected-subgradient type method, derived from using a general distance-like function instead of the usual Euclidean squared distance. Within this interpretation, we derive in a simple way convergence and efficiency estimates. We then propose an Entropic mirror descent algorithm for convex minimization over the unit simplex, with a global efficiency estimate proven to be mildly dependent on the dimension of the problem.

Bubeck’s lectures are good: Bubeck (2015).

Mirror Descent generalizes gradient descent to settings where the feasible set $\mathcal{X} \subseteq \mathbb{R}^r$ is not naturally Euclidean. It relies on:

  1. A mirror map $\Phi: \mathcal{C} \to \mathbb{R}$, a strictly convex, differentiable function whose gradient $\nabla\Phi$ maps the primal space $\mathcal{X}$ into the dual space $\mathbb{R}^r$.

  2. The associated Bregman divergence

    $$D_\Phi(p, q) = \Phi(p) - \Phi(q) - \langle \nabla\Phi(q),\, p - q \rangle.$$
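Concretely, the Bregman divergence is a one-liner for any smooth convex $\Phi$. A throwaway NumPy sketch (function names are my own) showing that the squared-Euclidean and negative-entropy choices recover familiar distances:

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """D_Phi(p, q) = Phi(p) - Phi(q) - <grad Phi(q), p - q>."""
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

# Phi(x) = ||x||^2 / 2 recovers half the squared Euclidean distance.
sq = lambda x: 0.5 * np.dot(x, x)
sq_grad = lambda x: x

# Phi(x) = sum_i x_i log x_i (negative entropy) recovers the KL
# divergence when p and q both lie on the unit simplex.
negent = lambda x: np.sum(x * np.log(x))
negent_grad = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
d_euc = bregman(sq, sq_grad, p, q)          # 0.5 * ||p - q||^2
d_kl = bregman(negent, negent_grad, p, q)   # KL(p || q)
```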

Starting from $x_0 \in \mathcal{X}$, each iteration does

$$\nabla\Phi(y_{k+1}) = \nabla\Phi(x_k) - \eta g_k, \qquad x_{k+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}} D_\Phi(x, y_{k+1}),$$

where $g_k \in \partial f(x_k)$. So that’s two steps, really, interleaved: a gradient step in the dual space, then a Bregman projection back into the primal space. Equivalently,

$$x_{k+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}} \left\{ \eta \langle g_k, x \rangle + D_\Phi(x, x_k) \right\}.$$
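For concreteness, here is the Entropic mirror descent of Beck and Teboulle (2003) as a few lines of NumPy. With $\Phi(x) = \sum_i x_i \log x_i$ on the unit simplex, the dual step and the Bregman projection collapse into a multiplicative update followed by renormalisation. (The toy linear objective and step size below are my own illustrative choices.)

```python
import numpy as np

def entropic_mirror_descent(grad, x0, eta, n_iters):
    """Mirror descent with Phi(x) = sum_i x_i log x_i over the simplex.

    The dual-space gradient step and the Bregman projection combine
    into a single multiplicative update: x <- x * exp(-eta * g),
    renormalised to sum to one.
    """
    x = x0.copy()
    for _ in range(n_iters):
        g = grad(x)               # (sub)gradient g_k of f at x_k
        x = x * np.exp(-eta * g)  # gradient step in the dual space
        x = x / x.sum()           # Bregman projection onto the simplex
    return x

# Toy problem: minimize f(x) = <c, x> over the simplex; the optimum
# puts all mass on the smallest coordinate of c.
c = np.array([0.9, 0.1, 0.5])
x = entropic_mirror_descent(lambda x: c, np.ones(3) / 3, eta=0.5, n_iters=200)
```

Note that the iterates stay strictly inside the simplex, which is exactly the sort of constraint handling the Euclidean projection makes awkward.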

As to why we’d bother: many reasons.

  1. Mirror descent corresponds to what we want to do anyway in situations where we have two natural representations of an estimand. So it’s easy.
  2. It is theoretically very tractable, so it’s easy to prove stuff about it.
  3. It is fast, in that it provably requires few iterations to get close to the optimum, and it’s a first-order method (kinda), so you might hope each iteration is cheap.
  4. It generalises easily to online/SGD settings where we observe data sequentially.

Rarely is the obvious thing the best thing, in practical mathematics. AFAICT Mirror descent is a nearly obvious thing that is nearly the best.

1 Incoming

2 References

Ajanthan, Gupta, Torr, et al. 2021. “Mirror Descent View for Neural Network Quantization.” In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics.
Bansal, and Gupta. 2019. “Potential-Function Proofs for First-Order Methods.”
Beck, and Teboulle. 2003. “Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization.” Operations Research Letters.
Bubeck. 2015. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning.
———. 2019. The Five Miracles of Mirror Descent.
Crucinio. 2025. “A Mirror Descent Approach to Maximum Likelihood Estimation in Latent Variable Models.”
Gupta. 2020. “CMU 15-850 Advanced Algorithms.”
Jacobsen, and Cutkosky. 2022. “Parameter-Free Mirror Descent.”
Lee, Panageas, Piliouras, et al. 2017. “First-Order Methods Almost Always Avoid Saddle Points.” arXiv:1710.07406 [Cs, Math, Stat].
Wibisono, and Wilson. 2015. “On Accelerated Methods in Optimization.” arXiv:1509.03616 [Math].
Zhang. 2013. “Bregman Divergence and Mirror Descent.”