# Bregman divergences

August 29, 2023

Tags: Bregman · functional analysis · optimization · statmech

In mathematics, specifically statistics and information geometry, a Bregman divergence or Bregman distance is a measure of the difference between two points, defined in terms of a strictly convex function. Bregman divergences form an important class of divergences, the most basic example being the squared Euclidean distance. When the points are interpreted as probability distributions (typically either parameter values of a parametric model or an observed data set), the resulting divergence is a statistical distance.
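Concretely, for a strictly convex, differentiable generator \(F\), the Bregman divergence is the gap between \(F\) at one point and its first-order Taylor expansion about another:

$$
D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle .
$$

Taking \(F(x) = \|x\|^2\) recovers the squared Euclidean distance \(D_F(p,q) = \|p - q\|^2\); taking the negative entropy \(F(x) = \sum_i x_i \log x_i\) gives the generalized Kullback–Leibler divergence \(\sum_i p_i \log (p_i / q_i) - p_i + q_i\).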

Useful in mirror descent, where a Bregman divergence replaces the squared Euclidean distance in the proximal step, adapting the update to the geometry of the constraint set.
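As a minimal sketch of that connection (my own illustration, not from any of the papers below; the function names are hypothetical): with the negative-entropy generator on the probability simplex, the Bregman divergence is the generalized KL divergence and mirror descent reduces to the exponentiated-gradient update.

```python
import numpy as np

def bregman_kl(p, q):
    """Bregman divergence of the negative entropy F(x) = sum x log x:
    the generalized KL divergence, sum p log(p/q) - p + q."""
    return np.sum(p * np.log(p / q) - p + q)

def mirror_descent_simplex(grad, x0, steps=200, eta=0.1):
    """Mirror descent on the probability simplex with the negative-entropy
    mirror map (a.k.a. exponentiated gradient). `grad` returns the gradient
    of the objective at x."""
    x = x0.copy()
    for _ in range(steps):
        x = x * np.exp(-eta * grad(x))  # gradient step in the dual space
        x /= x.sum()                    # Bregman projection back onto the simplex
    return x

# Minimize the linear objective <c, x> over the simplex; the minimizer
# concentrates all mass on the smallest coordinate of c.
c = np.array([0.9, 0.2, 0.5])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```

The multiplicative form of the update is exactly what the entropic geometry buys: iterates stay strictly inside the simplex, with no explicit Euclidean projection needed.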

- Mark Reid, “Meet the Bregman Divergences” (Inductio Ex Machina blog)
- Bregman divergences, dual information geometry, and generalized convexity
- Connections to score matching, contrastive divergence, and mirror descent

## References

Banerjee, Merugu, Dhillon, et al. 2005. “Clustering with Bregman Divergences.” *Journal of Machine Learning Research*.
Bansal, and Gupta. 2019. “Potential-Function Proofs for First-Order Methods.”
Benamou, Carlier, Cuturi, et al. 2014. “Iterative Bregman Projections for Regularized Transportation Problems.” *arXiv:1412.5154 [Math]*.
Boyd. 2010. *Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers*.
Collins, Dasgupta, and Schapire. 2001. “A Generalization of Principal Components Analysis to the Exponential Family.” In *Advances in Neural Information Processing Systems*.
Flammarion, and Bach. 2017. “Stochastic Composite Least-Squares Regression with Convergence Rate O(1/n).” *arXiv:1702.06429 [Math, Stat]*.
Gneiting, and Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” *Journal of the American Statistical Association*.
Goldstein, Osher, Goldstein, et al. 2009. “The Split Bregman Method for L1-Regularized Problems.” *SIAM Journal on Imaging Sciences*.
Gopalan, Hu, Kim, et al. 2022. “Loss Minimization Through the Lens of Outcome Indistinguishability.”
Gutmann, and Hirayama. 2011. “Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models.” In *Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence*. UAI’11.
Harremoës. 2015. “Proper Scoring and Sufficiency.” *arXiv:1507.07089 [Math, Stat]*.
Li, Schwab, Antholzer, et al. 2020. “NETT: Solving Inverse Problems with Deep Neural Networks.” *Inverse Problems*.
Menon, and Ong. 2016. “Linking Losses for Density Ratio and Class-Probability Estimation.” In *Proceedings of The 33rd International Conference on Machine Learning*.
Nielsen. 2018. “An Elementary Introduction to Information Geometry.” *arXiv:1808.08271 [Cs, Math, Stat]*.
Nock, Menon, and Ong. 2016. “A Scaled Bregman Theorem with Applications.” *arXiv:1607.00360 [Cs, Stat]*.
Reid, and Williamson. 2011. “Information, Divergence and Risk for Binary Experiments.” *Journal of Machine Learning Research*.
Singh, and Gordon. 2008. “A Unified View of Matrix Factorization Models.” In *Machine Learning and Knowledge Discovery in Databases*.
Sra, and Dhillon. 2006. “Generalized Nonnegative Matrix Approximations with Bregman Divergences.” In *Advances in Neural Information Processing Systems 18*.
Wibisono, Wilson, and Jordan. 2016. “A Variational Perspective on Accelerated Methods in Optimization.” *Proceedings of the National Academy of Sciences*.
Yin, Osher, Goldfarb, et al. 2008. “Bregman Iterative Algorithms for \(\ell_1\)-Minimization with Applications to Compressed Sensing.” *SIAM Journal on Imaging Sciences*.
Zhang. 2013. “Bregman Divergence and Mirror Descent.”