In mathematics, specifically statistics and information geometry, a Bregman divergence or Bregman distance is a measure of difference between two points, defined in terms of a strictly convex function; they form an important class of divergences. When the points are interpreted as probability distributions (notably as either values of the parameter of a parametric model or as a data set of observed values), the resulting distance is a statistical distance. The most basic Bregman divergence is the squared Euclidean distance.
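To make the definition concrete: for a differentiable, strictly convex generator \(\phi\), the divergence is usually written \(D_\phi(p, q) = \phi(p) - \phi(q) - \langle \nabla\phi(q), p - q \rangle\). The following is a minimal sketch in plain Python, not any library's API; the function names (`bregman`, `sq_norm`, `neg_entropy`) and the example vectors are illustrative assumptions. It checks numerically that the generator \(\|x\|^2\) recovers the squared Euclidean distance, and that negative entropy recovers the Kullback-Leibler divergence between two probability vectors.

```python
import math


def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad_phi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner


# Generator 1: phi(x) = ||x||^2, which yields the squared Euclidean distance.
def sq_norm(x):
    return sum(xi * xi for xi in x)


def sq_norm_grad(x):
    return [2.0 * xi for xi in x]


# Generator 2: negative entropy, phi(x) = sum_i x_i log x_i (all x_i > 0),
# which yields the KL divergence when p and q each sum to 1.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)


def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]


# Hypothetical example points: two probability vectors.
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

d_euclid = bregman(sq_norm, sq_norm_grad, p, q)
d_kl = bregman(neg_entropy, neg_entropy_grad, p, q)

# Direct formulas, for comparison.
sq_dist = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Here `d_euclid` agrees with `sq_dist` and `d_kl` agrees with `kl` up to floating-point error. Swapping the arguments of the negative-entropy divergence gives a different value, a reminder that a Bregman divergence is in general not symmetric and so is not a metric.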

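Bregman divergences are useful in mirror descent, where they play the role that the squared Euclidean distance plays in ordinary gradient descent.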
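As a rough illustration of the mirror descent connection: each step minimises a linearised objective plus a Bregman divergence to the current iterate. The sketch below assumes the negative-entropy generator on the probability simplex, in which case the step has the well-known closed form of a multiplicative (exponentiated gradient) update followed by renormalisation. The function name `entropic_mirror_descent`, the toy quadratic objective, the step size and the iteration count are all assumptions chosen for illustration.

```python
import math


def entropic_mirror_descent(grad, x0, step, n_iters):
    """Mirror descent on the simplex with the negative-entropy generator.

    Each iteration solves
        argmin_x  step * <g, x> + KL(x, x_t)   subject to x in the simplex,
    whose solution is x_i proportional to x_t[i] * exp(-step * g[i]).
    """
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        # Multiplicative update, then renormalise back onto the simplex.
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x


# Toy objective (an assumption): f(x) = 0.5 * ||x - target||^2,
# with target itself a point in the interior of the simplex.
target = [0.1, 0.6, 0.3]


def grad_f(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_star = entropic_mirror_descent(grad_f, [1 / 3, 1 / 3, 1 / 3], step=0.5, n_iters=500)
```

The iterates stay strictly positive and sum to one without any explicit projection, which is the practical appeal of matching the divergence to the geometry of the constraint set. On this toy problem `x_star` ends up close to `target`.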

- Meet the Bregman Divergences (Inductio Ex Machina, Mark Reid)
- Bregman divergences, dual information geometry, and generalized convexity
