Bregman divergences



In mathematics, specifically statistics and information geometry, a Bregman divergence or Bregman distance is a measure of the difference between two points, defined in terms of a strictly convex function. Bregman divergences form an important class of divergences: when the points are interpreted as probability distributions (for example, as parameter values of a parametric model, or as observed data), the resulting divergence is a statistical distance. The most basic Bregman divergence is the squared Euclidean distance.
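
Concretely, for a strictly convex, continuously differentiable function \(F\) on a convex set \(\Omega\), the Bregman divergence between \(p, q \in \Omega\) is the gap between \(F(p)\) and the first-order Taylor expansion of \(F\) around \(q\), evaluated at \(p\):

\[
D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle .
\]

Taking \(F(x) = \|x\|^2\) recovers the squared Euclidean distance; taking \(F\) to be the negative Shannon entropy \(\sum_i x_i \log x_i\) on the probability simplex gives the Kullback–Leibler divergence. A minimal numpy sketch of the definition (the function names here are mine for illustration, not any library's API):

```python
import numpy as np

def bregman(F, gradF, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(gradF(q), p - q)

# Squared Euclidean distance, from F(x) = ||x||^2.
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2 * x

# KL divergence, from the negative entropy F(x) = sum_i x_i log x_i
# (equal to KL exactly for points on the probability simplex).
negent = lambda x: np.sum(x * np.log(x))
negent_grad = lambda x: np.log(x) + 1

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
print(bregman(sq, sq_grad, p, q))          # == ||p - q||^2 = 0.14
print(bregman(negent, negent_grad, p, q))  # == KL(p || q)
```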

Bregman divergences are also a workhorse in optimization: mirror descent generalizes projected gradient descent by replacing the squared Euclidean proximity term with a Bregman divergence \(D_F(x, x_t)\), so the geometry of the update can be matched to the constraint set (see e.g. Bansal and Gupta 2019; Wibisono, Wilson, and Jordan 2016), as in the sketch below.
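
A minimal sketch of mirror descent on the probability simplex, assuming the negative-entropy mirror map, under which the Bregman proximity term is KL and the update reduces to the exponentiated-gradient (multiplicative-weights) rule; `mirror_descent_simplex` and the toy linear objective are illustrative, not from any particular library:

```python
import numpy as np

def mirror_descent_simplex(grad, x0, lr=0.1, steps=100):
    """Mirror descent with the negative-entropy mirror map: the update
    x_{t+1} = argmin_x <lr * grad(x_t), x> + KL(x, x_t)
    over the simplex works out to x_{t+1} ∝ x_t * exp(-lr * grad(x_t))."""
    x = x0.copy()
    for _ in range(steps):
        x = x * np.exp(-lr * grad(x))
        x /= x.sum()  # Bregman (here: KL) projection back onto the simplex
    return x

# Toy problem: minimize <c, x> over the simplex; the minimizer puts all
# mass on the smallest coordinate of c.
c = np.array([0.9, 0.1, 0.5])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3, lr=0.5, steps=200)
print(x)  # concentrates on index 1
```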

References

Banerjee, Arindam, Srujana Merugu, Inderjit S. Dhillon, Joydeep Ghosh, and John Lafferty. 2005. “Clustering with Bregman Divergences.” Journal of Machine Learning Research 6 (Oct): 1705–49.
Bansal, Nikhil, and Anupam Gupta. 2019. β€œPotential-Function Proofs for First-Order Methods.” arXiv.
Benamou, Jean-David, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel PeyrΓ©. 2014. β€œIterative Bregman Projections for Regularized Transportation Problems.” arXiv:1412.5154 [Math], December.
Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends in Machine Learning 3 (1): 1–122.
Collins, Michael, S. Dasgupta, and Robert E. Schapire. 2001. “A Generalization of Principal Components Analysis to the Exponential Family.” In Advances in Neural Information Processing Systems. Vol. 14. MIT Press.
Flammarion, Nicolas, and Francis Bach. 2017. β€œStochastic Composite Least-Squares Regression with Convergence Rate O(1/n).” arXiv:1702.06429 [Math, Stat], February.
Gneiting, Tilmann, and Adrian E. Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association 102 (477): 359–78.
Goldstein, Tom, and Stanley Osher. 2009. “The Split Bregman Method for L1-Regularized Problems.” SIAM Journal on Imaging Sciences 2 (2): 323–43.
Gopalan, Parikshit, Lunjia Hu, Michael P. Kim, Omer Reingold, and Udi Wieder. 2022. β€œLoss Minimization Through the Lens of Outcome Indistinguishability.” arXiv.
HarremoΓ«s, Peter. 2015. β€œProper Scoring and Sufficiency.” arXiv:1507.07089 [Math, Stat], July.
Li, Housen, Johannes Schwab, Stephan Antholzer, and Markus Haltmeier. 2020. β€œNETT: Solving Inverse Problems with Deep Neural Networks.” Inverse Problems 36 (6): 065005.
Nielsen, Frank. 2018. β€œAn Elementary Introduction to Information Geometry.” arXiv:1808.08271 [Cs, Math, Stat], August.
Nock, Richard, Aditya Krishna Menon, and Cheng Soon Ong. 2016. β€œA Scaled Bregman Theorem with Applications.” arXiv:1607.00360 [Cs, Stat], July.
Reid, Mark D., and Robert C. Williamson. 2011. β€œInformation, Divergence and Risk for Binary Experiments.” Journal of Machine Learning Research 12 (Mar): 731–817.
Singh, Ajit P., and Geoffrey J. Gordon. 2008. β€œA Unified View of Matrix Factorization Models.” In Machine Learning and Knowledge Discovery in Databases, 358–73. Springer.
Sra, Suvrit, and Inderjit S. Dhillon. 2006. β€œGeneralized Nonnegative Matrix Approximations with Bregman Divergences.” In Advances in Neural Information Processing Systems 18, edited by Y. Weiss, B. SchΓΆlkopf, and J. C. Platt, 283–90. MIT Press.
Wibisono, Andre, Ashia C. Wilson, and Michael I. Jordan. 2016. β€œA Variational Perspective on Accelerated Methods in Optimization.” Proceedings of the National Academy of Sciences 113 (47): E7351–58.
Yin, W, S Osher, D Goldfarb, and J Darbon. 2008. β€œBregman Iterative Algorithms for \(\ell_1\)-Minimization with Applications to Compressed Sensing.” SIAM Journal on Imaging Sciences 1 (1): 143–68.
