Bregman divergences
August 29, 2023
Bregman
functional analysis
optimization
statmech
In statistics and information geometry, a Bregman divergence (or Bregman distance) is a measure of the difference between two points, defined in terms of a strictly convex function. Bregman divergences form an important class of divergences. When the points are interpreted as probability distributions, for instance as parameter values of a parametric model or as observed data, the resulting divergence is a statistical distance. The most basic example is the squared Euclidean distance.
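For concreteness, the usual definition: given a strictly convex, differentiable function \(F\), the Bregman divergence it generates is the gap between \(F\) and its linearisation at the second argument,

$$
D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle .
$$

Taking \(F(x) = \|x\|^2\) gives the squared Euclidean distance mentioned above; taking \(F\) to be the negative entropy \(\sum_i x_i \log x_i\) on the simplex gives the Kullback-Leibler divergence.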
Useful in mirror descent.
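As a quick illustration of that use, here is a minimal NumPy sketch (my own toy example, not taken from the references below) of mirror descent on the probability simplex with the negative-entropy mirror map, whose Bregman divergence is the KL divergence. The objective and step size are arbitrary choices for demonstration.

```python
import numpy as np


def entropic_mirror_descent(grad, x0, step=0.1, n_steps=200):
    """Mirror descent on the simplex with the negative-entropy mirror map.

    Each update solves argmin_y <grad f(x), y> + (1/step) * KL(y || x)
    over the simplex, which has the closed-form multiplicative update below
    (also known as exponentiated gradient).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x * np.exp(-step * grad(x))   # multiplicative step in the dual space
        x = x / x.sum()                   # Bregman projection back onto the simplex
    return x


# Toy problem: minimise <c, x> over the simplex; the minimiser puts all
# mass on the coordinate with the smallest cost.
c = np.array([3.0, 1.0, 2.0])
x = entropic_mirror_descent(lambda x: c, np.ones(3) / 3)
print(np.round(x, 3))  # mass concentrates on index 1
```

Swapping the mirror map for \(F(x) = \tfrac{1}{2}\|x\|^2\) turns the same loop into ordinary projected gradient descent, which is the sense in which the choice of Bregman divergence sets the geometry of the method.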
- Mark Reid, “Meet the Bregman Divergences” (Inductio Ex Machina blog)
- Bregman divergences, dual information geometry, and generalized convexity
- Connections to score matching, contrastive divergence, and mirror descent
References
Banerjee, Merugu, Dhillon, et al. 2005. “Clustering with Bregman Divergences.” Journal of Machine Learning Research.
Bansal, and Gupta. 2019. “Potential-Function Proofs for First-Order Methods.”
Benamou, Carlier, Cuturi, et al. 2014. “Iterative Bregman Projections for Regularized Transportation Problems.” arXiv:1412.5154 [Math].
Boyd, Parikh, Chu, et al. 2010. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.
Collins, Dasgupta, and Schapire. 2001. “A Generalization of Principal Components Analysis to the Exponential Family.” In Advances in Neural Information Processing Systems.
Flammarion, and Bach. 2017. “Stochastic Composite Least-Squares Regression with Convergence Rate O(1/n).” arXiv:1702.06429 [Math, Stat].
Gneiting, and Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association.
Goldstein, and Osher. 2009. “The Split Bregman Method for L1-Regularized Problems.” SIAM Journal on Imaging Sciences.
Gopalan, Hu, Kim, et al. 2022. “Loss Minimization Through the Lens of Outcome Indistinguishability.”
Gutmann, and Hirayama. 2011. “Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models.” In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. UAI’11.
Harremoës. 2015. “Proper Scoring and Sufficiency.” arXiv:1507.07089 [Math, Stat].
Li, Schwab, Antholzer, et al. 2020. “NETT: Solving Inverse Problems with Deep Neural Networks.” Inverse Problems.
Menon, and Ong. 2016. “Linking Losses for Density Ratio and Class-Probability Estimation.” In Proceedings of The 33rd International Conference on Machine Learning.
Nielsen. 2018. “An Elementary Introduction to Information Geometry.” arXiv:1808.08271 [Cs, Math, Stat].
Nock, Menon, and Ong. 2016. “A Scaled Bregman Theorem with Applications.” arXiv:1607.00360 [Cs, Stat].
Reid, and Williamson. 2011. “Information, Divergence and Risk for Binary Experiments.” Journal of Machine Learning Research.
Singh, and Gordon. 2008. “A Unified View of Matrix Factorization Models.” In Machine Learning and Knowledge Discovery in Databases.
Sra, and Dhillon. 2006. “Generalized Nonnegative Matrix Approximations with Bregman Divergences.” In Advances in Neural Information Processing Systems 18.
Wibisono, Wilson, and Jordan. 2016. “A Variational Perspective on Accelerated Methods in Optimization.” Proceedings of the National Academy of Sciences.
Yin, Osher, Goldfarb, et al. 2008. “Bregman Iterative Algorithms for \(\ell_1\)-Minimization with Applications to Compressed Sensing.” SIAM Journal on Imaging Sciences.
Zhang. 2013. “Bregman Divergence and Mirror Descent.”