Bregman divergences

August 29, 2023

Bregman
functional analysis
optimization
statmech
Figure 1: Bregman divergence

In mathematics, specifically statistics and information geometry, a Bregman divergence or Bregman distance is a measure of the difference between two points, defined in terms of a differentiable, strictly convex function; Bregman divergences form an important class of divergences. When the points are interpreted as probability distributions – notably as either values of the parameter of a parametric model or as a data set of observed values – the resulting distance is a statistical distance. The most basic Bregman divergence is the squared Euclidean distance.
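Concretely, for a differentiable, strictly convex generator \(F\), the divergence is \(D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q\rangle\): the gap between \(F\) at \(p\) and its first-order Taylor expansion around \(q\). A minimal numerical sketch (the function names here are my own, not a library API); the squared-norm generator recovers squared Euclidean distance, and negative entropy recovers KL divergence:

```python
import numpy as np

def bregman(F, gradF, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <gradF(q), p - q>."""
    return F(p) - F(q) - np.dot(gradF(q), p - q)

# Generator F(x) = ||x||^2 recovers the squared Euclidean distance.
sq = lambda x: np.dot(x, x)
grad_sq = lambda x: 2.0 * x

# Generator F(x) = sum_i x_i log x_i (negative entropy) recovers the
# KL divergence between probability vectors.
negent = lambda x: np.sum(x * np.log(x))
grad_negent = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.3, 0.3, 0.4])
d_euc = bregman(sq, grad_sq, p, q)         # equals ||p - q||^2
d_kl = bregman(negent, grad_negent, p, q)  # equals KL(p || q) since both sum to 1
```

Strict convexity of \(F\) is what guarantees \(D_F(p, q) \ge 0\) with equality iff \(p = q\); note that \(D_F\) is generally asymmetric and violates the triangle inequality, which is why it is a divergence rather than a metric.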

Bregman divergences supply the proximal term in mirror descent: replacing the squared Euclidean penalty of projected gradient descent with \(D_F\) adapts each update to the geometry of the constraint set (Zhang 2013).
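As a concrete instance (my own illustration, assuming the standard setup rather than following any one cited paper): with the negative-entropy mirror map on the probability simplex, the KL proximal term turns mirror descent into the multiplicative, exponentiated-gradient update.

```python
import numpy as np

def mirror_descent_simplex(grad, x0, step=0.1, iters=200):
    """Mirror descent with the negative-entropy mirror map.

    The Bregman proximal term is the KL divergence, so the mirror step
    becomes a multiplicative (exponentiated-gradient) update, and the
    Bregman projection onto the simplex is a simple renormalisation.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x * np.exp(-step * grad(x))  # mirror (dual-space gradient) step
        x = x / x.sum()                  # Bregman projection onto the simplex
    return x

# Minimise a linear objective <c, x> over the simplex; the optimum puts
# all mass on the smallest coordinate of c.
c = np.array([3.0, 1.0, 2.0])
x_star = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```

For a constant gradient the iterates are exactly exponential weights over \(-c\), so the mass concentrates on the coordinate with the smallest cost; this is the same mechanism behind exponential-weights algorithms in online learning.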

1 References

Andrieu, Doucet, and Holenstein. 2010. “Particle Markov Chain Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Banerjee, Merugu, Dhillon, et al. 2005. “Clustering with Bregman Divergences.” Journal of Machine Learning Research.
Bansal, and Gupta. 2019. “Potential-Function Proofs for First-Order Methods.”
Benamou, Carlier, Cuturi, et al. 2014. “Iterative Bregman Projections for Regularized Transportation Problems.” arXiv:1412.5154 [Math].
Boyd. 2010. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.
Bretó, He, Ionides, et al. 2009. “Time Series Analysis via Mechanistic Models.” The Annals of Applied Statistics.
Collins, Dasgupta, and Schapire. 2001. “A Generalization of Principal Components Analysis to the Exponential Family.” In Advances in Neural Information Processing Systems.
Dahlhaus, and Eichler. 2003. “Causality and Graphical Models in Time Series Analysis.” Oxford Statistical Science Series.
Daley, and Vere-Jones. 2003. An introduction to the theory of point processes.
———. 2008. An Introduction to the Theory of Point Processes. Probability and Its Applications.
Djouadi, Maroulas, Pan, et al. 2017. “Consistency and Asymptotics of a Poisson Intensity Least-Squares Estimator for Partially Observed Jump–Diffusion Processes.” Statistics & Probability Letters.
Doucet, Jacob, and Rubenthaler. 2013. “Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models.” arXiv:1304.5768 [Stat].
Eichler, Dahlhaus, and Dueck. 2016. “Graphical Modeling for Multivariate Hawkes Processes with Nonparametric Link Functions.” Journal of Time Series Analysis.
Flammarion, and Bach. 2017. “Stochastic Composite Least-Squares Regression with Convergence Rate O(1/n).” arXiv:1702.06429 [Math, Stat].
Giesecke, and Schwenkler. 2011. “Filtered Likelihood for Point Processes.” SSRN Scholarly Paper ID 1898344.
Gneiting, and Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association.
Goldstein, and Osher. 2009. “The Split Bregman Method for L1-Regularized Problems.” SIAM Journal on Imaging Sciences.
Gopalan, Hu, Kim, et al. 2022. “Loss Minimization Through the Lens of Outcome Indistinguishability.”
Gutmann, and Hirayama. 2011. “Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models.” In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. UAI’11.
Harremoës. 2015. “Proper Scoring and Sufficiency.” arXiv:1507.07089 [Math, Stat].
Hawkes. 1971a. “Point Spectra of Some Mutually Exciting Point Processes.” Journal of the Royal Statistical Society. Series B (Methodological).
———. 1971b. “Spectra of Some Self-Exciting and Mutually Exciting Point Processes.” Biometrika.
Hawkes, and Oakes. 1974. “A Cluster Process Representation of a Self-Exciting Process.” Journal of Applied Probability.
He, Ionides, and King. 2010. “Plug-and-Play Inference for Disease Dynamics: Measles in Large and Small Populations as a Case Study.” Journal of The Royal Society Interface.
Ionides, Edward L., Bhadra, Atchadé, et al. 2011. “Iterated Filtering.” The Annals of Statistics.
Ionides, E. L., Bretó, and King. 2006. “Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences.
Ionides, Edward L., Nguyen, Atchadé, et al. 2015. “Inference for Dynamic and Latent Variable Models via Iterated, Perturbed Bayes Maps.” Proceedings of the National Academy of Sciences.
Künsch. 2013. “Particle Filters.” Bernoulli.
Lele, S. R., Dennis, and Lutscher. 2007. “Data Cloning: Easy Maximum Likelihood Estimation for Complex Ecological Models Using Bayesian Markov Chain Monte Carlo Methods.” Ecology Letters.
Lele, Subhash R., Nadeem, and Schmuland. 2010. “Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning.” Journal of the American Statistical Association.
Lindström, Ionides, Frydendall, et al. 2012. “Efficient Iterated Filtering.” In IFAC-PapersOnLine (System Identification, Volume 16). 16th IFAC Symposium on System Identification.
Li, Schwab, Antholzer, et al. 2020. “NETT: Solving Inverse Problems with Deep Neural Networks.” Inverse Problems.
Liu, and West. 2001. “Combined Parameter and State Estimation in Simulation-Based Filtering.” In Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science.
Ljung. 1999. System Identification: Theory for the User. Prentice Hall Information and System Sciences Series.
Ljung, and Söderström. 1983. Theory and Practice of Recursive Identification. The MIT Press Series in Signal Processing, Optimization, and Control 4.
Menon, and Ong. 2016. “Linking Losses for Density Ratio and Class-Probability Estimation.” In Proceedings of The 33rd International Conference on Machine Learning.
Møller, and Rasmussen. 2005. “Perfect Simulation of Hawkes Processes.” Advances in Applied Probability.
———. 2006. “Approximate Simulation of Hawkes Processes.” Methodology and Computing in Applied Probability.
Nielsen. 2018. “An Elementary Introduction to Information Geometry.” arXiv:1808.08271 [Cs, Math, Stat].
Nock, Menon, and Ong. 2016. “A Scaled Bregman Theorem with Applications.” arXiv:1607.00360 [Cs, Stat].
Oakes. 1975. “The Markovian Self-Exciting Process.” Journal of Applied Probability.
Papavasiliou, and Taylor. 2016. “Approximate Likelihood Construction for Rough Differential Equations.” arXiv:1612.02536 [Math, Stat].
Pardoux, and Samegni-Kepgnou. 2017. “Large Deviation Principle for Epidemic Models.” Journal of Applied Probability.
Rasmussen, Jakob G. 2011. “Temporal Point Processes and the Conditional Intensity Function.”
Rasmussen, Jakob Gulddahl. 2013. “Bayesian Inference for Hawkes Processes.” Methodology and Computing in Applied Probability.
Reid, and Williamson. 2011. “Information, Divergence and Risk for Binary Experiments.” Journal of Machine Learning Research.
Rizoiu, Xie, Sanner, et al. 2017. “Expecting to Be HIP: Hawkes Intensity Processes for Social Media Popularity.” In World Wide Web 2017, International Conference on. WWW ’17.
Segall, Davis, and Kailath. 1975. “Nonlinear Filtering with Counting Observations.” IEEE Transactions on Information Theory.
Singh, and Gordon. 2008. “A Unified View of Matrix Factorization Models.” In Machine Learning and Knowledge Discovery in Databases.
Sra, and Dhillon. 2006. “Generalized Nonnegative Matrix Approximations with Bregman Divergences.” In Advances in Neural Information Processing Systems 18.
Wibisono, Wilson, and Jordan. 2016. “A Variational Perspective on Accelerated Methods in Optimization.” Proceedings of the National Academy of Sciences.
Yin, Osher, Goldfarb, et al. 2008. “Bregman Iterative Algorithms for \(\ell_1\)-Minimization with Applications to Compressed Sensing.” SIAM Journal on Imaging Sciences.
Zhang. 2013. “Bregman Divergence and Mirror Descent.”