Exponential families



Assumed audience:

Data scientists who must pretend they can remember statistics

Exponential families can be handled compactly

Exponential families! The secret magic at the heart of traditional statistics.

Exponential families are probability distributions that just work, in the sense that the things we would hope to do with them, we can. Thus these are the distributions we are taught to handle in statistics classes, and they lead us to undue optimism about statistics more generally, all of which falls apart later. Often, though, we can approximate intractable families by exponential ones or cunning combinations thereof, e.g. in variational inference, so this is not a complete waste of time.

Natural exponential families

a.k.a. NEFs. The simplest case. Suppose that \(\mathbf{x} \in \mathcal{X} \subseteq \mathbb{R}^{p}.\) Then a natural exponential family of order \(p\) has density or mass function of the form \[ f_{X}(\mathbf{x} \mid \boldsymbol{\theta}) = h(\mathbf{x})\, \exp\!\big( \boldsymbol{\theta}^{\top} \mathbf{x} - A(\boldsymbol{\theta}) \big), \] where in this case the parameter \(\boldsymbol{\theta} \in \mathbb{R}^{p}.\)

Important members of this sub-family: Gamma, Gaussian, negative binomial, Poisson and binomial.
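For concreteness, the Poisson(\(\lambda\)) mass function fits the NEF template with \(\theta = \log\lambda\), \(h(x) = 1/x!\) and \(A(\theta) = e^{\theta}\). A minimal numerical check of that claim, assuming numpy and scipy are available:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

# Poisson(lam) as an NEF: theta = log(lam), h(x) = 1/x!, A(theta) = exp(theta).
lam = 3.7
theta = np.log(lam)
x = np.arange(0, 20)

log_h = -gammaln(x + 1)                       # log h(x) = -log(x!)
nef_logpmf = log_h + theta * x - np.exp(theta)

assert np.allclose(nef_logpmf, poisson.logpmf(x, lam))
```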

I mention this family first because it is a good intuition pump. More commonly we consider a more general family.

(Full-blown) exponential families

In the more general case we allow the natural statistics and the parameters to not be in natural form, but rather to be related by some \(\mathbb{R}^{p}\to\mathbb{R}^{p}\) functions \(T\) and \(\eta.\) The non-trivial part is the \(T\) function; we can always redefine \(\eta(\theta)\) to be the real parameters rather than \(\theta\), and in fact we frequently do, calling it the canonical parameterisation. \[ f_{X}\!\left(\mathbf{x} \mid \boldsymbol{\theta}\right) = h(\mathbf{x})\, \exp\!\big( \boldsymbol{\eta}(\boldsymbol{\theta}) \cdot \mathbf{T}(\mathbf{x}) - A(\boldsymbol{\theta}) \big). \] I.e. these are nonlinear transformations of NEFs. We call \(\eta\) the natural parameter, \(\mathbf{T}\) the sufficient statistic, and \(A\) the log-partition function.
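For example, the Gaussian with both mean and variance unknown has this form with \(\mathbf{T}(x) = (x, x^{2})\), \(\boldsymbol{\eta} = (\mu/\sigma^{2},\, -1/(2\sigma^{2}))\), \(h(x) = 1/\sqrt{2\pi}\) and \(A = \mu^{2}/(2\sigma^{2}) + \log\sigma\). A minimal sketch verifying that numerically, assuming scipy is available:

```python
import numpy as np
from scipy.stats import norm

# N(mu, sigma^2) in exponential-family form:
#   T(x) = (x, x^2), eta = (mu/sigma^2, -1/(2 sigma^2)),
#   h(x) = 1/sqrt(2 pi), A = mu^2/(2 sigma^2) + log(sigma).
mu, sigma = 1.3, 0.8
eta = np.array([mu / sigma**2, -1.0 / (2 * sigma**2)])
A = mu**2 / (2 * sigma**2) + np.log(sigma)

x = np.linspace(-3.0, 5.0, 50)
T = np.stack([x, x**2])                       # sufficient statistics, shape (2, 50)
log_h = -0.5 * np.log(2 * np.pi)
ef_logpdf = log_h + eta @ T - A

assert np.allclose(ef_logpdf, norm.logpdf(x, loc=mu, scale=sigma))
```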

Natural parameters and sufficient statistics

One of the neat things about the exponential families is that the partition function, natural statistics and natural parameters are informative about each other.

The cumulant-generating function of the sufficient statistic is simply \(K(u \mid \eta) = A(\eta + u) - A(\eta)\).
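For the Poisson NEF, where \(A(\theta) = e^{\theta}\) and \(\theta = \log\lambda\), this recovers the familiar \(K(u) = \lambda(e^{u} - 1)\). A brute-force numerical check, assuming scipy is available:

```python
import numpy as np
from scipy.stats import poisson

# For the Poisson NEF: A(theta) = exp(theta), theta = log(lam), T(x) = x,
# so K(u) = A(theta + u) - A(theta) = lam * (exp(u) - 1).
lam, u = 2.5, 0.3
theta = np.log(lam)

x = np.arange(0, 200)                         # truncation; the remaining tail mass is negligible
K_direct = np.log(np.sum(np.exp(u * x) * poisson.pmf(x, lam)))  # log E[exp(u X)]
K_from_A = np.exp(theta + u) - np.exp(theta)

assert np.isclose(K_direct, K_from_A)
```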

For the natural exponential families, where \(T\) and \(\eta\) are the identity, the mean vector and covariance matrix are \[ \operatorname{E}[X] = \nabla A(\boldsymbol{\theta}) \text{ and } \operatorname{Cov}[X] = \nabla \nabla^{\top} A(\boldsymbol{\theta}), \] where \(\nabla\) is the gradient and \(\nabla \nabla^{\top}\) is the Hessian matrix.
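A quick finite-difference check of this, sketched for the Poisson case using nothing beyond numpy (finite differences standing in for exact derivatives):

```python
import numpy as np

# For the Poisson NEF, A(theta) = exp(theta) with theta = log(lam), so
# A'(theta) and A''(theta) should recover the mean and variance (both lam).
A = np.exp
lam = 4.2
theta, eps = np.log(lam), 1e-4

mean_approx = (A(theta + eps) - A(theta - eps)) / (2 * eps)
var_approx = (A(theta + eps) - 2 * A(theta) + A(theta - eps)) / eps**2

assert np.isclose(mean_approx, lam, rtol=1e-6)
assert np.isclose(var_approx, lam, rtol=1e-4)
```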

Natural exponential families with quadratic variance functions

A special case with even nicer properties (Morris 1982, 1983; Morris and Lock 2009).

Morris (1982):

The normal, Poisson, gamma, binomial, and negative binomial distributions are univariate natural exponential families with quadratic variance functions (the variance is at most a quadratic function of the mean). Only one other such family exists. Much theory is unified for these six natural exponential families by appeal to their quadratic variance property, including infinite divisibility, cumulants, orthogonal polynomials, large deviations, and limits in distribution.
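As a sanity check on the quadratic variance property, here is a minimal sketch, assuming scipy is available and with arbitrary parameter values, confirming that the variance of a few of these families is an at-most-quadratic polynomial \(V(m)\) in the mean \(m\):

```python
import numpy as np
from scipy import stats

# NEF-QVF sanity check: variance equals a quadratic polynomial V(m) of the mean m.
n, p = 20, 0.3     # binomial parameters
r, q = 5, 0.4      # scipy's nbinom: r successes, success probability q
a = 2.5            # gamma shape

cases = {
    "poisson":  (stats.poisson(3.0), lambda m: m),               # V(m) = m
    "binomial": (stats.binom(n, p),  lambda m: m - m**2 / n),    # V(m) = m(1 - m/n)
    "negbinom": (stats.nbinom(r, q), lambda m: m + m**2 / r),    # V(m) = m + m^2/r
    "gamma":    (stats.gamma(a),     lambda m: m**2 / a),        # V(m) = m^2/a
}
for name, (dist, V) in cases.items():
    m, v = dist.mean(), dist.var()
    assert np.isclose(v, V(m)), name
```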

Conjugate priors

TBD. Diaconis and Ylvisaker (1979)
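In the meantime, the one-line version: if the likelihood is an exponential family in canonical form, a prior of the shape \(p(\eta) \propto \exp(\boldsymbol{\chi} \cdot \eta - \nu A(\eta))\) is conjugate, and the posterior simply adds the observed sufficient statistics to \(\boldsymbol{\chi}\) and the sample size to \(\nu\). A minimal sketch of the best-known special case, the Gamma-Poisson pair, assuming numpy and scipy and with made-up hyperparameters:

```python
import numpy as np
from scipy.stats import gamma

# Gamma-Poisson conjugacy: prior Gamma(a, b) (shape a, rate b) on lam,
# posterior Gamma(a + sum(x), b + n) after observing counts x_1..x_n.
# The hyperparameters a, b and the simulated data are illustrative only.
rng = np.random.default_rng(0)
a, b = 2.0, 1.0
lam_true = 3.0
x = rng.poisson(lam_true, size=50)

a_post, b_post = a + x.sum(), b + len(x)
posterior = gamma(a_post, scale=1.0 / b_post)   # scipy uses scale = 1/rate
print(posterior.mean(), posterior.std())        # concentrates near lam_true
```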

PCA

PCA is famously a Gaussian-data technique. I gather there is some sense in which it can be generalised to all exponential families as Exponential Family PCA (Collins, Dasgupta, and Schapire 2001; Jun Li and Dacheng Tao 2013; Liu, Dobriban, and Singer 2017; Mohamed, Ghahramani, and Heller 2008).

For random graphs

Exponential random graph models. TBD

In graphical models

See message passing and conjugacy.

Curved exponential families

A generalisation I occasionally see is that of curved exponential families. I do not know how these work or if they have enough features to benefit me.

References

Altun, Yasemin, Alex J. Smola, and Thomas Hofmann. 2004. “Exponential Families for Conditional Random Fields.” In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2–9. UAI ’04. Arlington, Virginia, United States: AUAI Press.
Balkema, A. A., and L. de Haan. 1974. “Residual Life Time at Great Age.” The Annals of Probability 2 (5): 792–804.
Brown, Lawrence D. 1986. Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. Lecture Notes-Monograph Series, v. 9. Hayward, Calif: Institute of Mathematical Statistics.
Brown, Lawrence D., T. Tony Cai, and Harrison H. Zhou. 2010. “Nonparametric Regression in Exponential Families.” The Annals of Statistics 38 (4): 2005–46.
Canu, Stéphane, and Alex Smola. 2006. “Kernel Methods and the Exponential Family.” Neurocomputing 69 (7-9): 714–20.
Charpentier, Arthur, and Emmanuel Flachaire. 2019. “Pareto Models for Risk Management.” arXiv:1912.11736 [Econ, Stat], December.
Collins, Michael, S. Dasgupta, and Robert E Schapire. 2001. “A Generalization of Principal Components Analysis to the Exponential Family.” In Advances in Neural Information Processing Systems. Vol. 14. MIT Press.
Diaconis, Persi, and Donald Ylvisaker. 1979. “Conjugate Priors for Exponential Families.” The Annals of Statistics 7 (2): 269–81.
Efron, Bradley. 1978. “The Geometry of Exponential Families.” The Annals of Statistics 6 (2): 362–76.
Fink, Daniel. 1997. “A Compendium of Conjugate Priors,” 46.
Fisher, R. A., and L. H. C. Tippett. 1928. “Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample.” Mathematical Proceedings of the Cambridge Philosophical Society 24 (2): 180–90.
Gurevich, Pavel, and Hannes Stuke. 2019. “Gradient Conjugate Priors and Multi-Layer Neural Networks.” arXiv.
Jensen, Jens Ledet, and Jesper Møller. 1991. “Pseudolikelihood for Exponential Family Models of Spatial Point Processes.” The Annals of Applied Probability 1 (3): 445–61.
Jun Li, and Dacheng Tao. 2013. “Simple Exponential Family PCA.” IEEE Transactions on Neural Networks and Learning Systems 24 (3): 485–97.
Jung, Alexander, Sebastian Schmutzhard, and Franz Hlawatsch. 2012. “The RKHS Approach to Minimum Variance Estimation Revisited: Variance Bounds, Sufficient Statistics, and Exponential Families.” arXiv:1210.6516 [Math, Stat], October.
Liu, Lydia T., Edgar Dobriban, and Amit Singer. 2017. “\(e\)PCA: High Dimensional Exponential Family PCA.” arXiv.
Makarov, Mikhail. 2006. “Extreme Value Theory and High Quantile Convergence.” The Journal of Operational Risk 1 (2): 51–57.
McNeil, Alexander J. 1997. “Estimating the Tails of Loss Severity Distributions Using Extreme Value Theory.” ASTIN Bulletin: The Journal of the IAA 27 (1): 117–37.
Mohamed, Shakir, Zoubin Ghahramani, and Katherine A Heller. 2008. “Bayesian Exponential Family PCA.” In Advances in Neural Information Processing Systems. Vol. 21. Curran Associates, Inc.
Morris, Carl N. 1982. “Natural Exponential Families with Quadratic Variance Functions.” The Annals of Statistics 10 (1): 65–80.
———. 1983. “Natural Exponential Families with Quadratic Variance Functions: Statistical Theory.” The Annals of Statistics 11 (2): 515–29.
Morris, Carl N., and Kari F. Lock. 2009. “Unifying the Named Natural Exponential Families and Their Relatives.” The American Statistician 63 (3): 247–53.
Mueller, Ulrich K. 2018. “Refining the Central Limit Theorem Approximation via Extreme Value Theory.” arXiv:1802.00762 [Math], February.
Pickands III, James. 1975. “Statistical Inference Using Extreme Order Statistics.” The Annals of Statistics 3 (1): 119–31.
Ranganath, Rajesh, Linpeng Tang, Laurent Charlin, and David Blei. 2015. “Deep Exponential Families.” In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 762–71. PMLR.
Seeger, Matthias, ed. 2005. “Expectation Propagation for Exponential Families.”
Shen, Xiaotong, Hsin-Cheng Huang, and Jimmy Ye. 2004. “Adaptive Model Selection and Assessment for Exponential Family Distributions.” Technometrics 46 (3): 306–17.
Tansey, Wesley, Oscar Hernan Madrid Padilla, Arun Sai Suggala, and Pradeep Ravikumar. 2015. “Vector-Space Markov Random Fields via Exponential Families.” In Journal of Machine Learning Research, 684–92.
Tojo, Koichi, and Taro Yoshino. 2019. “A Method to Construct Exponential Families by Representation Theory.” arXiv:1811.01394 [Cs, Math, Stat], August.
Vajda, S. 1951. “Analytical Studies in Stop-Loss Reinsurance.” Scandinavian Actuarial Journal 1951 (1-2): 158–75.
Wainwright, Martin J., and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Vol. 1. Foundations and Trends® in Machine Learning. Now Publishers.
