Exponential families! The secret magic at the heart of traditional statistics.
Exponential families are probability distributions that just work, in the sense that the things we would hope we can do with them, we can. Informally, this is because a lot of the stuff we do in statistics is about multiplying probabilities, and exponential families are distributions that capture “easy-to-multiply” probabilities. Thus these are the distributions we are taught to handle in statistics classes. They lead us to undue optimism about statistics more generally, and that optimism falls apart later. Often, though, we can approximate intractable families by exponential ones or cunning combinations thereof, e.g. in variational inference, so this is not a complete waste of time.
1 Background
Michael I. Jordan, why not?
2 Natural exponential families
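a.k.a. NEFs. The simplest case. Suppose that the density (or mass function) of $X$, with respect to some base measure, has the form
$$
f(x\mid\theta) = h(x)\exp\big(\theta x - A(\theta)\big),
$$
where $\theta$ is the natural parameter and $A(\theta) = \log\int h(x)e^{\theta x}\,\mathrm{d}x$ is the log-partition function that makes the density normalise. The sufficient statistic is $x$ itself.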
Important members of this sub-family: Gamma with known shape, Gaussian with known variance, negative binomial with known number of failures $r$.
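To make the NEF form $h(x)\exp(\theta x - A(\theta))$ concrete, here is a minimal sketch using the Poisson. I chose it as the example because it is an NEF with $h(x) = 1/x!$, $\theta = \log\lambda$ and $A(\theta) = e^\theta$. The sketch checks that the NEF log-density matches the textbook Poisson pmf and that $A$ really does normalise it. The helper names are made up for illustration.

```python
import math

# Poisson written as a natural exponential family:
#   f(x | theta) = h(x) * exp(theta * x - A(theta)),
# with h(x) = 1 / x!, natural parameter theta = log(lambda), A(theta) = exp(theta).

def log_h(x: int) -> float:
    return -math.lgamma(x + 1)

def A(theta: float) -> float:
    return math.exp(theta)

def nef_logpmf(x: int, theta: float) -> float:
    return log_h(x) + theta * x - A(theta)

def poisson_logpmf(x: int, lam: float) -> float:
    # Textbook parameterisation, for comparison.
    return x * math.log(lam) - lam - math.lgamma(x + 1)

lam = 3.5
theta = math.log(lam)

# Same log-probabilities in both parameterisations.
max_err = max(abs(nef_logpmf(x, theta) - poisson_logpmf(x, lam)) for x in range(50))

# A(theta) normalises the density (truncating the sum where the tail is negligible).
total = sum(math.exp(nef_logpmf(x, theta)) for x in range(200))
```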
I mention this family first because it is a good intuition pump. The only problem that it has usefully solved for me so far is Gaussian Belief Propagation. That is, however, a very important case.
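Here is why the Gaussian case matters for belief propagation. Multiplying Gaussian densities, as BP does when it combines messages, amounts to adding their natural parameters, the precision and the precision-times-mean. What follows is a minimal one-dimensional NumPy sketch of that single step, not a BP implementation. The helper names are illustrative.

```python
import numpy as np

def to_natural(mean, var):
    """Gaussian (mean, variance) -> information form (precision, precision * mean)."""
    prec = 1.0 / var
    return prec, prec * mean

def from_natural(prec, shift):
    """Information form -> (mean, variance)."""
    return shift / prec, 1.0 / prec

def gauss_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Two Gaussian "messages" about the same scalar variable.
m1, v1 = 1.0, 2.0
m2, v2 = -0.5, 0.5

# Multiplying the densities = adding the natural parameters.
p1, s1 = to_natural(m1, v1)
p2, s2 = to_natural(m2, v2)
m_prod, v_prod = from_natural(p1 + p2, s1 + s2)

# Brute-force check on a grid: renormalise the pointwise product.
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
prod = gauss_pdf(x, m1, v1) * gauss_pdf(x, m2, v2)
prod /= prod.sum() * dx
max_err = np.max(np.abs(prod - gauss_pdf(x, m_prod, v_prod)))
```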
3 (Full-blown) exponential families
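More commonly we consider the general exponential family. Here the natural (sufficient) statistics $T(x)$ and the parameters $\theta$, which need not be in natural form, interact via some map $\eta(\theta)$:
$$
f(x\mid\theta) = h(x)\exp\big(\eta(\theta)^\top T(x) - A(\eta(\theta))\big).
$$
The NEF is the special case $T(x) = x$, $\eta(\theta) = \theta$.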
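The general form is $h(x)\exp(\eta(\theta)^\top T(x) - A(\eta(\theta)))$. As an illustration, take the Gaussian with both mean and variance unknown. It is not an NEF, but it is a full exponential family with:

- $T(x) = (x, x^2)$
- $\eta(\theta) = (\mu/\sigma^2, -1/(2\sigma^2))$
- $A(\eta) = -\eta_1^2/(4\eta_2) - \tfrac12\log(-2\eta_2)$
- $h(x) = (2\pi)^{-1/2}$

The sketch below checks that this parameterisation reproduces the usual normal log-density. It is plain Python, and the helper names are mine.

```python
import math

def eta(mu, sigma2):
    """Map the textbook parameters theta = (mu, sigma^2) to natural parameters."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def T(x):
    """Sufficient (natural) statistics."""
    return (x, x * x)

def A(eta1, eta2):
    """Log-partition function in natural coordinates (requires eta2 < 0)."""
    return -eta1 ** 2 / (4.0 * eta2) - 0.5 * math.log(-2.0 * eta2)

def ef_logpdf(x, mu, sigma2):
    e1, e2 = eta(mu, sigma2)
    t1, t2 = T(x)
    log_h = -0.5 * math.log(2.0 * math.pi)
    return log_h + e1 * t1 + e2 * t2 - A(e1, e2)

def normal_logpdf(x, mu, sigma2):
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

errs = [abs(ef_logpdf(x, 1.3, 0.7) - normal_logpdf(x, 1.3, 0.7))
        for x in (-2.0, 0.0, 0.5, 4.0)]
```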
4 Cumulant generating function
TBC.
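For the natural exponential families, the cumulant generating function of $X$ is
$$
K(t) = \log\mathbb{E}\big[e^{tX}\big] = A(\theta + t) - A(\theta)
$$
for $\theta + t$ in the natural parameter space. So the cumulants of $X$ are the derivatives of $A$ at $\theta$; e.g. $\mathbb{E}[X] = A'(\theta)$ and $\operatorname{Var}[X] = A''(\theta)$.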
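For an NEF the log-MGF should equal $A(\theta + t) - A(\theta)$. Here is a quick numerical check of that identity, assuming a Poisson example with $\theta = \log\lambda$ and $A(\theta) = e^\theta$, so that $A(\theta+t) - A(\theta) = \lambda(e^t - 1)$. The log-MGF is computed by brute-force summation over the pmf. A finite-difference second derivative of $A$ should also recover the variance $\lambda$. The step size is an ad hoc choice.

```python
import math

lam = 2.0
theta = math.log(lam)

def A(th):
    # Poisson log-partition function in natural coordinates.
    return math.exp(th)

def log_pmf(x):
    return theta * x - A(theta) - math.lgamma(x + 1)

def cgf_from_A(t):
    return A(theta + t) - A(theta)

def cgf_brute_force(t, x_max=300):
    # log E[exp(t X)] by direct (truncated) summation over the pmf.
    return math.log(sum(math.exp(t * x + log_pmf(x)) for x in range(x_max)))

ts = [-1.0, -0.3, 0.0, 0.5, 1.0]
max_err = max(abs(cgf_from_A(t) - cgf_brute_force(t)) for t in ts)

# Second cumulant (the variance) as a finite-difference second derivative of A.
h = 1e-4
var_from_A = (A(theta + h) - 2 * A(theta) + A(theta - h)) / h ** 2
```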
5 Natural parameters and sufficient statistics
One of the neat things about the exponential families is that the partition function, natural statistics and natural parameters are informative about each other.
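The cumulant-generating function of the natural statistic $T(X)$ is simply
$$
K(t) = \log\mathbb{E}\big[e^{t^\top T(X)}\big] = A(\eta + t) - A(\eta),
$$
so the log-partition function $A$ determines the moments of the natural statistics: $\nabla A(\eta) = \mathbb{E}[T(X)]$ and $\nabla^2 A(\eta) = \operatorname{Cov}[T(X)]$.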
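Here is the vector case, $\nabla A(\eta) = \mathbb{E}[T(X)]$, checked numerically. It assumes the Gaussian with unknown mean and variance as the example:

- $T(x) = (x, x^2)$
- $\eta = (\mu/\sigma^2, -1/(2\sigma^2))$
- $A(\eta) = -\eta_1^2/(4\eta_2) - \tfrac12\log(-2\eta_2)$

Finite differences of $A$ should return $(\mu, \mu^2 + \sigma^2)$, and the second derivative in $\eta_1$ should return $\operatorname{Var}[X] = \sigma^2$. This is a sketch only, and the step sizes are ad hoc.

```python
import math

def A(e1, e2):
    # Gaussian log-partition function in natural coordinates (e2 < 0).
    return -e1 ** 2 / (4.0 * e2) - 0.5 * math.log(-2.0 * e2)

mu, sigma2 = 1.3, 0.7
e1, e2 = mu / sigma2, -1.0 / (2.0 * sigma2)

# Gradient of A by central differences: should equal E[T(X)] = (E[X], E[X^2]).
h = 1e-6
grad = (
    (A(e1 + h, e2) - A(e1 - h, e2)) / (2 * h),
    (A(e1, e2 + h) - A(e1, e2 - h)) / (2 * h),
)
expected_T = (mu, mu ** 2 + sigma2)

# One diagonal entry of the Hessian: should equal Var[X] = sigma^2.
h2 = 1e-4
d2A_de1 = (A(e1 + h2, e2) - 2 * A(e1, e2) + A(e1 - h2, e2)) / h2 ** 2
```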
6 Natural exponential families with quadratic variance functions
A special case with nice properties (Morris 1982, 1983; Morris and Lock 2009).
Morris (1982):
The normal, Poisson, gamma, binomial, and negative binomial distributions are univariate natural exponential families with quadratic variance functions (the variance is at most a quadratic function of the mean). Only one other such family exists. Much theory is unified for these six natural exponential families by appeal to their quadratic variance property, including infinite divisibility, cumulants, orthogonal polynomials, large deviations, and limits in distribution.
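In NEF terms, the variance function is $V(\mu) = A''(\theta)$ evaluated where $\mu = A'(\theta)$. The quadratic-variance condition says $V(\mu) = v_0 + v_1\mu + v_2\mu^2$. The sketch below uses finite differences to check the standard variance functions of the five families named in the quote. The log-partition functions and the fixed nuisance values ($\sigma^2$, shape $\alpha$, trials $n$, failures $r$) are my own parameterisation choices, and the sixth family is left out.

```python
import math

# Fixed "known" nuisance parameters (arbitrary illustrative values).
sigma2, alpha, n, r = 2.0, 3.0, 10, 4.0

# name: (log-partition A(theta), a theta inside its domain, claimed V(mu))
families = {
    "normal":   (lambda th: sigma2 * th ** 2 / 2,           0.7, lambda mu: sigma2),
    "poisson":  (lambda th: math.exp(th),                    0.3, lambda mu: mu),
    "gamma":    (lambda th: -alpha * math.log(-th),         -1.5, lambda mu: mu ** 2 / alpha),
    "binomial": (lambda th: n * math.log1p(math.exp(th)),    0.4, lambda mu: mu - mu ** 2 / n),
    "negbin":   (lambda th: -r * math.log1p(-math.exp(th)), -0.8, lambda mu: mu + mu ** 2 / r),
}

def d1(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

errors = {}
for name, (A, theta, V) in families.items():
    mu = d1(A, theta)    # mean is A'(theta)
    var = d2(A, theta)   # variance is A''(theta)
    errors[name] = abs(var - V(mu))
```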
7 Conjugate priors
A useful feature of exponential families is that they have conjugate priors, which means that the posterior distribution is in the same family as the prior, and moreover, there is a simple formula for updating the parameters. This is deliciously easy, and also misleads one into thinking that Bayesian inference is much easier than it actually is in the general case. See conjugate priors.
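In natural coordinates the conjugate update is just adding sufficient statistics. With a prior $p(\eta) \propto \exp(\eta\chi - \nu A(\eta))$, observing $x_1,\dots,x_n$ gives $\chi \to \chi + \sum_i T(x_i)$ and $\nu \to \nu + n$. Below is a minimal NumPy sketch for the Poisson case, checked against brute-force Bayes on a grid. In that case this prior is a Gamma$(\chi, \nu)$ prior, in shape and rate, on $\lambda = e^\eta$. The function name and the data are made up.

```python
import numpy as np

def conjugate_update(chi, nu, T_values):
    """Generic exponential-family conjugate update (hypothetical helper).

    Prior over the natural parameter eta: p(eta) ∝ exp(eta * chi - nu * A(eta)).
    After observing x_1..x_n: chi += sum T(x_i), nu += n.
    """
    return chi + np.sum(T_values), nu + len(T_values)

# Poisson: eta = log(lambda), T(x) = x, A(eta) = exp(eta).
# In terms of lambda, the prior is Gamma(shape=chi, rate=nu).
chi0, nu0 = 2.0, 1.0
data = np.array([3, 1, 4, 1, 5])
chi_n, nu_n = conjugate_update(chi0, nu0, data)   # Gamma(16, 6)
post_mean_conjugate = chi_n / nu_n

# Brute-force check: multiply prior and likelihood on a grid of lambda values.
lam = np.linspace(1e-6, 15, 200001)
log_post = (chi0 - 1) * np.log(lam) - nu0 * lam           # Gamma prior, unnormalised
log_post += data.sum() * np.log(lam) - len(data) * lam    # Poisson likelihood, up to constants
w = np.exp(log_post - log_post.max())
post_mean_grid = np.sum(lam * w) / np.sum(w)
```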
8 PCA
PCA is famous for Gaussian data. I gather there is some sense in which it can be generalised to all exponential families as Exponential Family PCA (Collins, Dasgupta, and Schapire 2001; Jun Li and Dacheng Tao 2013; Liu, Dobriban, and Singer 2017; Mohamed, Ghahramani, and Heller 2008).
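Here is a rough sketch of the idea for count data, assuming a Poisson likelihood. Model the natural parameters as a low-rank matrix $\Theta = UV^\top$ and minimise the negative log-likelihood $\sum_{ij} \big(A(\Theta_{ij}) - X_{ij}\Theta_{ij}\big)$ with $A = \exp$. For brevity this uses plain joint gradient descent with no bias term and no regularisation. It is not the optimisation scheme of any of the cited papers, and the step size, initialisation and synthetic data are arbitrary.

```python
import numpy as np

def poisson_pca(X, k, lr=0.005, n_iter=2000, seed=0):
    """Rank-k Poisson PCA sketch (hypothetical helper).

    Fits natural parameters Theta = U @ V.T by gradient descent on
        L(U, V) = sum(exp(Theta) - X * Theta)   (dropping the log(x!) terms).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((d, k))
    losses = []
    for _ in range(n_iter):
        Theta = U @ V.T
        R = np.exp(Theta) - X            # dL/dTheta = A'(Theta) - X = mean - data
        losses.append(np.sum(np.exp(Theta) - X * Theta))
        grad_U, grad_V = R @ V, R.T @ U  # both gradients use the current U, V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, losses

# Synthetic counts whose log-rates have rank 2.
rng = np.random.default_rng(1)
U_true = 0.5 * rng.standard_normal((50, 2))
V_true = 0.5 * rng.standard_normal((20, 2))
X = rng.poisson(np.exp(U_true @ V_true.T))

U, V, losses = poisson_pca(X, k=2)
```

With a Gaussian likelihood, $A(\theta) = \theta^2/2$ and the same loss reduces (up to a constant) to squared error, i.e. ordinary low-rank approximation.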
9 For random graphs
Exponential random graph models. TBD
10 In graphical models
11 Curved exponential families
A generalisation I occasionally see is that of curved exponential families. I do not know how these work or if they have enough features to benefit me.
12 Squared Neural Families
Another generalisation. See squared neural families.
13 Tempered
Call the
AFAICT we cannot say more about the normalising constant without knowing more about the form of