Gamma distributions

October 14, 2019 — July 8, 2024

Lévy processes
probability
stochastic processes
time series
Figure 1: The gamma distribution is a popular model for survival times.

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

The density \(g(x|\alpha, \lambda )\) of the univariate Gamma is \[ g(x| \alpha, \lambda)= \frac{ \lambda^{\alpha} }{ \Gamma (\alpha) } x^{\alpha-1}e^{-\lambda x}, \quad x\geq 0. \] People refer to this as the shape-rate parameterisation, with rate \(\lambda\) and shape \(\alpha\). I find it easier to remember as \[ \log g(x| \alpha, \lambda)= \alpha \log \lambda + ({\alpha{-}1})\log x -\lambda x -\log \Gamma (\alpha). \]

If \(\rv{x}\sim \operatorname{Gamma}(\alpha, \lambda)\) then \(\bb E(\rv{x})=\alpha/\lambda\) and \(\var(\rv{x})=\alpha/\lambda^2.\)
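A quick numerical sanity check, assuming numpy and scipy are available. Note that `scipy.stats.gamma` uses the shape-scale convention, so rate \(\lambda\) corresponds to `scale = 1/lambda` there; the parameter values below are arbitrary.

```python
# Check mean, variance and density against scipy's shape-scale parameterisation.
import math
import numpy as np
from scipy import stats

alpha, lam = 2.5, 1.7                       # arbitrary shape and rate
rv = stats.gamma(a=alpha, scale=1.0 / lam)  # scipy wants scale = 1/rate

assert np.isclose(rv.mean(), alpha / lam)
assert np.isclose(rv.var(), alpha / lam**2)

x = 0.9
pdf_formula = lam**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-lam * x)
assert np.isclose(rv.pdf(x), pdf_formula)
```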

The Gamma distribution has lots of neat properties, such as infinite divisibility. Some more are outlined in the Gamma-Dirichlet algebra section.

1 All moments

The moment generating function of the Gamma distribution is \[\Ex[\exp(\rv{x} s)]=\left(1-{\frac {s}{\lambda }}\right)^{-\alpha }{\text{ for }}s<\lambda\] which gives us expressions for all moments fairly easily: \[\begin{aligned} \Ex[\rv{x}]&=\left.\frac{\dd}{\dd s}\Ex[\exp(\rv{x} s)]\right|_{s=0}\\ &=\left.\frac{\dd}{\dd s}\left(1-{\frac {s}{\lambda }}\right)^{-\alpha }\right|_{s=0}\\ &=\left.\frac{\alpha}{\lambda}\left(1-{\frac {s}{\lambda }}\right)^{-\alpha -1}\right|_{s=0}\\ &=\alpha/\lambda\\ \Ex[\rv{x}^2] &=\left.\frac{\dd}{\dd s}\frac{\alpha}{\lambda}\left(1-{\frac {s}{\lambda }}\right)^{-\alpha -1}\right|_{s=0}\\ &=\left.\frac{\alpha(\alpha+1)}{\lambda^2}\left(1-{\frac {s}{\lambda }}\right)^{-\alpha -2}\right|_{s=0}\\ &=\frac{\alpha(\alpha+1)}{\lambda^2}\\ \var[\rv{x}] &=\frac{\alpha}{\lambda^2} \quad\text{(handy later)}\\ \var[\rv{x}^2] &=\frac{\Gamma (\alpha) \Gamma(\alpha+4)-\Gamma^2 (\alpha+2)}{\Gamma^2 (\alpha)\lambda ^4 }\quad\text{(handy later)}\\ \Ex[\rv{x}^3] &=\left.\frac{\dd}{\dd s}\frac{\alpha(\alpha+1)}{\lambda^2}\left(1-{\frac {s}{\lambda }}\right)^{-\alpha -2}\right|_{s=0}\\ &=\left.\frac{\alpha(\alpha+1)(\alpha+2)}{\lambda^3}\left(1-{\frac {s}{\lambda }}\right)^{-\alpha -3}\right|_{s=0}\\ &=\frac{\alpha(\alpha+1)(\alpha+2)}{\lambda^3}\\ \Ex[\rv{x}^4] &=\frac{\alpha(\alpha+1)(\alpha+2)(\alpha+3)}{\lambda^4}\\ &\dots\\ \Ex[\rv{x}^n] &=\frac{\langle \alpha \rangle_{n}}{\lambda^n}. \end{aligned} \]

Here \(\langle \alpha \rangle_{n}:=\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}\) is the rising factorial.
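The general moment formula is easy to check numerically; `scipy.special.poch` computes the rising factorial. A sketch, with arbitrary parameter values:

```python
# E[x^n] should equal poch(alpha, n) / lambda^n, where poch is the rising factorial.
import numpy as np
from scipy import stats, special

alpha, lam = 2.5, 1.7
rv = stats.gamma(a=alpha, scale=1.0 / lam)

for n in range(1, 5):
    assert np.isclose(rv.moment(n), special.poch(alpha, n) / lam**n)
```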

2 As exponential family

The gamma distribution is a two-parameter exponential family with natural parameters \([\theta_1, \theta_2]=[\alpha - 1, -\lambda]^\top\) and corresponding natural statistics \([\ln x, x]^\top.\) In particular, it is a natural exponential family with quadratic variance function (Morris 1983).

The log-partition function \(A(\theta)\) is given by \[ A(\theta)=-\left(\theta_1+1\right) \log \left(-\theta_2\right)+\log \Gamma\left(\theta_1+1\right), \] so in canonical form the density is \(f\left(x ; \theta_1, \theta_2\right)=\exp \left(\theta_1 \log (x)+\theta_2 x-A(\theta)\right).\) (With the \(\theta_1=\alpha-1\) convention used here, the base measure is simply \(h(x)=1\).)

In terms of the natural parameters, the mean and variance are \[ \begin{aligned} \Ex[\rv{x}]&=\frac{\theta_1+1}{-\theta_2}\\ \var[\rv{x}]&=\frac{\theta_1+1}{\theta_2^2}. \end{aligned} \]
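A minimal sketch of the bookkeeping between \((\alpha,\lambda)\) and \((\theta_1,\theta_2)\), with hypothetical helper names:

```python
import math

def to_natural(alpha, lam):
    """(shape, rate) -> natural parameters (theta1, theta2) = (alpha - 1, -lambda)."""
    return alpha - 1.0, -lam

def natural_mean_var(theta1, theta2):
    """Mean and variance of the Gamma expressed via its natural parameters."""
    return (theta1 + 1.0) / (-theta2), (theta1 + 1.0) / theta2**2

alpha, lam = 2.5, 1.7
mean, var = natural_mean_var(*to_natural(alpha, lam))
assert math.isclose(mean, alpha / lam) and math.isclose(var, alpha / lam**2)
```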

2.1 Tempering

A tempered version of the density behaves how we would hope, as an exponential family. Lazily writing \(f ^\tau\) for the tempered density (which the pedantic might argue skips the normalizing constant), we have \[ \begin{aligned} f\left(x ; \theta_1, \theta_2\right)^{\tau} &=\exp \left(\theta_1 \log (x)+\theta_2 x-A(\theta)\right)^{\tau}\\ &\propto\exp \left(\tau \theta_1 \log (x)+\tau\theta_2 x\right)\\ &\propto f\left(x ; \tau\theta_1, \tau\theta_2\right). \end{aligned} \] No surprises there. How this acts on the moments is different than for e.g. Gaussians: \[ \begin{aligned} \Ex_{\rv{x}\sim f^{\tau}}[\rv{x}] &=\frac{\tau\theta_1+1}{-\tau\theta_2}\\ &=\frac{\theta_1}{-\theta_2}+\frac{1}{-\tau\theta_2}\\ &=\Ex_{\rv{x}\sim f}[\rv{x}]+\frac{1-\tau}{-\tau\theta_2}\\ \var_{\rv{x}\sim f^{\tau}}[\rv{x}] &=\frac{\tau\theta_1+1}{\tau^2\theta_2^2}\\ &=\frac{1}{-\tau\theta_2}\cdot\frac{\tau\theta_1+1}{-\tau\theta_2}\\ &=\frac{1}{-\tau\theta_2}\Ex_{\rv{x}\sim f^{\tau}}[\rv{x}]. \end{aligned} \]

So if we weight the density down by \(\tau<1\), or equivalently increase its temperature \(1/\tau\), both the mean and the variance-to-mean ratio \(\frac{1}{-\tau\theta_2}=\frac{1}{\tau\lambda}\) increase.
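Concretely, under this convention the tempered density is again a Gamma, with shape \(\tau(\alpha-1)+1\) and rate \(\tau\lambda\), so the formulas above are easy to check with scipy (a sketch with arbitrary parameter values):

```python
# Tempering maps (theta1, theta2) -> (tau*theta1, tau*theta2), i.e.
# Gamma(alpha, lam) -> Gamma(tau*(alpha-1)+1, tau*lam).
import numpy as np
from scipy import stats

alpha, lam, tau = 2.5, 1.7, 0.5
t1, t2 = alpha - 1.0, -lam

tempered = stats.gamma(a=tau * t1 + 1.0, scale=1.0 / (tau * lam))
assert np.isclose(tempered.mean(), (tau * t1 + 1.0) / (-tau * t2))
assert np.isclose(tempered.var(), (tau * t1 + 1.0) / (tau * t2) ** 2)
assert np.isclose(tempered.var(), tempered.mean() / (tau * lam))  # var/mean = 1/(tau*lam)
```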

3 Linear combinations of Gammas

Is the Gamma family closed under addition? For a fixed rate (equivalently, fixed scale) parameter, yes. See the Gamma-Dirichlet algebra section.

If we sum gamma variates that differ in the rate parameter \(\lambda\), the result is not in general Gamma distributed. It can be compactly expressed in terms of Gamma densities, for what that is worth. If we need the distribution of the sum of arbitrary independent Gamma random variables with different rates, we can approximate it by moment matching, or use the series expansions of Mathai (1982) and Moschopoulos (1985), if that feels worthwhile.

Note that multiplying a gamma RV by a scalar changes the rate, and sums of Gammas with different rates are not Gamma, so gamma variates are not closed under affine combination as Gaussian ones are. The moral is that we cannot assume the convenient closure under linear combination that the Gaussian process community enjoys. What kind of algebra do we get instead?

4 Gamma-Dirichlet algebra

There are various operations which give us similar conveniences, however. For those, we also need the Dirichlet and Beta distributions. Here are some useful properties, drawn from, or extrapolated from, Dufresne (1998), Lin (2016), and Pérez-Abreu and Stelzer (2014).

First, we fix some notation. From here on, all variables denoted \(\rv{x}_{a}\) (for some \(a>0\), with or without superscript) have a \(\operatorname{Gamma}(a, 1)\) distribution; all variables denoted \(\rv{b}_{a, b}\) (for some \(a, b>0\), with or without superscript) have a \(\operatorname{Beta}_{a, b}\) distribution. In all expressions, the variables \(\rv{x}_{a_{1}}, \rv{x}_{a_{2}}', \ldots, \rv{b}_{a_{1}, b_{1}}, \rv{b}_{a_{2}, b_{2}}', \ldots\) are independent unless I say otherwise.

4.1 Superposition

\(\rv{x}_{\alpha_1}+\rv{x}_{\alpha_2}\sim \operatorname{Gamma}(\alpha_1+\alpha_2, 1).\) More generally, independent Gammas with a common rate \(\lambda\) add to a Gamma with the shapes summed and the same rate \(\lambda\).

4.2 Multiplication

If \(\rv{x}\sim \operatorname{Gamma}(\alpha, \lambda)\) then \(c \rv{x}\sim \operatorname{Gamma}(\alpha, \lambda/c).\) This looks useful but in practice few constructions I handle vary \(\lambda\).

4.3 Beta thinning

\(\frac{\rv{x}_{\alpha_1}}{\rv{x}_{\alpha_1}+\rv{x}_{\alpha_2}}\sim \operatorname{Beta}(\alpha_1, \alpha_2)\) independent of \(\rv{x}_{\alpha_1}+\rv{x}_{\alpha_2}.\) Equivalently, \(\rv{x}_{\alpha_1+\alpha_2} \rv{b}_{\alpha_1,\alpha_2}\sim\operatorname{Gamma}(\alpha_1)\) independent of \(\rv{b}_{\alpha_1,\alpha_2}.\)

The Gamma-bridge construction arises from this thinning procedure.
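A Monte Carlo sketch of the thinning property, assuming numpy and scipy; seeds, sample sizes and parameter values are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a1, a2, n = 2.0, 3.5, 200_000

x1 = rng.gamma(a1, size=n)
x2 = rng.gamma(a2, size=n)
ratio, total = x1 / (x1 + x2), x1 + x2

print(ratio.mean(), a1 / (a1 + a2))            # Beta(a1, a2) mean
print(np.corrcoef(ratio, total)[0, 1])         # ~0: the ratio is independent of the sum

# Converse direction: Gamma(a1+a2) * Beta(a1, a2) ~ Gamma(a1)
thinned = rng.gamma(a1 + a2, size=n) * rng.beta(a1, a2, size=n)
print(stats.kstest(thinned, stats.gamma(a1).cdf).pvalue)  # not systematically small
```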

4.4 Dirichlet thinning

Grab a set of independent Gamma rvs, \(\{\rv{x}_{a_i}\}_{i=1,\dots,k}\), and define \(\rv{s}=\sum_i\rv{x}_{a_i}.\) We know that \(\rv{s}\sim\operatorname{Gamma}(\sum_i a_i,1).\) But wait! There is more. Define \[ \rv{d}_i=\frac{\rv{x}_{a_i}}{\rv{s}}. \] Then \(\vrv{d}=(\rv{d}_1,\dots,\rv{d}_k)\sim\operatorname{Dirichlet}(\vv{a})\), independently of \(\rv{s}.\)
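The same kind of Monte Carlo sketch works for Dirichlet thinning (again assuming numpy, with arbitrary shapes):

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([1.0, 2.0, 3.5])
n = 200_000

x = rng.gamma(a, size=(n, len(a)))   # columns are independent Gamma(a_i, 1)
s = x.sum(axis=1)                    # Gamma(sum(a), 1)
d = x / s[:, None]                   # rows should be Dirichlet(a)

print(d.mean(axis=0), a / a.sum())   # Dirichlet mean
print([np.corrcoef(d[:, i], s)[0, 1] for i in range(len(a))])  # all ~0
```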

Conversely, take some \(\rv{x}_{B}\) and an independent \(\vrv{d}\sim\operatorname{Dirichlet}(\vv{a})\) with \(\sum_i a_i=B.\) Then \(\rv{x}_{B}\rv{d}_i\sim \operatorname{Gamma}(a_i, 1),\) and moreover the variates \(\rv{x}_{B}\rv{d}_i,\ i=1,\dots,k\) are jointly independent.

4.5 Beta thickening

Grab a set of independent Gamma rvs, \(\{\rv{x}_{a_i}\}_{i=1,\dots,k},\) and a set of Beta rvs, \(\{\rv{b}_{\kappa_i,a_i-\kappa_i}\}_{i=1,\dots,k},\) with everything independent. Then the products \(\rv{x}_{a_i}\rv{b}_{\kappa_i,a_i-\kappa_i}\sim \operatorname{Gamma}(\kappa_i,1)\) are jointly independent Gamma variates, and thus \(\sum_i\rv{x}_{a_i}\rv{b}_{\kappa_i,a_i-\kappa_i}\sim \operatorname{Gamma}(\sum_i \kappa_i,1).\) As a special case, if \(\kappa_i\equiv \kappa,\) then the sum is \(\operatorname{Gamma}(k\kappa,1).\) If the Beta variates are dependent on each other (but still independent of the Gammas), each product is still marginally \(\operatorname{Gamma}(\kappa_i,1)\), but the products are no longer jointly independent, since \(\operatorname{Cov}(\rv{x}_{a_i}\rv{b}_{\kappa_i,a_i-\kappa_i}, \rv{x}_{a_j}\rv{b}_{\kappa_j,a_j-\kappa_j})=\Ex[\rv{x}_{a_i}]\Ex[\rv{x}_{a_j}]\operatorname{Cov}(\rv{b}_{\kappa_i,a_i-\kappa_i},\rv{b}_{\kappa_j,a_j-\kappa_j}),\) so the sum need not be Gamma.

TODO: Check this more carefully; a Monte Carlo sketch follows. Also, is it actually useful? I thought it was for coupling Gamma processes, but it turned out not to be necessary in my construction.
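Here is one such Monte Carlo sketch, assuming numpy and scipy: with independent Betas the weighted sum matches \(\operatorname{Gamma}(\sum_i \kappa_i, 1)\), while coupling the Betas through a single shared uniform breaks the match, as the covariance argument above predicts.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = np.array([2.0, 3.0, 4.5])
kappa = np.array([1.0, 1.5, 2.0])          # need kappa_i < a_i
n = 200_000
target = stats.gamma(kappa.sum())

x = rng.gamma(a, size=(n, len(a)))         # independent Gamma(a_i, 1)

# independent Betas: the KS test should not reject Gamma(sum kappa_i, 1)
b_indep = np.column_stack([rng.beta(k, ai - k, size=n) for k, ai in zip(kappa, a)])
print(stats.kstest((x * b_indep).sum(axis=1), target.cdf).pvalue)

# comonotone Betas (one shared uniform): the sum is over-dispersed and the test rejects
u = rng.uniform(size=n)
b_dep = np.column_stack([stats.beta(k, ai - k).ppf(u) for k, ai in zip(kappa, a)])
print(stats.kstest((x * b_dep).sum(axis=1), target.cdf).pvalue)
```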

4.6 Other

There are many other nice properties and relations.

The properties I include in this section fail to define a formal algebraic structure, but they do describe a bunch of operations that preserve membership of a certain distributional family, or pretty close to it. We could define the sets and operations properly if we really needed an algebra.

Thematically, the operations that arise most often in this Gamma-“algebra” are not quite the same as in the Gaussian process “algebra”. There we are mostly concerned with linearity: a great many linear operations on (broadly) Gaussian objects yield Gaussian results with closed-form parameters. Here the useful operations are different: addition, yes, but also thinning (Steutel and van Harn 2003) rather than general scalar multiplication.

Yor (2007) discusses the Gamma-Beta algebra of Dufresne (1998), which relates certain Markov chains built from Gamma and Beta distributions. Dufresne's (1998) construction is a formal algebra, although I only pull a couple of trivial cases from it. Read that paper for more than the following taster:

For any \(w, x, y, z>0\), \[ \rv{b}_{w, x} \rv{x}_{y}+\rv{x}_{z}' \disteq \rv{x}_{y+z}^{\prime \prime}\left(1-\rv{b}_{x, w} \rv{b}_{y, z}'\right) . \] In particular, for any \(w, x, y>0\), \[ \rv{b}_{w, x+y} \rv{x}_{x}+\rv{x}_{y}' \disteq \rv{x}_{x+y} \rv{b}_{w+y, x} \disteq \rv{x}_{w+y} \rv{b}_{x+y, w} . \]
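A Monte Carlo sketch of the second identity, assuming numpy and scipy, with arbitrary parameter values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
w, x, y = 1.3, 2.0, 0.7
n = 200_000

lhs = rng.beta(w, x + y, size=n) * rng.gamma(x, size=n) + rng.gamma(y, size=n)
rhs = rng.gamma(x + y, size=n) * rng.beta(w + y, x, size=n)

print(stats.ks_2samp(lhs, rhs).pvalue)                       # not systematically small
print(np.quantile(lhs, [0.1, 0.5, 0.9]), np.quantile(rhs, [0.1, 0.5, 0.9]))
```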

For more, see the Gamma-Beta notebook.

5 Conjugate prior

Fink (1997) summarizes Miller (1980), which extends Damsleth (1975):

Suppose that data \(x_1, \ldots, x_n\) are independent and identically distributed from a gamma distribution where both the shape, \(\alpha\), and the reciprocal scale, \(\beta\), parameters are unknown. The likelihood function, \(L\left(\alpha, \beta \mid x_1, \ldots, x_n\right)\), proportional to the parameters, is \[ \begin{aligned} & L\left(\alpha, \beta \mid x_1, \ldots, x_n\right) \propto\left\{\begin{array}{cc} \frac{P^{\alpha-1} \exp (-\beta S)}{\left(\Gamma(\alpha) \beta^{-\alpha}\right)^n} & \text { where } \quad x_i>0, i=1 \ldots n \\ 0 & \text { otherwise } \end{array}\right. \\ & \text { where } \quad S=\sum_{i=1}^n x_i \quad P=\prod_{i=1}^n x_i . \end{aligned} \]

The sufficient statistics are \(n\), the number of data points, \(P\), the product of the data, and \(S\), the sum of the data. The factors of [the equation] proportional to parameters \(\alpha\) and \(\beta\) make up the kernel of the conjugate prior, \(\pi(\cdot, \cdot)\). We specify the conjugate prior with hyperparameters \(p, q, r, s>0\) \[ \pi(\alpha, \beta \mid p, q, r, s)=\left\{\begin{array}{cc} \frac{1}{K} \frac{p^{\alpha-1} \exp (-\beta q)}{\Gamma(\alpha)^r \beta^{-\alpha s}} & \text { where } \quad \alpha, \beta>0 \\ 0 & \text { otherwise. } \end{array}\right. \]

[…] The posterior joint distribution of \(\alpha\) and \(\beta\) is specified by the hyperparameters \[ p^{\prime}=p P \quad q^{\prime}=q+S \quad r^{\prime}=r+n \quad s^{\prime}=s+n . \]
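The update itself is trivial to implement. A minimal sketch with a hypothetical helper name and arbitrary hyperparameters; in practice we would accumulate \(\log P\) rather than \(P\) to avoid underflow:

```python
import numpy as np

def gamma_conjugate_update(data, p, q, r, s):
    """Posterior hyperparameters (p', q', r', s') for the prior above."""
    data = np.asarray(data)
    return p * data.prod(), q + data.sum(), r + len(data), s + len(data)

# e.g. updating an arbitrary prior (p, q, r, s) = (1, 1, 1, 1) with three observations
print(gamma_conjugate_update([0.5, 1.2, 2.3], p=1.0, q=1.0, r=1.0, s=1.0))
```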

From this we find the predictive, \[ \begin{aligned} &\iint L\left(x\mid\alpha, \beta \right) \pi(\alpha, \beta \mid p, q, r, s) \,\mathrm{d}\alpha\, \mathrm{d}\beta \\ &\propto \iint \frac{x^{\alpha-1} \exp (-\beta x)}{\Gamma(\alpha) \beta^{-\alpha}} \frac{p^{\alpha-1} \exp (-\beta q)}{\Gamma(\alpha)^r \beta^{-\alpha s}}\,\mathrm{d}\alpha\, \mathrm{d}\beta \\ &= \iint \frac{(x p)^{\alpha-1} \exp (-\beta (x+q))}{\Gamma(\alpha)^{r+1} \beta^{-\alpha (s+1)}}\,\mathrm{d}\alpha\, \mathrm{d}\beta. \end{aligned} \] The predictive is fairly intractable, e.g. because there is nothing obviously nice to do with the \(1/\Gamma(\alpha)\) terms; even estimating the mean requires approximations.

If we really want a clean conjugate prior relation for a non-negative variate, we might consider something a little (but only a little) less messy, such as the [inverse Gaussian](./inverse_gaussian_distribution.qmd) or lognormal distributions.

6 Generalized Gamma Convolution

As noted under divisible distributions, the class of Generalized Gamma Convolutions (GGC) is a construction that represents some startling (to me) distributions as a certain generalization of Gamma distributions. This family includes the Pareto (Thorin 1977a) and lognormal (Thorin 1977b) distributions. Those Thorin papers introduced the idea originally; it is probably easier to start from one of the textbooks or overviews (Bondesson 2012; James, Roynette, and Yor 2008; Steutel and van Harn 2003; Barndorff-Nielsen, Maejima, and Sato 2006).

AFAICT this allows us to prove lots of nice things about such distributions. It is less easy to get implementable computational methods this way.

A GGC mixes Gamma distributions over a measure on the rate parameter (the Thorin measure) and so makes a new infinitely divisible distribution. James, Roynette, and Yor (2008):

we say that a positive r.v. \(\Gamma\) is a generalized gamma convolution \((\mathrm{GGC})\) … if there exists a positive Radon measure \(\mu\) on \(] 0, \infty[\) such that: \[ \begin{aligned} E\left[e^{-\lambda \Gamma}\right]=& \exp \left\{-\int_{0}^{\infty}\left(1-e^{-\lambda x}\right) \frac{d x}{x} \int_{0}^{\infty} e^{-x z} \mu(d z)\right\} \\ =& \exp \left\{-\int_{0}^{\infty} \log \left(1+\frac{\lambda}{z}\right) \mu(d z)\right\} \\ \text { with: } & \int_{] 0,1]}|\log x| \mu(d x)<\infty \text { and } \int_{[1, \infty[} \frac{\mu(d x)}{x}<\infty. \end{aligned} \]
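As a sanity check on the second formula, a point-mass Thorin measure \(\mu=\alpha\,\delta_{z_0}\) gives \(\exp(-\alpha\log(1+\lambda/z_0))=(1+\lambda/z_0)^{-\alpha}\), which is the Laplace transform of a plain \(\operatorname{Gamma}(\alpha, z_0)\) variate. A small numerical sketch, assuming numpy, with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, z0, lam = 2.5, 1.7, 0.8

# GGC formula with Thorin measure mu = alpha * delta_{z0}
lt_formula = np.exp(-alpha * np.log1p(lam / z0))

# Monte Carlo Laplace transform of Gamma(alpha, rate=z0)
samples = rng.gamma(alpha, 1.0 / z0, size=500_000)
print(lt_formula, np.exp(-lam * samples).mean())   # should agree to a few decimal places
```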

Barndorff-Nielsen, Maejima, and Sato (2006) and Pérez-Abreu and Stelzer (2014) generalize the GGC to vector- and matrix-valued distributions.

7 Simulating Gamma variates

7.1 Univariate

A Gamma variate can be generated by many methods (Ahrens and Dieter 1974), e.g. from a transformed normal and a uniform random variable (Ahrens and Dieter 1982), or from two uniforms, depending on the parameter range. Most methods involve a rejection step. Here is Devroye's (1986) summary of beta generators for \(0 < a, b\leq 1\) and gamma generators for \(a<1\):

Johnk’s beta generator

REPEAT Generate iid uniform [0,1] random variates \(\rv{u}, \rv{v}\)

\[ \begin{array}{c} \rv{x} \leftarrow \rv{u}^{\frac{1}{a}},\quad \rv{y}\leftarrow\rv{v}^{\frac{1}{b}} \\ \text{UNTIL } \rv{x}+\rv{y} \leq 1 \end{array} \]

RETURN \(\frac{\rv{x}}{\rv{x}+\rv{y}}\)

(the returned value \(\frac{\rv{x}}{\rv{x}+\rv{y}}\) is Beta\((a, b)\) distributed)

Berman’s beta generator

REPEAT Generate iid uniform [0,1] random variates \(\rv{u}, \rv{v}\)

\[ \begin{array}{c} \rv{x} \leftarrow \rv{u}^{\frac{1}{a}},\quad \rv{y} \leftarrow \rv{v}^{\frac{1}{b}} \\ \text{UNTIL } \rv{x}+\rv{y} \leq 1 \end{array} \]

RETURN \(\rv{x}\)

\((\rv{x} \text { is beta }(a, b+1) \text { distributed })\)

Johnk’s gamma generator

REPEAT Generate iid uniform [0,1] random variates \(\rv{u}, \rv{v}\)

\[ \begin{array}{c} \rv{x} \leftarrow \rv{u}^{\frac{1}{a}},\quad \rv{y} \leftarrow \rv{v}^{\frac{1}{1-a}} \\ \text{UNTIL } \rv{x}+\rv{y} \leq 1 \end{array} \]

Generate an exponential random variate \(E\).

RETURN \(\frac{E \rv{x}}{\rv{x}+\rv{y}}\)

(the returned value \(\frac{E \rv{x}}{\rv{x}+\rv{y}}\) is Gamma\((a)\) distributed)

Berman’s gamma generator

REPEAT Generate iid uniform [0,1] random variates \(\rv{u}, \rv{v}\)

\[ \begin{array}{c} \rv{x} \leftarrow \rv{u}^{\frac{1}{a}},\quad \rv{y} \leftarrow \rv{v}^{\frac{1}{1-a}} \\ \text{UNTIL } \rv{x}+\rv{y} \leq 1 \end{array} \]

Generate a Gamma\((2)\) random variate \(\rv{z}\), either as the sum of two iid exponential random variates or as \(-\log (\rv{u}^* \rv{v}^*)\), where \(\rv{u}^*, \rv{v}^*\) are iid uniform \([0,1]\) random variates.

RETURN \(\rv{z}\rv{x}\)

(the returned value \(\rv{z}\rv{x}\) is Gamma\((a)\) distributed)
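For concreteness, here is a sketch of Johnk's gamma generator in Python, assuming numpy; it is illustrative only and not meant to compete with `numpy.random.Generator.gamma`.

```python
import numpy as np

def johnk_gamma(a, rng):
    """Draw one Gamma(a, 1) variate for 0 < a < 1 by Johnk's rejection method."""
    assert 0.0 < a < 1.0
    while True:
        u, v = rng.uniform(size=2)
        x, y = u ** (1.0 / a), v ** (1.0 / (1.0 - a))
        if x + y <= 1.0:                 # accept
            e = rng.exponential()        # E ~ Exp(1)
            return e * x / (x + y)       # Gamma(a, 1) distributed

rng = np.random.default_rng(6)
draws = np.array([johnk_gamma(0.4, rng) for _ in range(50_000)])
print(draws.mean(), draws.var())         # both should be close to a = 0.4
```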

8 References

Ahrens, and Dieter. 1974. Computer Methods for Sampling from Gamma, Beta, Poisson and Binomial Distributions.” Computing.
———. 1982. Generating Gamma Variates by a Modified Rejection Technique.” Communications of the ACM.
Barndorff-Nielsen, Maejima, and Sato. 2006. Some Classes of Multivariate Infinitely Divisible Distributions Admitting Stochastic Integral Representations.” Bernoulli.
Bondesson. 1979. A General Result on Infinite Divisibility.” The Annals of Probability.
———. 2012. Generalized Gamma Convolutions and Related Classes of Distributions and Densities. Lecture Notes in Statistics 76.
Damsleth. 1975. Conjugate Classes for Gamma Distributions.” Scandinavian Journal of Statistics.
Devroye. 1986. Non-uniform random variate generation.
Dufresne. 1998. Algebraic Properties of Beta and Gamma Distributions, and Applications.” Advances in Applied Mathematics.
Fink. 1997. A Compendium of Conjugate Priors.”
James, Roynette, and Yor. 2008. Generalized Gamma Convolutions, Dirichlet Means, Thorin Measures, with Explicit Examples.” Probability Surveys.
Lin. 2016. “On The Dirichlet Distribution.”
Mathai. 1982. Storage Capacity of a Dam with Gamma Type Inputs.” Annals of the Institute of Statistical Mathematics.
Miller. 1980. Bayesian Analysis of the Two-Parameter Gamma Distribution.” Technometrics.
Morris. 1982. Natural Exponential Families with Quadratic Variance Functions.” The Annals of Statistics.
———. 1983. Natural Exponential Families with Quadratic Variance Functions: Statistical Theory.” The Annals of Statistics.
Morris, and Lock. 2009. Unifying the Named Natural Exponential Families and Their Relatives.” The American Statistician.
Moschopoulos. 1985. The Distribution of the Sum of Independent Gamma Random Variables.” Annals of the Institute of Statistical Mathematics.
Pérez-Abreu, and Stelzer. 2014. Infinitely Divisible Multivariate and Matrix Gamma Distributions.” Journal of Multivariate Analysis.
Rezende, Mohamed, and Wierstra. 2015. Stochastic Backpropagation and Approximate Inference in Deep Generative Models.” In Proceedings of ICML.
Roychowdhury, and Kulis. 2015. Gamma Processes, Stick-Breaking, and Variational Inference.” In Artificial Intelligence and Statistics.
Steutel, and van Harn. 2003. Infinite Divisibility of Probability Distributions on the Real Line.
Thorin. 1977a. On the Infinite Divisibility of the Pareto Distribution.” Scandinavian Actuarial Journal.
———. 1977b. On the Infinite Divisibility of the Lognormal Distribution.” Scandinavian Actuarial Journal.
Yor. 2007. Some Remarkable Properties of Gamma Processes.” In Advances in Mathematical Finance. Applied and Numerical Harmonic Analysis.