Beta and Dirichlet distributions

2019-10-14 — 2022-04-04

Wherein the Beta is presented as a ratio of independent Gamma variates and the Dirichlet is exhibited as their normalized vector, parameters being tied to Gamma functions and total concentration.

classification

Lévy processes

probability

stochastic processes

time series

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

Suppose the joint pdf of \(\rv{d}_{1}, \ldots, \rv{d}_{k-1}\) is \[\begin{aligned} f\left(y_{1}, \ldots, y_{k-1}\right) &=\frac{\Gamma\left(\alpha_{1}+\cdots+\alpha_{k}\right)}{\Gamma\left(\alpha_{1}\right) \cdots \Gamma\left(\alpha_{k}\right)} y_{1}^{\alpha_{1}-1} \cdots y_{k-1}^{\alpha_{k-1}-1}\left(1-y_{1}-\cdots-y_{k-1}\right)^{\alpha_{k}-1}\\ &=\frac{\Gamma(\alpha)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k y_i^{\alpha_i-1} \end{aligned}\] where \(y_{i}>0\) for \(i=1, \ldots, k-1\), \(y_{1}+\cdots+y_{k-1}<1\), \(y_k:=1-y_{1}-\cdots-y_{k-1}\) and \(\alpha=\sum_{i=1}^k\alpha_i\). Then the random variables \(\rv{d}_{1}, \ldots, \rv{d}_{k-1}\) follow the Dirichlet distribution with parameters \(\alpha_{1}, \ldots, \alpha_{k}\). Usually, I write this as a vector random variate (including the final coordinate \(\rv{d}_k=1-\rv{d}_1-\cdots-\rv{d}_{k-1}\)), with vector parameters, rather than a long list, \[\vrv{d}\sim\operatorname{Dirichlet}(\vv{\alpha}).\]

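The Beta distribution is the special case \(k=2\) of the Dirichlet distribution, with parameters \(\vv{\alpha}=[\alpha_1,\alpha_2]\): if \([\rv{d}_1,\rv{d}_2]\sim\operatorname{Dirichlet}([\alpha_1,\alpha_2])\) then \(\rv{d}_1\sim\operatorname{Beta}(\alpha_1,\alpha_2)\) and \(\rv{d}_2=1-\rv{d}_1\).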
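To make the density and the \(k=2\) special case concrete, here is a minimal sketch using only the Python standard library. The function names `dirichlet_logpdf` and `beta_logpdf` are mine, chosen for illustration; the point passed in includes the final coordinate \(y_k\), so it must sum to one.

```python
import math


def dirichlet_logpdf(y, alpha):
    """Log-density of Dirichlet(alpha) at a point y on the simplex.

    y and alpha are sequences of the same length k; y must be positive
    and sum to 1, so y[-1] plays the role of 1 - y_1 - ... - y_{k-1}.
    """
    if len(y) != len(alpha):
        raise ValueError("y and alpha must have the same length")
    if any(v <= 0 for v in y) or abs(sum(y) - 1.0) > 1e-9:
        raise ValueError("y must lie in the interior of the simplex")
    # log Gamma(sum alpha) - sum log Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y))


def beta_logpdf(y, a, b):
    """Log-density of Beta(a, b) at y in (0, 1), written out directly."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)


# k = 2: the Dirichlet density in y_1 coincides with the Beta density.
y1 = 0.3
print(dirichlet_logpdf([y1, 1.0 - y1], [2.0, 5.0]), beta_logpdf(y1, 2.0, 5.0))
```

Working on the log scale with `math.lgamma` avoids overflow in the Gamma functions when the concentration parameters are large.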

There is more information in Wikipedia, although these pages are IMO unusually uninspired and confusing. My prose is terrible because I rarely have time to revisit it. What is Wikipedia’s excuse?

1 A Beta RV is a ratio of Gamma RVs

TBD.
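Until this section is written up properly, here is a sketch of the standard result. If \(G_1\sim\operatorname{Gamma}(\alpha,1)\) and \(G_2\sim\operatorname{Gamma}(\beta,1)\) are independent with a common scale, then \(G_1/(G_1+G_2)\sim\operatorname{Beta}(\alpha,\beta)\). Note that the ratio in question is \(G_1/(G_1+G_2)\), not \(G_1/G_2\). The simulation below checks the first two moments against the known Beta mean and variance. The helper name `beta_from_gammas`, the seed and the sample size are arbitrary choices for illustration.

```python
import random


def beta_from_gammas(a, b, rng):
    """Draw one Beta(a, b) variate as G1 / (G1 + G2) with independent Gammas."""
    g1 = rng.gammavariate(a, 1.0)  # shape a, scale 1
    g2 = rng.gammavariate(b, 1.0)  # shape b, same scale
    return g1 / (g1 + g2)


rng = random.Random(0)
a, b = 2.0, 5.0
draws = [beta_from_gammas(a, b, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / len(draws)

# Compare with the exact Beta(a, b) mean and variance.
print(sample_mean, a / (a + b))
print(sample_var, a * b / ((a + b) ** 2 * (a + b + 1)))
```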

2 A Dirichlet RV is a normalized vector of independent Gamma RVs

TBD.
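Pending a proper write-up, the standard construction can be sketched as follows. Take independent \(G_i\sim\operatorname{Gamma}(\alpha_i,1)\) for \(i=1,\ldots,k\) and divide each by their sum; the resulting vector is \(\operatorname{Dirichlet}(\vv{\alpha})\) distributed. The code checks that every draw lies on the simplex and that the sample means approach \(\alpha_i/\sum_j\alpha_j\). The name `dirichlet_from_gammas`, the seed and the parameter values are illustrative assumptions.

```python
import random


def dirichlet_from_gammas(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gammas."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]  # shapes alpha_i, scale 1
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(1)
alpha = [0.5, 2.0, 3.5]
n = 50_000
draws = [dirichlet_from_gammas(alpha, rng) for _ in range(n)]

# Componentwise sample means versus the exact means alpha_i / sum(alpha).
sample_means = [sum(d[i] for d in draws) / n for i in range(len(alpha))]
expected_means = [a / sum(alpha) for a in alpha]
print(sample_means)
print(expected_means)
```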

3 Beta as exponential family

Beta distribution: \(Y \sim \operatorname{Beta}(\alpha, \beta)\) \[ \begin{aligned} f_{Y}(y \mid \alpha, \beta) &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} y^{\alpha-1}(1-y)^{\beta-1} \\ &= [y(1-y)]^{-1} \exp \bigl(\alpha \log (y)+\beta \log (1-y) \\ &\qquad +\log \Gamma(\alpha+\beta)-\log \Gamma(\alpha)-\log \Gamma(\beta)\bigr) \end{aligned} \] i.e. the form \(h(y)\exp\left(\eta^{\top}T(y)-A(\eta)\right)\) with base measure \(h(y)=[y(1-y)]^{-1}\), log-partition function \(A=\log \Gamma(\alpha)+\log \Gamma(\beta)-\log \Gamma(\alpha+\beta)\), and \[ \begin{aligned} \eta(\alpha, \beta) &=(\alpha, \beta)^{\top} \\ T(y) &=(\log (y), \log (1-y))^{\top}. \end{aligned} \]
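As a sanity check on this factorization, the sketch below evaluates the Beta density twice: once from the usual Gamma-function formula, and once as \(h(y)\exp(\eta^{\top}T(y)-A)\) with \(h(y)=[y(1-y)]^{-1}\) and \(A=\log\Gamma(\alpha)+\log\Gamma(\beta)-\log\Gamma(\alpha+\beta)\) read off the display above. The function names are hypothetical.

```python
import math


def beta_pdf_direct(y, a, b):
    """Beta(a, b) density from the usual Gamma-function formula."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_pdf_expfam(y, a, b):
    """Same density written as h(y) * exp(eta . T(y) - A)."""
    eta = (a, b)                              # natural parameter
    t = (math.log(y), math.log(1.0 - y))      # sufficient statistic T(y)
    h = 1.0 / (y * (1.0 - y))                 # base measure
    log_partition = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return h * math.exp(eta[0] * t[0] + eta[1] * t[1] - log_partition)


# The two columns should agree up to floating-point error.
for y in (0.1, 0.5, 0.9):
    print(y, beta_pdf_direct(y, 2.0, 5.0), beta_pdf_expfam(y, 2.0, 5.0))
```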

4 Dirichlet as exponential family

The Dirichlet distribution is an exponential family and can be written in canonical form as \[ \operatorname{Dirichlet}(\boldsymbol{\theta} \mid \boldsymbol{\alpha})=f(\boldsymbol{\theta}) g(\boldsymbol{\alpha}) e^{\phi(\boldsymbol{\alpha})^{T} u(\boldsymbol{\theta})} \] with \[ f(\boldsymbol{\theta})=1, g(\boldsymbol{\alpha})=1 / B(\boldsymbol{\alpha}) \] where \[ B(\boldsymbol{\alpha})=\prod_{t=1}^{D} \Gamma\left(\alpha_{t}\right) / \Gamma\left(\sum_{t=1}^{D} \alpha_{t}\right), \phi(\boldsymbol{\alpha})=\left(\begin{array}{c} \alpha_{1}-1 \\ \vdots \\ \alpha_{D}-1 \end{array}\right) \] and \[ u(\boldsymbol{\theta})=\left(\begin{array}{c} \ln \theta_{1} \\ \vdots \\ \ln \theta_{D} \end{array}\right) \]
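One practical consequence of the exponential-family form is that the expected sufficient statistic is the gradient of the log-normalizer: \(\Ex[\ln\theta_t]=\partial \log B(\boldsymbol{\alpha})/\partial\alpha_t\). The following sketch checks this numerically, comparing a finite-difference gradient of \(\log B\) with a Monte Carlo average of \(\ln\theta_t\). Two assumptions are mine: the draws are generated by normalizing independent Gamma variates, and the step size, seed and sample size are arbitrary.

```python
import math
import random


def log_B(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def grad_log_B(alpha, h=1e-5):
    """Central finite-difference gradient of log B with respect to alpha."""
    grad = []
    for t in range(len(alpha)):
        up = list(alpha)
        down = list(alpha)
        up[t] += h
        down[t] -= h
        grad.append((log_B(up) - log_B(down)) / (2.0 * h))
    return grad


def mean_log_theta(alpha, n, rng):
    """Monte Carlo estimate of E[ln theta_t] under Dirichlet(alpha)."""
    sums = [0.0] * len(alpha)
    for _ in range(n):
        # Assumed sampler: normalize independent Gamma(alpha_t, 1) draws.
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        for t, g in enumerate(gammas):
            sums[t] += math.log(g / total)
    return [s / n for s in sums]


alpha = [2.0, 3.0, 4.0]
print(grad_log_B(alpha))
print(mean_log_theta(alpha, 20_000, random.Random(2)))
```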

5 Conjugate prior for Dirichlet RVs

Lefkimmiatis, Maragos, and Papandreou (2009) argue:

Since for any member of the exponential family there exists a conjugate prior that can be written in the form \[ p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto g(\boldsymbol{\alpha})^{\eta} e^{\phi(\boldsymbol{\alpha})^{T} \mathbf{v}} \] a suitable conjugate prior distribution for the parameters \(\boldsymbol{\alpha}\) of the Dirichlet is \[ p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto \frac{1}{B(\boldsymbol{\alpha})^{\eta}} e^{-\sum_{t=1}^{D} v_{t} \alpha_{t}} \]

Wikipedia claims that there is no efficient means of sampling from this, which is sad for MCMC. Generally this does not bother people because we rarely observe Dirichlet RVs directly; they usually appear as, e.g., the mixing probabilities for some other distribution.
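Even without a sampler, the conjugate update itself is easy to state. If we did observe Dirichlet vectors \(\boldsymbol{\theta}^{(1)},\ldots,\boldsymbol{\theta}^{(n)}\) directly, multiplying the quoted prior by the likelihood gives a posterior of the same form with \(\eta\to\eta+n\) and \(v_t\to v_t-\sum_{i}\ln\theta^{(i)}_t\). Here is a minimal sketch of that bookkeeping, with hypothetical function names, which checks that prior times likelihood matches the updated prior up to a constant that does not depend on \(\boldsymbol{\alpha}\).

```python
import math


def log_B(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def log_prior_unnorm(alpha, v, eta):
    """Unnormalized log of the quoted prior: -eta*log B(alpha) - sum_t v_t*alpha_t."""
    return -eta * log_B(alpha) - sum(vt * at for vt, at in zip(v, alpha))


def dirichlet_loglik(alpha, thetas):
    """Log-likelihood of alpha given fully observed Dirichlet draws."""
    return sum(
        -log_B(alpha) + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, theta))
        for theta in thetas
    )


def posterior_hyperparams(v, eta, thetas):
    """Conjugate update: eta gains 1 per draw, v_t loses sum_i ln theta_t^(i)."""
    v_post = [vt - sum(math.log(theta[t]) for theta in thetas) for t, vt in enumerate(v)]
    return v_post, eta + len(thetas)


thetas = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]
v, eta = [1.0, 2.0, 3.0], 1.5
v_post, eta_post = posterior_hyperparams(v, eta, thetas)

# The gap between the two expressions should be the same for every alpha.
for alpha in ([1.0, 1.0, 1.0], [0.7, 2.5, 4.0]):
    lhs = log_prior_unnorm(alpha, v, eta) + dirichlet_loglik(alpha, thetas)
    print(lhs - log_prior_unnorm(alpha, v_post, eta_post))
```

This only evaluates unnormalized log-densities; it does not address the normalizing constant or sampling.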

6 Non-conjugate priors

Presumably any distribution over elementwise-positive vectors will do, or anything that can be transformed into one. A multivariate gamma seems natural.

7 References

Devroye. 1986. Non-Uniform Random Variate Generation.

Dufresne. 1998. “Algebraic Properties of Beta and Gamma Distributions, and Applications.” Advances in Applied Mathematics.

Lefkimmiatis, Maragos, and Papandreou. 2009. “Bayesian Inference on Multiscale Models for Poisson Intensity Estimation: Applications to Photon-Limited Image Denoising.” IEEE Transactions on Image Processing.

Lin. 2016. “On The Dirichlet Distribution.”

Pérez-Abreu and Stelzer. 2014. “Infinitely Divisible Multivariate and Matrix Gamma Distributions.” Journal of Multivariate Analysis.

Yor. 2007. “Some Remarkable Properties of Gamma Processes.” In Advances in Mathematical Finance. Applied and Numerical Harmonic Analysis.