Beta and Dirichlet distributions

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

Suppose the joint pdf of \(\rv{d}_{1}, \ldots, \rv{d}_{k-1}\) is \[\begin{aligned} f\left(y_{1}, \ldots, y_{k-1}\right) &=\frac{\Gamma\left(\alpha_{1}+\cdots+\alpha_{k}\right)}{\Gamma\left(\alpha_{1}\right) \cdots \Gamma\left(\alpha_{k}\right)} y_{1}^{\alpha_{1}-1} \cdots y_{k-1}^{\alpha_{k-1}-1}\left(1-y_{1}-\cdots-y_{k-1}\right)^{\alpha_{k}-1}\\ &=\frac{\Gamma(\alpha)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k y_i^{\alpha_i-1} \end{aligned}\] where \(y_{i}>0\) for \(i=1, \ldots, k-1\), \(y_{1}+\cdots+y_{k-1}<1\), \(y_k:=1-y_{1}-\cdots-y_{k-1}\) and \(\alpha=\sum_{i=1}^k\alpha_i\). Then the random variables \(\rv{d}_{1}, \ldots, \rv{d}_{k-1}\) follow the Dirichlet distribution with parameters \(\alpha_{1}, \ldots, \alpha_{k}\). Usually I write this as a vector random variate with a vector parameter, rather than as a long list, \[\vrv{d}\sim\operatorname{Dirichlet}(\vv{\alpha}).\]

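The Beta distribution is the special case of the Dirichlet distribution with parameters \(\vv{\alpha}=[\alpha_1,\alpha_2]\), i.e. the case \(k=2\), where the single free coordinate \(\rv{d}_1\) has a \(\operatorname{Beta}(\alpha_1,\alpha_2)\) distribution.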
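To make the density concrete, here is a minimal sketch that evaluates its logarithm using only the Python standard library. The function name `dirichlet_logpdf` is my own, and it assumes the point is passed with all \(k\) coordinates, so that the last one equals one minus the sum of the others.

```python
import math


def dirichlet_logpdf(y, alpha):
    """Log density of Dirichlet(alpha) at a point y in the open simplex.

    y carries all k coordinates, so y[-1] == 1 - sum(y[:-1]).
    """
    if len(y) != len(alpha):
        raise ValueError("y and alpha must have the same length")
    if any(yi <= 0 for yi in y) or abs(sum(y) - 1.0) > 1e-9:
        raise ValueError("y must lie in the interior of the simplex")
    # log Gamma(alpha_1 + ... + alpha_k) - sum_i log Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(yi) for a, yi in zip(alpha, y))


# Dirichlet(1, 1, 1) is uniform on the simplex; its density is Gamma(3) = 2.
uniform_density = math.exp(dirichlet_logpdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))

# With k = 2 this is the Beta(2, 3) density evaluated at y = 0.3.
beta_density = math.exp(dirichlet_logpdf([0.3, 0.7], [2.0, 3.0]))
```

Working on the log scale with `math.lgamma` avoids overflow in the Gamma functions when the parameters are large.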

There is more information in Wikipedia, although these pages are IMO unusually uninspired and confusing. My prose is terrible because I rarely have time to revisit it. What is Wikipedia’s excuse?

A Beta RV is a ratio of Gamma RVs
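Concretely, if \(\rv{g}_1\sim\operatorname{Gamma}(\alpha,1)\) and \(\rv{g}_2\sim\operatorname{Gamma}(\beta,1)\) are independent, then \(\rv{g}_1/(\rv{g}_1+\rv{g}_2)\sim\operatorname{Beta}(\alpha,\beta)\). Below is a rough simulation check of that representation against the known Beta mean and variance. The helper name `beta_via_gammas`, the seed and the parameter values are arbitrary choices of mine.

```python
import random


def beta_via_gammas(alpha, beta, rng):
    """Draw g1 / (g1 + g2) with independent g1 ~ Gamma(alpha, 1), g2 ~ Gamma(beta, 1)."""
    g1 = rng.gammavariate(alpha, 1.0)  # arguments are (shape, scale)
    g2 = rng.gammavariate(beta, 1.0)
    return g1 / (g1 + g2)


rng = random.Random(0)
alpha, beta = 2.0, 5.0
draws = [beta_via_gammas(alpha, beta, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

# Moments of Beta(alpha, beta), for comparison.
exact_mean = alpha / (alpha + beta)
exact_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
```

Matching two moments does not prove the distributional identity, of course; it is only a sanity check.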


A Dirichlet RV is a normalized vector of independent Gamma RVs
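That is, if \(\rv{g}_i\sim\operatorname{Gamma}(\alpha_i,1)\) independently for \(i=1,\ldots,k\), then the vector with entries \(\rv{g}_i/\sum_j\rv{g}_j\) is \(\operatorname{Dirichlet}(\vv{\alpha})\). Here is a minimal sampler sketch built on that fact, using only the standard library. The name `dirichlet_via_gammas` and the example parameters are mine.

```python
import random


def dirichlet_via_gammas(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]  # arguments are (shape, scale)
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)
alpha = [2.0, 3.0, 4.0]
n = 50_000
draws = [dirichlet_via_gammas(alpha, rng) for _ in range(n)]

# The Dirichlet mean is alpha_i / sum(alpha); compare with the sample means.
sample_means = [sum(d[i] for d in draws) / n for i in range(len(alpha))]
exact_means = [a / sum(alpha) for a in alpha]
```

For very small \(\alpha_i\) the Gamma draws can underflow to zero in floating point, so a careful implementation would probably work on the log scale instead.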


Beta as exponential family

Beta distribution: \(Y \sim \operatorname{Beta}(\alpha, \beta)\) \[ \begin{aligned} f_{Y}(y \mid \alpha, \beta) &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} y^{\alpha-1}(1-y)^{\beta-1} \\ &= [y(1-y)]^{-1} \exp \bigl(\alpha \log (y)+\beta \log (1-y) \\ &\qquad +\log \Gamma(\alpha+\beta)-\log \Gamma(\alpha)-\log \Gamma(\beta)\bigr) \end{aligned} \] with \[ \begin{aligned} \eta(\alpha, \beta) &=(\alpha, \beta)^{\top} \\ T(y) &=(\log (y), \log (1-y))^{\top}. \end{aligned} \]
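As a sanity check on the algebra, the sketch below evaluates the density both in its usual form and in the exponential-family form, reading the base measure \(h(y)=[y(1-y)]^{-1}\) and the log-partition function \(A=\log\Gamma(\alpha)+\log\Gamma(\beta)-\log\Gamma(\alpha+\beta)\) off the display above. The function names are illustrative, not from any library.

```python
import math


def beta_pdf(y, a, b):
    """Beta(a, b) density in the usual form."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * y ** (a - 1.0) * (1.0 - y) ** (b - 1.0)


def beta_pdf_expfam(y, a, b):
    """Same density written as h(y) * exp(eta . T(y) - A(eta))."""
    eta = (a, b)                              # natural parameters
    suff = (math.log(y), math.log(1.0 - y))   # sufficient statistics T(y)
    base = 1.0 / (y * (1.0 - y))              # base measure h(y)
    log_partition = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return base * math.exp(eta[0] * suff[0] + eta[1] * suff[1] - log_partition)


# The two forms should agree up to floating-point error.
checks = [(0.1, 0.5, 0.5), (0.3, 2.0, 3.0), (0.9, 7.5, 1.2)]
max_rel_gap = max(
    abs(beta_pdf(y, a, b) - beta_pdf_expfam(y, a, b)) / beta_pdf(y, a, b)
    for y, a, b in checks
)
```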

Dirichlet as exponential family

The Dirichlet distribution is an exponential family and can be written in canonical form as \[ \operatorname{Dirichlet}(\boldsymbol{\theta} \mid \boldsymbol{\alpha})=f(\boldsymbol{\theta}) g(\boldsymbol{\alpha}) e^{\phi(\boldsymbol{\alpha})^{T} u(\boldsymbol{\theta})} \] with \[ f(\boldsymbol{\theta})=1, g(\boldsymbol{\alpha})=1 / B(\boldsymbol{\alpha}) \] where \[ B(\boldsymbol{\alpha})=\prod_{t=1}^{D} \Gamma\left(\alpha_{t}\right) / \Gamma\left(\sum_{t=1}^{D} \alpha_{t}\right), \phi(\boldsymbol{\alpha})=\left(\begin{array}{c} \alpha_{1}-1 \\ \vdots \\ \alpha_{D}-1 \end{array}\right) \] and \[ u(\boldsymbol{\theta})=\left(\begin{array}{c} \ln \theta_{1} \\ \vdots \\ \ln \theta_{D} \end{array}\right) \]
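One consequence of this form is the standard exponential-family identity that the expected sufficient statistic equals the gradient of the log normalizer, here \(\Ex[\ln\theta_t]=\partial \log B(\boldsymbol{\alpha})/\partial\alpha_t\). The sketch below checks this numerically, comparing a finite-difference gradient of \(\log B\) with a Monte Carlo average. It samples \(\boldsymbol{\theta}\) by normalizing independent Gamma draws; the function names, step size, seed and sample size are all my own arbitrary choices.

```python
import math
import random


def log_B(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def grad_log_B(alpha, h=1e-5):
    """Central finite-difference gradient of log B with respect to alpha."""
    grad = []
    for i in range(len(alpha)):
        up = list(alpha)
        down = list(alpha)
        up[i] += h
        down[i] -= h
        grad.append((log_B(up) - log_B(down)) / (2.0 * h))
    return grad


def mean_log_theta(alpha, n, rng):
    """Monte Carlo estimate of E[ln theta_t] under Dirichlet(alpha)."""
    sums = [0.0] * len(alpha)
    for _ in range(n):
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        for t, gt in enumerate(g):
            sums[t] += math.log(gt / total)
    return [s / n for s in sums]


alpha = [2.0, 3.0, 4.0]
grad = grad_log_B(alpha)                               # gradient of log normalizer
mc = mean_log_theta(alpha, 50_000, random.Random(1))   # simulated E[u(theta)]
```

In closed form the gradient is \(\psi(\alpha_t)-\psi(\sum_s\alpha_s)\), with \(\psi\) the digamma function. Finite differences are used here only because the standard library has no digamma.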

Conjugate prior for Dirichlet RVs

Lefkimmiatis, Maragos, and Papandreou (2009) argue:

Since for any member of the exponential family there exists a conjugate prior that can be written in the form \[ p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto g(\boldsymbol{\alpha})^{\eta} e^{\phi(\boldsymbol{\alpha})^{T} \mathbf{v}} \] a suitable conjugate prior distribution for the parameters \(\boldsymbol{\alpha}\) of the Dirichlet is \[ p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto \frac{1}{B(\boldsymbol{\alpha})^{\eta}} e^{-\sum_{t=1}^{D} v_{t} \alpha_{t}} \]

Wikipedia claims that there is no efficient means of sampling from this, which is sad for MCMC. Generally this does not bother people because we rarely observe Dirichlet RVs directly; more usually they are latent, e.g. the mixing probabilities for some other distribution.
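Evaluating the unnormalized log density of this prior is at least cheap, and that is all a generic sampler such as random-walk Metropolis needs. The sketch below does that evaluation, using the sign convention of the quoted formula, and checks the conjugate update for a single observed Dirichlet draw \(\boldsymbol{\theta}\): \(\mathbf{v}\mapsto\mathbf{v}-\ln\boldsymbol{\theta}\), \(\eta\mapsto\eta+1\). This update is my own reading of the formula rather than something taken from the paper, and all function names and numbers are hypothetical.

```python
import math


def log_B(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def log_conj_prior_unnorm(alpha, v, eta):
    """log of B(alpha)^(-eta) * exp(-sum_t v_t alpha_t), up to an additive constant."""
    return -eta * log_B(alpha) - sum(vt * at for vt, at in zip(v, alpha))


def dirichlet_loglik(theta, alpha):
    """Log density of Dirichlet(alpha) at theta, viewed as a function of alpha."""
    return -log_B(alpha) + sum((a - 1.0) * math.log(th) for a, th in zip(alpha, theta))


def conjugate_update(v, eta, theta):
    """Hyperparameters after observing one Dirichlet draw theta (assumed convention)."""
    return [vt - math.log(th) for vt, th in zip(v, theta)], eta + 1.0


v, eta = [1.0, 2.0, 0.5], 2.0
theta = [0.2, 0.3, 0.5]
v_post, eta_post = conjugate_update(v, eta, theta)


def gap(alpha):
    """Updated prior minus (prior + log-likelihood); should not depend on alpha."""
    prior_plus_lik = log_conj_prior_unnorm(alpha, v, eta) + dirichlet_loglik(theta, alpha)
    return log_conj_prior_unnorm(alpha, v_post, eta_post) - prior_plus_lik


gaps = [gap(a) for a in ([1.0, 1.0, 1.0], [0.3, 2.5, 4.0], [10.0, 0.7, 3.3])]
```

If the update is right, every entry of `gaps` equals the same constant, \(\sum_t \ln\theta_t\). Whether the prior is proper for given \(\mathbf{v}\) and \(\eta\) is a separate question that this sketch does not address.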

Non-conjugate priors

Anything that can be transformed to be an elementwise positive vector, presumably. A multivariate gamma seems natural.


Devroye, Luc. 1986. Non-uniform random variate generation. New York: Springer.
Dufresne, Daniel. 1998. “Algebraic Properties of Beta and Gamma Distributions, and Applications.” Advances in Applied Mathematics 20 (3): 285–99.
Lefkimmiatis, S., P. Maragos, and G. Papandreou. 2009. “Bayesian Inference on Multiscale Models for Poisson Intensity Estimation: Applications to Photon-Limited Image Denoising.” IEEE Transactions on Image Processing 18 (8): 1724–41.
Lin, Jiayu. 2016. “On The Dirichlet Distribution,” 75.
Pérez-Abreu, Victor, and Robert Stelzer. 2014. “Infinitely Divisible Multivariate and Matrix Gamma Distributions.” Journal of Multivariate Analysis 130 (September): 155–75.
Yor, Marc. 2007. “Some Remarkable Properties of Gamma Processes.” In Advances in Mathematical Finance, edited by Michael C. Fu, Robert A. Jarrow, Ju-Yi J. Yen, and Robert J. Elliott, 37–47. Applied and Numerical Harmonic Analysis. Birkhäuser Boston.
