# Beta and Dirichlet distributions

October 14, 2019 — April 4, 2022

classification
Lévy processes
probability
stochastic processes
time series

$\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}$

Suppose the joint pdf of $$\rv{d}_{1}, \ldots, \rv{d}_{k-1}$$ is \begin{aligned} f\left(y_{1}, \ldots, y_{k-1}\right) &=\frac{\Gamma\left(\alpha_{1}+\cdots+\alpha_{k}\right)}{\Gamma\left(\alpha_{1}\right) \cdots \Gamma\left(\alpha_{k}\right)} y_{1}^{\alpha_{1}-1} \cdots y_{k-1}^{\alpha_{k-1}-1}\left(1-y_{1}-\cdots-y_{k-1}\right)^{\alpha_{k}-1}\\ &=\frac{\Gamma(\alpha)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k y_i^{\alpha_i-1} \end{aligned} where $$y_{i}>0$$ for $$i=1, \ldots, k-1$$, $$y_{1}+\cdots+y_{k-1}<1$$, $$y_k=1-y_{1}-\cdots-y_{k-1}$$, $$\alpha_i>0$$ and $$\alpha=\sum_{i=1}^k\alpha_i$$. Then the random variables $$\rv{d}_{1}, \ldots, \rv{d}_{k-1}$$ follow the Dirichlet distribution with parameters $$\alpha_{1}, \ldots, \alpha_{k}$$. Usually I write this as a vector random variate with a vector parameter, rather than as a long list: $\vrv{d}\sim\operatorname{Dirichlet}(\vv{\alpha}).$
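As a rough sketch, the density can be evaluated on the log scale using only the Python standard library. The function names here are illustrative, not from any library. The sampler uses the standard construction of normalising independent gamma variates, which is not stated in the text but gives Dirichlet draws.

```python
import math
import random


def dirichlet_logpdf(y, alpha):
    """Log density of Dirichlet(alpha) at a point y on the simplex.

    y and alpha are sequences of the same length k. y must be strictly
    positive and sum to one, so y[-1] plays the role of 1 - y_1 - ... - y_{k-1}.
    """
    if len(y) != len(alpha):
        raise ValueError("y and alpha must have the same length")
    if any(yi <= 0 for yi in y) or abs(sum(y) - 1.0) > 1e-9:
        raise ValueError("y must lie in the interior of the simplex")
    # log of Gamma(sum alpha) / prod Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(yi) for a, yi in zip(alpha, y))


def sample_dirichlet(alpha, rng=random):
    """Draw one Dirichlet(alpha) vector by normalising independent gammas."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]


# Dirichlet(1, 1, 1) is uniform on the 2-simplex, with constant density Gamma(3) = 2.
uniform_density = math.exp(dirichlet_logpdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))

# One draw from Dirichlet(2, 3, 4) with a seeded generator.
draw = sample_dirichlet([2.0, 3.0, 4.0], random.Random(0))
```

Working with the log density avoids overflow in the gamma functions when the parameters are large.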

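The Beta distribution is the special case of the Dirichlet distribution with $$k=2$$ and parameters $$\vv{\alpha}=[\alpha_1,\alpha_2]$$, i.e. the bivariate case, in which the single free coordinate satisfies $$\rv{d}_1\sim\operatorname{Beta}(\alpha_1,\alpha_2)$$ and $$\rv{d}_2=1-\rv{d}_1$$.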
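A quick numerical check of this special case, as a sketch with hand-written densities (both helper functions are illustrative names, not library calls):

```python
import math


def beta_logpdf(y, a, b):
    """Log density of Beta(a, b) at y in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y)


def dirichlet2_logpdf(y1, alpha1, alpha2):
    """Log density of Dirichlet([alpha1, alpha2]) in its single free coordinate y1."""
    y = (y1, 1.0 - y1)
    alpha = (alpha1, alpha2)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(yi) for a, yi in zip(alpha, y))


# The two log densities should agree pointwise on (0, 1).
gaps = [
    abs(beta_logpdf(y, 2.5, 0.7) - dirichlet2_logpdf(y, 2.5, 0.7))
    for y in (0.1, 0.5, 0.9)
]

# Beta(2, 2) has density 6 y (1 - y), which equals 1.5 at y = 0.5.
beta22_at_half = math.exp(beta_logpdf(0.5, 2.0, 2.0))
```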

There is more information in Wikipedia, although these pages are IMO unusually uninspired and confusing. My prose is terrible because I rarely have time to revisit it. What is Wikipedia’s excuse?

TBD.

## 3 Beta as exponential family

Beta distribution: $$Y \sim \operatorname{Beta}(\alpha, \beta)$$ has density \begin{aligned} f_{Y}(y \mid \alpha, \beta)&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} y^{\alpha-1}(1-y)^{\beta-1} \\ &= [y(1-y)]^{-1} \exp \bigl(\alpha \log (y)+\beta \log (1-y) \\ &\qquad+\log \Gamma(\alpha+\beta)-\log \Gamma(\alpha)-\log \Gamma(\beta)\bigr), \end{aligned} which is an exponential family with base measure $$[y(1-y)]^{-1}$$, natural parameter $$\eta$$ and sufficient statistic $$T$$ given by \begin{aligned} \eta(\alpha, \beta) &=(\alpha, \beta)^{\top} \\ T(y) &=(\log (y), \log (1-y))^{\top}. \end{aligned}
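The rearrangement can be checked numerically. In this sketch the labels `h` (base measure) and `A` (log-partition function) are my own names for the factor $$[y(1-y)]^{-1}$$ and the negated log-gamma terms. They are assumptions about notation, not something the formula above fixes.

```python
import math


def beta_pdf(y, a, b):
    """Beta(a, b) density in its usual form."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)
    )


def beta_pdf_expfam(y, a, b):
    """Beta(a, b) density written as h(y) * exp(eta . T(y) - A(eta))."""
    eta = (a, b)                              # natural parameter
    T = (math.log(y), math.log(1.0 - y))      # sufficient statistic
    h = 1.0 / (y * (1.0 - y))                 # base measure
    # log-partition function: log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    A = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return h * math.exp(sum(e * t for e, t in zip(eta, T)) - A)


# Largest relative discrepancy between the two forms over a few points.
max_rel_gap = max(
    abs(beta_pdf(y, 2.5, 0.7) / beta_pdf_expfam(y, 2.5, 0.7) - 1.0)
    for y in (0.05, 0.3, 0.5, 0.95)
)
```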

## 4 Dirichlet as exponential family

The Dirichlet distribution is an exponential family and can be written in canonical form as $\operatorname{Dirichlet}(\boldsymbol{\theta} \mid \boldsymbol{\alpha})=f(\boldsymbol{\theta}) g(\boldsymbol{\alpha}) e^{\phi(\boldsymbol{\alpha})^{T} u(\boldsymbol{\theta})}$ with $$f(\boldsymbol{\theta})=1$$ and $$g(\boldsymbol{\alpha})=1 / B(\boldsymbol{\alpha})$$, where $B(\boldsymbol{\alpha})=\frac{\prod_{t=1}^{D} \Gamma\left(\alpha_{t}\right)}{\Gamma\left(\sum_{t=1}^{D} \alpha_{t}\right)}, \quad \phi(\boldsymbol{\alpha})=\left(\begin{array}{c} \alpha_{1}-1 \\ \vdots \\ \alpha_{D}-1 \end{array}\right), \quad u(\boldsymbol{\theta})=\left(\begin{array}{c} \ln \theta_{1} \\ \vdots \\ \ln \theta_{D} \end{array}\right).$
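One payoff of the canonical form is the standard exponential-family identity that the mean of the sufficient statistic is the gradient of the log normaliser. Here that reads $$\Ex[\ln\theta_t]=\partial \ln B(\boldsymbol{\alpha})/\partial\alpha_t$$. The sketch below evaluates the factorised density, approximates that gradient by central differences, and compares it with a Monte Carlo average. All function names are illustrative, and the gamma-normalisation sampler is a standard construction not stated in the text.

```python
import math
import random


def log_B(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_pdf_expfam(theta, alpha):
    """Density as f(theta) * g(alpha) * exp(phi(alpha) . u(theta))."""
    f = 1.0
    g = math.exp(-log_B(alpha))
    phi = [a - 1.0 for a in alpha]
    u = [math.log(t) for t in theta]
    return f * g * math.exp(sum(p * ui for p, ui in zip(phi, u)))


def expected_log_theta(alpha, step=1e-5):
    """E[ln theta_t] as the gradient of ln B(alpha), by central differences."""
    grad = []
    for t in range(len(alpha)):
        up = list(alpha)
        down = list(alpha)
        up[t] += step
        down[t] -= step
        grad.append((log_B(up) - log_B(down)) / (2.0 * step))
    return grad


def mc_mean_log_theta(alpha, n=20000, seed=0):
    """Monte Carlo estimate of E[ln theta_t] from normalised gamma draws."""
    rng = random.Random(seed)
    totals = [0.0] * len(alpha)
    for _ in range(n):
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        s = sum(g)
        for t, gt in enumerate(g):
            totals[t] += math.log(gt / s)
    return [tot / n for tot in totals]


alpha = [2.0, 3.0, 4.0]
density = dirichlet_pdf_expfam([0.2, 0.3, 0.5], alpha)  # 3360 * 0.2 * 0.3**2 * 0.5**3
from_gradient = expected_log_theta(alpha)
from_samples = mc_mean_log_theta(alpha)
```

The two estimates of the expected log coordinates should agree up to Monte Carlo error.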

## 5 Conjugate prior for Dirichlet RVs

Lefkimmiatis, Maragos, and Papandreou (2009) argue:

> Since for any member of the exponential family there exists a conjugate prior that can be written in the form $p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto g(\boldsymbol{\alpha})^{\eta} e^{\phi(\boldsymbol{\alpha})^{T} \mathbf{v}}$ a suitable conjugate prior distribution for the parameters $$\boldsymbol{\alpha}$$ of the Dirichlet is $p(\boldsymbol{\alpha} \mid \mathbf{v}, \eta) \propto \frac{1}{B(\boldsymbol{\alpha})^{\eta}} e^{-\sum_{t=1}^{D} v_{t} \alpha_{t}}$

Wikipedia claims that there is no efficient means of sampling from this distribution, which is sad for MCMC. Generally this does not bother people, because we rarely observe Dirichlet RVs directly; they usually appear as latent quantities, e.g. the mixing probabilities of some other distribution.
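Even without an efficient sampler, the unnormalised log density is cheap to evaluate. Here is a minimal sketch of the hyperparameter update, using the sign convention of the quoted formula. The update rule is my own derivation from multiplying that prior by the Dirichlet likelihood of $$N$$ observed vectors: $$v_t \leftarrow v_t-\sum_n \ln\theta_{n,t}$$ and $$\eta\leftarrow\eta+N$$. The function names and example numbers are hypothetical.

```python
import math


def log_B(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def log_prior_unnorm(alpha, v, eta):
    """Unnormalised log density: -eta * log B(alpha) - sum_t v_t * alpha_t."""
    return -eta * log_B(alpha) - sum(vt * at for vt, at in zip(v, alpha))


def dirichlet_loglik(alpha, thetas):
    """Log likelihood of alpha given observed simplex vectors thetas."""
    return sum(
        -log_B(alpha) + sum((a - 1.0) * math.log(th) for a, th in zip(alpha, theta))
        for theta in thetas
    )


def posterior_hyperparams(v, eta, thetas):
    """Conjugate update: v_t <- v_t - sum_n log theta_{n,t}, eta <- eta + N."""
    v_post = [
        vt - sum(math.log(theta[t]) for theta in thetas)
        for t, vt in enumerate(v)
    ]
    return v_post, eta + len(thetas)


# Made-up observations and hyperparameters, purely for illustration.
thetas = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]
v, eta = [1.0, 2.0, 0.5], 1.5
v_post, eta_post = posterior_hyperparams(v, eta, thetas)


def gap(alpha):
    """Prior plus likelihood minus posterior, which should not depend on alpha."""
    return (
        log_prior_unnorm(alpha, v, eta)
        + dirichlet_loglik(alpha, thetas)
        - log_prior_unnorm(alpha, v_post, eta_post)
    )
```

If the update is right, `gap` returns the same constant for every `alpha`, namely minus the sum of all the observed log coordinates.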

## 6 Non-conjugate priors

Presumably any distribution over elementwise-positive vectors, or anything that can be transformed into one, will do as a prior for $$\boldsymbol{\alpha}$$. A multivariate gamma seems natural.

## 7 References

Devroye. 1986. Non-uniform random variate generation.
Dufresne. 1998. Advances in Applied Mathematics.
Lefkimmiatis, Maragos, and Papandreou. 2009. IEEE Transactions on Image Processing.
Lin. 2016. “On The Dirichlet Distribution.”
Pérez-Abreu and Stelzer. 2014. Journal of Multivariate Analysis.
Yor. 2007. In Advances in Mathematical Finance. Applied and Numerical Harmonic Analysis.