Matrix- and vector-valued generalizations of Gamma processes

$\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}{\mathbb{#1}} \renewcommand{\vv}{\boldsymbol{#1}} \renewcommand{\rv}{\mathsf{#1}} \renewcommand{\vrv}{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\mm}{\mathrm{#1}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}$

Processes that resemble Gamma processes but which take vector or matrix values. We start by considering trivial processes which have an empty index set, i.e. multivariate gamma distributions. So here is the simplest multivariate case:

Bivariate thinned Gamma RVs

There are many possible ways of generating two correlated gamma variates, but as far as I am concerned the default is the Beta thinning method.

For this and throughout, we use the fact that \begin{aligned} \rv{g}&\sim\operatorname{Gamma}(a,\lambda)\\ &\Rightarrow\\ \Ex[\rv{g}]&=\frac{a}{\lambda}\\ \var[\rv{g}]&=\frac{a}{\lambda^2}\\ \Ex[\rv{g}^2]&=\frac{a^2 +a}{\lambda^2}. \end{aligned}
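These moments are easy to sanity-check by simulation. A minimal NumPy sketch (parameter values mine); note that NumPy parameterizes the Gamma distribution by shape and *scale*, so rate $$\lambda$$ corresponds to scale $$1/\lambda$$:

```python
import numpy as np

rng = np.random.default_rng(42)
a, lam = 3.0, 2.0  # shape a, rate lambda

# NumPy's gamma sampler takes (shape, scale); rate lam means scale 1/lam.
g = rng.gamma(shape=a, scale=1.0 / lam, size=1_000_000)

print(g.mean())       # ≈ a / lam = 1.5
print(g.var())        # ≈ a / lam**2 = 0.75
print((g**2).mean())  # ≈ (a**2 + a) / lam**2 = 3.0
```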

Suppose we have $$\rv{g}_1\sim\operatorname{Gamma}(a_1+a_2,\lambda)$$ and $$\rv{b}\sim\operatorname{Beta}(a_1, a_2)$$ and $$\rv{g}'\sim\operatorname{Gamma}(a_2,\lambda)$$ with all rvs jointly independent. Now we generate $\rv{g}_2=\rv{b}\rv{g}_1+\rv{g}'.$ Then \begin{aligned}\Ex[\rv{g}_2] &=\Ex[\rv{b}\rv{g}_1]+\Ex[\rv{g}']\\ &=\frac{a_1+a_2}{\lambda} \end{aligned} and \begin{aligned}\Ex[\rv{g}_1\rv{g}_2] &=\Ex[\rv{g}_1(\rv{b}\rv{g}_1+\rv{g}')]\\ &=\Ex[\rv{g}_1^2\rv{b}]+\Ex[\rv{g}_1\rv{g}']\\ &=\Ex[\rv{g}_1^2]\Ex[\rv{b}]+\Ex[\rv{g}_1]\Ex[\rv{g}']\\ &=\frac{(a_1+a_2)^2 + (a_1+a_2)}{\lambda^2}\frac{a_1}{a_1+a_2}+\frac{a_1+a_2}{\lambda}\frac{a_2}{\lambda}\\ &=\frac{a_1(a_1+a_2) + a_1}{\lambda^2}+\frac{a_2(a_1+a_2)}{\lambda^2}\\ &=\frac{a_1(a_1+a_2) + a_1+ a_2(a_1+a_2)}{\lambda^2}\\ &=\frac{a_1^2+2a_1a_2 +a_2^2+ a_1}{\lambda^2} \end{aligned} and thus \begin{aligned}\operatorname{Cov}[\rv{g}_1,\rv{g}_2] &=\Ex[\rv{g}_1\rv{g}_2]-\Ex[\rv{g}_1]\Ex[\rv{g}_2]\\ &=\frac{a_1^2+2a_1a_2 +a_2^2+ a_1-(a_1+a_2)^2}{\lambda^2}\\ &=\frac{a_1}{\lambda^2}. \end{aligned} Thence \begin{aligned}\corr[\rv{g}_1,\rv{g}_2] &=\frac{\operatorname{Cov}[\rv{g}_1,\rv{g}_2]}{\sqrt{\var[\rv{g}_1]\var[\rv{g}_2]}}\\ &=\frac{a_1/\lambda^2}{(a_1+a_2)/\lambda^2}\\ &=\frac{a_1 }{a_1+a_2}. \end{aligned}
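The thinning recipe is simple to implement. A minimal simulation sketch (variable names mine) confirming the correlation $$a_1/(a_1+a_2)$$ and the $$\operatorname{Gamma}(a_1+a_2,\lambda)$$ marginal of $$\rv{g}_2$$:

```python
import numpy as np

rng = np.random.default_rng(0)
a1, a2, lam = 2.0, 3.0, 1.5
n = 1_000_000

# g1 ~ Gamma(a1 + a2, lam); NumPy's scale is 1/rate.
g1 = rng.gamma(a1 + a2, scale=1.0 / lam, size=n)
b = rng.beta(a1, a2, size=n)
g_prime = rng.gamma(a2, scale=1.0 / lam, size=n)

# Beta thinning: b * g1 ~ Gamma(a1, lam), so g2 ~ Gamma(a1 + a2, lam) again.
g2 = b * g1 + g_prime

print(np.corrcoef(g1, g2)[0, 1])  # ≈ a1 / (a1 + a2) = 0.4
```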

We can use this to generate more than two correlated gamma variates by successive thinning, as long as we remember that, in general, only *independent* Gamma variates with a common rate can be added to produce a new Gamma variate.

Multivariate latent thinned Gamma RVs

A different construction.

Suppose we have $$K$$ independent latent gamma variates, $\rv{\zeta}_k \sim \operatorname{Gamma}(A, \lambda), k=1,2,\dots,K.$ Let us also suppose we have an $$\mathbb{R}^{D\times K}$$ latent mixing matrix, $$\mm{R}=[r_{d,k}]$$, whose entries are all non-negative and whose rows sum to $$A.$$ Now, let us define new random variates $\rv{g}_d =\sum_{k=1}^K \rv{b}_{d,k}\rv{\zeta}_k, d=1,2,\dots,D,$ where $$\rv{b}_{d,k}\sim\operatorname{Beta}(r_{d,k},A-r_{d,k}).$$ By construction and the Beta thinning rule, we know that $\rv{g}_d \sim \operatorname{Gamma}(A, \lambda), d=1,2,\dots,D.$ What can we say about the correlation between them?

For this, we need the fact that \begin{aligned} \rv{b}&\sim\operatorname{Beta}(\alpha,\beta)\\ &\Rightarrow\\ \Ex[\rv{b}]&=\frac{\alpha}{\alpha+\beta}\\ \var[\rv{b}]&=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\\ \Ex[\rv{b}^2]&=\frac{\alpha^2(\alpha+\beta+1)+\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \end{aligned}

Expanding out the $$\rv{g}$$s, and splitting the double sum into diagonal ($$j=k$$) and off-diagonal ($$j\neq k$$) terms, \begin{aligned}\Ex[\rv{g}_1\rv{g}_2] &=\Ex\left[\left(\sum_{k=1}^K \rv{b}_{1,k}\rv{\zeta}_k\right)\left( \sum_{j=1}^K \rv{b}_{2,j}\rv{\zeta}_j\right)\right]\\ &=\sum_{k=1}^K \sum_{j=1}^K \Ex\left[\rv{b}_{1,k}\rv{\zeta}_k\rv{b}_{2,j}\rv{\zeta}_j\right]\\ &=\sum_{k=1}^K \Ex\left[\rv{b}_{1,k}\rv{b}_{2,k}\rv{\zeta}_k^2\right] + \sum_{k=1}^K \sum_{j\neq k} \Ex\left[\rv{b}_{1,k}\rv{\zeta}_k\rv{b}_{2,j}\rv{\zeta}_j\right]\\ &=\sum_{k=1}^K \Ex[\rv{b}_{1,k}]\Ex[\rv{b}_{2,k}]\Ex[\rv{\zeta}_k^2] + \sum_{k=1}^K \sum_{j\neq k} \Ex[\rv{b}_{1,k}]\Ex[\rv{\zeta}_k]\Ex[\rv{b}_{2,j}]\Ex[\rv{\zeta}_j]\\ &=\frac{A^2+A}{\lambda^2 A^2}\sum_{k=1}^K r_{1,k}r_{2,k} + \sum_{k=1}^K \sum_{j\neq k} \frac{r_{1,k}}{A} \frac{A}{\lambda} \frac{r_{2,j}}{A} \frac{A}{\lambda} \\ &=\frac{1}{\lambda^2}\left(\frac{A+1}{ A}\sum_{k=1}^K r_{1,k}r_{2,k} + \sum_{k=1}^K \sum_{j\neq k} r_{1,k}r_{2,j} \right)\\ &=\frac{1}{\lambda^2}\left(\frac{1}{ A}\sum_{k=1}^K r_{1,k}r_{2,k} + \sum_{k=1}^K \sum_{j=1}^K r_{1,k}r_{2,j} \right)\\ &=\frac{\frac1A \vv{r}_1\cdot\vv{r}_2+A^2}{\lambda^2}, \end{aligned} using $$\sum_k r_{d,k}=A$$ in the last step.

So \begin{aligned}\operatorname{Cov}[\rv{g}_1,\rv{g}_2] &=\Ex[\rv{g}_1\rv{g}_2]-\Ex[\rv{g}_1]\Ex[\rv{g}_2]\\ &=\frac{ \vv{r}_1\cdot\vv{r}_2}{A\lambda^2} \end{aligned} and \begin{aligned}\corr[\rv{g}_1,\rv{g}_2] &=\frac{\operatorname{Cov}[\rv{g}_1,\rv{g}_2]}{\sqrt{\var[\rv{g}_1]\var[\rv{g}_2]}}\\ &=\frac{\vv{r}_1\cdot\vv{r}_2/(A\lambda^2)}{A/\lambda^2}\\ &=\frac{ \vv{r}_1\cdot\vv{r}_2}{A^2}. \end{aligned}
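A quick simulation sketch of the latent thinning construction (parameter values mine), checking the correlation formula $$\vv{r}_1\cdot\vv{r}_2/A^2$$ and the $$\operatorname{Gamma}(A,\lambda)$$ marginals:

```python
import numpy as np

rng = np.random.default_rng(1)
A, lam = 4.0, 2.0
D, K, n = 2, 3, 500_000

# Non-negative mixing matrix R whose rows sum to A.
R = np.array([[1.0, 2.0, 1.0],
              [0.5, 0.5, 3.0]])

zeta = rng.gamma(A, scale=1.0 / lam, size=(n, K))     # latent Gamma(A, lam) factors
b = rng.beta(R[None], (A - R)[None], size=(n, D, K))  # b[d,k] ~ Beta(r[d,k], A - r[d,k])
g = np.einsum('ndk,nk->nd', b, zeta)                  # g_d = sum_k b[d,k] * zeta_k

print(np.corrcoef(g[:, 0], g[:, 1])[0, 1])  # ≈ R[0] @ R[1] / A**2 = 0.28125
```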

After all that work, we find pretty much what we would have expected for the correlation of rvs constructed by weighting latent factors, much as in a Gaussian factor model. Was there a shortcut that could have gotten us there quicker?

(For one, we could have equivalently constructed this in terms of samples from a $$K$$-dimensional Lévy-Gamma process, but that is another story.)

Wolpert’s spatial construction

By choosing different intersections of spatial support, different correlations are available in the spatial gamma processes of R. Wolpert and Ickstadt (1998).

Random correlated weights

I haven’t actually checked that this works for gamma marginals, but it looks similar at least. Ranganath and Blei (2018) create random measures that can, I think, induce Gamma weights by imposing a Gaussian correlation structure on the random weights of a random selection of Poisson atoms.

Kingman constructions

Possibly a unifying framework for the previous two.

General multivariate Gamma distribution

How general can these be? Much effort has been spent on this question.

We use the convention that the Fourier transform $$\hat{\mu}$$ of a measure $$\mu$$ on $$\mathbb{M}=\mathbb{R}^{d}$$ is given by $\hat{\mu}(z)=\int_{\mathbb{M}} e^{i(z, x)} \mu(\mathrm{d} x), \quad z \in \mathbb{M}.$

With this, Pérez-Abreu and Stelzer (2014) give a (maximally?) general multivariate Gamma distribution:

Let $$\mu$$ be an infinitely divisible probability distribution on $$\mathbb{R}^{d}.$$ If there exists a finite measure $$\alpha$$ on the unit sphere $$\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}$$ with respect to the norm $$\|\cdot\|$$ (equipped with the Borel $$\sigma$$-algebra) and a Borel-measurable function $$\lambda: \mathbf{S}_{\mathbb{R}^{d},\|\cdot\|} \rightarrow \mathbb{R}_{+}$$ such that the Fourier transform of the measure is given by $\hat{\mu}(z)=\exp \left(\int_{\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}} \int_{\mathbb{R}_{+}}\left(e^{i r v^{\top} z}-1\right) \frac{e^{-\lambda(v) r}}{r} \,\mathrm{d} r\, \alpha(\mathrm{d} v)\right)$ for all $$z \in \mathbb{R}^{d}$$, then $$\mu$$ is called a $$d$$-dimensional Gamma distribution with parameters $$\alpha$$ and $$\lambda$$, abbreviated $$\Gamma_{d}(\alpha, \lambda)$$ distribution. If $$\lambda$$ is constant, we call $$\mu$$ a $$\|\cdot\|$$-homogeneous $$\Gamma_{d}(\alpha, \lambda)$$-distribution.
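As a sanity check (my own, not from the paper): in $$d=1$$ with the absolute-value norm, take $$\alpha$$ to be a point mass of size $$a$$ at $$v=1$$ and $$\lambda(1)=\lambda$$. The inner integral is then a Frullani-type integral, and $\begin{aligned}\hat{\mu}(z)&=\exp \left(a\int_{\mathbb{R}_{+}}\left(e^{i r z}-1\right) \frac{e^{-\lambda r}}{r} \,\mathrm{d} r\right)\\ &=\exp\left(-a \ln\left(1-\frac{i z}{\lambda}\right)\right)\\ &=\left(1-\frac{i z}{\lambda}\right)^{-a},\end{aligned}$ which is the characteristic function of the $$\operatorname{Gamma}(a,\lambda)$$ distribution, as expected.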

Intriguingly, the choice of norm $$\|\cdot\|$$ doesn’t matter much; changing the norm just perturbs $$\lambda$$ in a simple fashion.

It is not clear to me from that description which families of $$\lambda$$s and $$\alpha$$s are interesting.

We do know which ones are admissible, at least:

Let $$\alpha$$ be a finite measure on $$\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}$$ and $$\beta: \mathbf{S}_{\mathbb{R}^{d},\|\cdot\|} \rightarrow \mathbb{R}_{+}$$ a measurable function. Then [the previous construction] defines a Lévy measure $$\nu_{\mu}$$ and thus there exists a $$\Gamma_{d}(\alpha, \beta)$$ probability distribution $$\mu$$ if and only if $\int_{\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}} \ln \left(1+\frac{1}{\beta(v)}\right) \alpha(\mathrm{d} v)<\infty .$ Moreover, $$\int_{\mathbb{R}^{d}}(\|x\| \wedge 1) \nu_{\mu}(\mathrm{d} x)<\infty$$ holds true. The condition is trivially satisfied if $$\beta$$ is bounded away from zero $$\alpha$$-almost everywhere.

Something is weird about that integral though.

Vector Gamma process

How can we turn a multivariate gamma distribution into a vector-valued gamma process?

An associated Lévy process is easy. Are there any Ornstein-Uhlenbeck-type processes?
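One concrete (if ad hoc) recipe for the Lévy-process case, sketched under the assumption that the latent thinning construction above is what we want to extend: apply it to the increments of $$K$$ independent Gamma Lévy processes. Beta thinning works for any shape parameters, so each increment of coordinate $$d$$ is $$\operatorname{Gamma}(A\,\Delta t, \lambda)$$, and independent increments add, giving marginally a Gamma Lévy process per coordinate with cross-correlation $$\vv{r}_1\cdot\vv{r}_2/A^2$$ as before. All names here are mine:

```python
import numpy as np

rng = np.random.default_rng(2)
A, lam, D, K = 4.0, 2.0, 2, 3
R = np.array([[1.0, 2.0, 1.0],
              [0.5, 0.5, 3.0]])  # non-negative rows summing to A

def vector_gamma_path(steps, T=1.0):
    """D-dimensional path with Gamma(A * t, lam) marginals per coordinate,
    built by beta-thinning the increments of K latent Gamma processes."""
    dt = T / steps
    # Increments of K independent Gamma Levy processes: Gamma(A * dt, lam).
    dzeta = rng.gamma(A * dt, scale=1.0 / lam, size=(steps, K))
    # Thinning each increment keeps the shape parameters additive across steps.
    b = rng.beta(R[None] * dt, (A - R)[None] * dt, size=(steps, D, K))
    dg = np.einsum('sdk,sk->sd', b, dzeta)
    return np.vstack([np.zeros((1, D)), np.cumsum(dg, axis=0)])

path = vector_gamma_path(steps=1000)
print(path[-1])  # terminal value; each coordinate has mean A / lam = 2.0
```

Each coordinate path is non-decreasing, as a Gamma subordinator should be.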

TBD

Wishart processes

Wishart distributions are commonly claimed to generalise the Gamma distribution, although AFAICT they are not so similar. “Wishart processes” are indeed a thing, although the Wishart distribution is not, AFAICT, the same as a matrix Gamma distribution.

The common definition is not necessarily monotonic (element-wise, or in any other sense); I presume we have stationary marginals? It generalises the squared Bessel process, which is marginally $$\chi^2$$ distributed.

Inverse Wishart

Does the Inverse Wishart Process relate? TODO

HDP Matrix Gamma Process

Matrix-valued Lévy-Gamma process analogue. See , which uses the multivariate construction of Pérez-Abreu and Stelzer (2014). That construction is extremely general and somewhat abstract, and is usually tractable only through its Lévy measure.

AΓ Process

Meier, Kirch, and Meyer (2020) mention a construction, less general than the HDP Matrix Gamma, which is nonetheless useful:

A special case of the $$\operatorname{Gamma}_{d \times d}(\alpha, \lambda)$$ distribution is the so-called $$A \Gamma$$ distribution, that has been considered in Pérez-Abreu and Stelzer (2014) and generalized to the Hpd setting in . To elaborate, the $$A \Gamma(\eta, \omega, \Sigma)$$ distribution is defined with the parameters $$\eta>d-1, \omega>0$$ and $$\Sigma \in \mathcal{S}_{d}^{+}$$ as the $$\operatorname{Gamma}_{d \times d}\left(\alpha_{\eta, \Sigma}, \lambda_{\Sigma}\right)$$ distribution, with $\alpha_{\eta, \boldsymbol{\Sigma}}(d \boldsymbol{U})=|\boldsymbol{\Sigma}|^{-\eta} \operatorname{tr}\left(\boldsymbol{\Sigma}^{-1} \boldsymbol{U}\right)^{-d \eta} \Gamma(d \eta) \tilde{\Gamma}_{d}(\eta)^{-1}|\boldsymbol{U}|^{\eta-d} d \boldsymbol{U},$ where $$\Gamma$$ denotes the Gamma function and $$\tilde{\Gamma}_{d}$$ the complex multivariate Gamma function, and $$\lambda_{\boldsymbol{\Sigma}}(\boldsymbol{U})=\operatorname{tr}\left(\boldsymbol{\Sigma}^{-1} \boldsymbol{U}\right)$$. It has the advantage that for $$\boldsymbol{X} \sim A \Gamma(\eta, \omega, \Sigma)$$, the formulas for mean and covariance structure are explicitly known: $\mathrm{E} \boldsymbol{X}=\frac{\omega}{d} \boldsymbol{\Sigma}, \quad \operatorname{Cov} \boldsymbol{X}=\frac{\omega}{d(\eta d+1)}\left(\eta \boldsymbol{I}_{d^{2}}+\boldsymbol{H}\right)(\boldsymbol{\Sigma} \otimes \boldsymbol{\Sigma}),$ where $$\boldsymbol{H}=\sum_{i, j=1}^{d} \boldsymbol{H}_{i, j} \otimes \boldsymbol{H}_{j, i}$$ and $$\boldsymbol{H}_{i, j}$$ is the matrix having a one at $$(i, j)$$ and zeros elsewhere, see (Meier 2018, Lemma 2.8). Thus the $$A\Gamma$$ distribution is particularly well suited for Bayesian prior modeling if the prior knowledge is given in terms of mean and covariance structure.
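The matrix $$\boldsymbol{H}=\sum_{i,j}\boldsymbol{H}_{i,j}\otimes\boldsymbol{H}_{j,i}$$ in that covariance formula is, if I am reading it right, the $$d^2\times d^2$$ commutation matrix, which satisfies $$\boldsymbol{H}\operatorname{vec}(\boldsymbol{A})=\operatorname{vec}(\boldsymbol{A}^{\top})$$. A quick numerical check of that identity (construction mine):

```python
import numpy as np

d = 3

def unit_matrix(i, j):
    """H_{i,j}: the d x d matrix with a one at (i, j) and zeros elsewhere."""
    m = np.zeros((d, d))
    m[i, j] = 1.0
    return m

# H = sum_{i,j} H_{i,j} kron H_{j,i}, a d^2 x d^2 matrix.
H = sum(np.kron(unit_matrix(i, j), unit_matrix(j, i))
        for i in range(d) for j in range(d))

A = np.arange(9.0).reshape(d, d)
vec = lambda M: M.reshape(-1, order='F')  # column-stacking vec operator

# Commutation property: H vec(A) = vec(A^T).
print(np.allclose(H @ vec(A), vec(A.T)))  # True
```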
