- Bivariate thinned Gamma RVs
- Multivariate latent thinned Gamma RVs
- Wolpert’s spatial construction
- Random correlated weights
- Kingman constructions
- General multivariate Gamma distribution
- Vector Gamma process
- Ornstein-Uhlenbeck Dirichlet process
- Wishart processes
- Inverse Wishart
- HDP Matrix Gamma Process
- References

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

Processes that resemble Gamma processes but which take vector or matrix values.

We start by considering trivial processes whose index set is a single point, i.e. multivariate gamma distributions. So here is the simplest multivariate case:

## Bivariate thinned Gamma RVs

There are many possible ways of generating two correlated gamma variates, but as far as I am concerned the default is the Beta thinning method.

For this and throughout, we use the fact that \[\begin{aligned} \rv{g}&\sim\operatorname{Gamma}(a,\lambda)\\ &\Rightarrow\\ \Ex[\rv{g}]&=\frac{a}{\lambda}\\ \var[\rv{g}]&=\frac{a}{\lambda^2}\\ \Ex[\rv{g}^2]&=\frac{a^2 +a}{\lambda^2}. \end{aligned}\]
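As a sanity check, a quick Monte Carlo sketch of those moment identities (the parameter values are arbitrary; note that numpy parameterises the gamma by shape and *scale*, so rate \(\lambda\) means scale \(1/\lambda\)):

```python
import numpy as np

# Monte Carlo check of the Gamma(a, lambda) moment identities above.
rng = np.random.default_rng(0)
a, lam = 2.5, 1.7
g = rng.gamma(shape=a, scale=1.0 / lam, size=500_000)  # rate lam => scale 1/lam

print(g.mean(), a / lam)                    # mean ~ a / lambda
print(g.var(), a / lam**2)                  # variance ~ a / lambda^2
print((g**2).mean(), (a**2 + a) / lam**2)   # second moment ~ (a^2 + a) / lambda^2
```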

Suppose we have \(\rv{g}_1\sim\operatorname{Gamma}(a_1+a_2,\lambda)\) and \(\rv{b}\sim\operatorname{Beta}(a_1, a_2)\) and \(\rv{g}'\sim\operatorname{Gamma}(a_2,\lambda)\) with all rvs jointly independent. Now we generate \[\rv{g}_2=\rv{b}\rv{g}_1+\rv{g}'.\] By the Beta thinning rule, \(\rv{b}\rv{g}_1\sim\operatorname{Gamma}(a_1,\lambda)\), so \(\rv{g}_2\sim\operatorname{Gamma}(a_1+a_2,\lambda)\). Then \[\begin{aligned}\Ex[\rv{g}_2] &=\Ex[\rv{b}\rv{g}_1]+\Ex[\rv{g}']\\ &=\frac{a_1+a_2}{\lambda} \end{aligned}\] and \[\begin{aligned}\Ex[\rv{g}_1\rv{g}_2] &=\Ex[\rv{g}_1(\rv{b}\rv{g}_1+\rv{g}')]\\ &=\Ex[\rv{g}_1^2\rv{b}]+\Ex[\rv{g}_1\rv{g}']\\ &=\Ex[\rv{g}_1^2]\Ex[\rv{b}]+\Ex[\rv{g}_1]\Ex[\rv{g}']\\ &=\frac{(a_1+a_2)^2 + (a_1+a_2)}{\lambda^2}\frac{a_1}{a_1+a_2}+\frac{a_1+a_2}{\lambda}\frac{a_2}{\lambda}\\ &=\frac{a_1(a_1+a_2) + a_1+ a_2(a_1+a_2)}{\lambda^2}\\ &=\frac{a_1^2+2a_1a_2 + a_1+a_2^2}{\lambda^2} \end{aligned}\] and thus \[\begin{aligned}\operatorname{Cov}[\rv{g}_1,\rv{g}_2] &=\Ex[\rv{g}_1\rv{g}_2]-\Ex[\rv{g}_1]\Ex[\rv{g}_2]\\ &=\frac{a_1^2+2a_1a_2 + a_1+a_2^2-(a_1+a_2)^2}{\lambda^2}\\ &=\frac{a_1}{\lambda^2}. \end{aligned}\] Thence \[\begin{aligned}\corr[\rv{g}_1,\rv{g}_2] &=\frac{\operatorname{Cov}[\rv{g}_1,\rv{g}_2]}{\sqrt{\var[\rv{g}_1]\var[\rv{g}_2]}}\\ &=\frac{a_1/\lambda^2}{(a_1+a_2)/\lambda^2}\\ &=\frac{a_1 }{a_1+a_2}. \end{aligned}\]
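The thinning construction above is easy to check empirically; a minimal simulation sketch (parameter values arbitrary):

```python
import numpy as np

# Beta thinning: g2 = b*g1 + g' should be Gamma(a1 + a2, lam)
# with Corr[g1, g2] = a1 / (a1 + a2).
rng = np.random.default_rng(1)
a1, a2, lam, n = 2.0, 3.0, 1.5, 500_000

g1 = rng.gamma(a1 + a2, 1.0 / lam, n)    # Gamma(a1 + a2, rate lam)
b = rng.beta(a1, a2, n)                  # Beta(a1, a2)
g_prime = rng.gamma(a2, 1.0 / lam, n)    # independent Gamma(a2, rate lam)
g2 = b * g1 + g_prime

print(np.corrcoef(g1, g2)[0, 1])    # ~ a1 / (a1 + a2) = 0.4
print(g2.mean(), (a1 + a2) / lam)   # marginal mean matches Gamma(a1+a2, lam)
```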

We can use this to generate more than two correlated gamma variates by successively thinning them, as long as we take care to remember that we can only add independent Gamma variates together to produce a new Gamma variate in general.

## Multivariate latent thinned Gamma RVs

A different construction.

Suppose we have \(K\) independent latent gamma variates \[ \rv{\zeta}_k \sim \operatorname{Gamma}(A, \lambda), \quad k=1,2,\dots,K. \] Let us also suppose we have an \(\mathbb{R}^{D\times K}\) latent mixing matrix \(\mm{R}=[r_{d,k}]\) whose entries are all non-negative and whose rows sum to \(A.\) Now let us define new random variates \[ \rv{g}_d =\sum_{k=1}^K \rv{b}_{d,k}\rv{\zeta}_k, \quad d=1,2,\dots,D, \] where \(\rv{b}_{d,k}\sim\operatorname{Beta}(r_{d,k},A-r_{d,k}),\) independent of each other and of the \(\rv{\zeta}_k\). By construction and the Beta thinning rule, we know that \[ \rv{g}_d \sim \operatorname{Gamma}(A, \lambda), \quad d=1,2,\dots,D. \] What can we say about the correlation between them?

For this, we need the fact that \[\begin{aligned} \rv{b}&\sim\operatorname{Beta}(\alpha,\beta)\\ &\Rightarrow\\ \Ex[\rv{b}]&=\frac{\alpha}{\alpha+\beta}\\ \var[\rv{b}]&=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\\ \Ex[\rv{b}^2]&=\frac{\alpha^2(\alpha+\beta+1)+\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \end{aligned}\]

Expanding out the \(\rv{g}\)s, and taking care that the cross terms \(r_{1,k}r_{2,j}\) are not symmetric in \((k,j)\), \[\begin{aligned}\Ex[\rv{g}_1\rv{g}_2] &=\Ex\left[\left(\sum_{k=1}^K \rv{b}_{1,k}\rv{\zeta}_k\right)\left( \sum_{j=1}^K \rv{b}_{2,j}\rv{\zeta}_j\right)\right]\\ &=\sum_{k=1}^K \sum_{j=1}^K \Ex\left[\rv{b}_{1,k}\rv{\zeta}_k\rv{b}_{2,j}\rv{\zeta}_j\right]\\ &=\sum_{k=1}^K \Ex\left[\rv{b}_{1,k}\rv{b}_{2,k}\rv{\zeta}_k^2\right] + \sum_{k\neq j} \Ex\left[\rv{b}_{1,k}\rv{\zeta}_k\rv{b}_{2,j}\rv{\zeta}_j\right]\\ &=\sum_{k=1}^K \Ex[\rv{b}_{1,k}]\Ex[\rv{b}_{2,k}]\Ex[\rv{\zeta}_k^2] + \sum_{k\neq j} \Ex[\rv{b}_{1,k}]\Ex[\rv{\zeta}_k]\Ex[\rv{b}_{2,j}]\Ex[\rv{\zeta}_j]\\ &=\frac{A^2+A}{\lambda^2 A^2}\sum_{k=1}^K r_{1,k}r_{2,k} + \sum_{k\neq j} \frac{r_{1,k}}{A} \frac{A}{\lambda} \frac{r_{2,j}}{A} \frac{A}{\lambda} \\ &=\frac{A+1}{\lambda^2 A}\sum_{k=1}^K r_{1,k}r_{2,k} + \frac{1}{\lambda^2}\sum_{k\neq j} r_{1,k}r_{2,j} \\ &=\frac{1}{\lambda^2}\left(\frac{1}{ A}\sum_{k=1}^K r_{1,k}r_{2,k} + \sum_{k=1}^K \sum_{j=1}^K r_{1,k}r_{2,j} \right)\\ &=\frac{\frac1A \vv{r}_1\cdot\vv{r}_2+A^2}{\lambda^2} \end{aligned}\]

So \[\begin{aligned}\operatorname{Cov}[\rv{g}_1,\rv{g}_2] &=\Ex[\rv{g}_1\rv{g}_2]-\Ex[\rv{g}_1]\Ex[\rv{g}_2]\\ &=\frac{ \vv{r}_1\cdot\vv{r}_2}{A\lambda^2} \end{aligned}\] and \[\begin{aligned}\corr[\rv{g}_1,\rv{g}_2] &=\frac{\operatorname{Cov}[\rv{g}_1,\rv{g}_2]}{\sqrt{\var[\rv{g}_1]\var[\rv{g}_2]}}\\ &=\frac{\frac{ \vv{r}_1\cdot\vv{r}_2}{A\lambda^2}}{\frac{A}{\lambda^2}}\\ &=\frac{ \vv{r}_1\cdot\vv{r}_2}{A^2}. \end{aligned}\]

After all that work, we find pretty much what we would have expected by analogy with the correlation of rvs constructed by weighting latent Gaussian factors. Was there a shortcut that could have gotten us there quicker?

(For one, we could have equivalently constructed this in terms of samples from a \(K\)-dimensional Lévy-Gamma process, but that is another story.)

## Wolpert’s spatial construction

By choosing different intersections of spatial support, different correlation is available in the spatial gamma processes of Wolpert and Ickstadt (1998).

## Kingman constructions

Possibly a unifying framework for the previous two.

## General multivariate Gamma distribution

How general can these be? Much effort has been spent on this question (Barndorff-Nielsen, Maejima, and Sato 2006; Barndorff-Nielsen, Pedersen, and Sato 2001; Buchmann et al. 2015; Mathai and Moschopoulos 1991; Mathai and Moschopoulos 1992; Pérez-Abreu and Stelzer 2014; Semeraro 2008; Singpurwalla and Youngren 1993).

We use the convention that the Fourier transform \(\widehat{\mu}\) of a measure \(\mu\) on \(\mathbb{M}=\mathbb{R}^{d}\) is given by \[ \hat{\mu}(z)=\int_{\mathbb{M}} e^{i(z, x)} \mu(\mathrm{d} x), \quad z \in \mathbb{M}. \]

With this, Pérez-Abreu and Stelzer (2014) give a (maximally?) general multivariate Gamma distribution:

Let \(\mu\) be an infinitely divisible probability distribution on \(\mathbb{R}^{d} .\) If there exists a finite measure \(\alpha\) on the unit sphere \(\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}\) with respect to the norm \(\|\cdot\|\) equipped with the Borel \(\sigma\)-algebra and a Borel-measurable function \(\lambda: \mathbf{S}_{\mathbb{R}^{d},\|\cdot\|} \rightarrow \mathbb{R}_{+}\) such that the Fourier transform of the measure is given by \[ \hat{\mu}(z)=\exp \left(\int_{\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}} \int_{\mathbb{R}_{+}}\left(e^{i r v^{\top} z}-1\right) \frac{e^{-\lambda(v) r}}{r} \mathrm{~d} r \alpha(\mathrm{d} v)\right) \] for all \(z \in \mathbb{R}^{d}\), then \(\mu\) is called a \(d\)-dimensional Gamma distribution with parameters \(\alpha\) and \(\lambda\), abbreviated \(\Gamma_{d}(\alpha, \lambda)\) distribution. If \(\lambda\) is constant, we call \(\mu\) a \(\|\cdot\|\)-homogeneous \(\Gamma_{d}(\alpha, \lambda)\)-distribution.
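One concrete way to get a feel for this: when \(\alpha\) is atomic, \(\alpha=\sum_i a_i \delta_{v_i}\), the characteristic function factorises over the atoms, and by the Frullani integral each factor is the characteristic function of \(g_i v_i\) with \(g_i\sim\operatorname{Gamma}(a_i,\lambda(v_i))\). So sampling reduces to independent scalar gammas pushed along directions on the sphere. A sketch under that atomic assumption (the directions, weights, and rates here are made up for illustration):

```python
import numpy as np

# Gamma_d(alpha, lambda) with atomic alpha = sum_i a_i delta_{v_i}:
# X = sum_i g_i * v_i, with independent g_i ~ Gamma(a_i, rate lambda(v_i)).
rng = np.random.default_rng(3)

V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [np.sqrt(0.5), np.sqrt(0.5)]])  # atoms v_i on the unit sphere in R^2
a = np.array([1.0, 2.0, 0.5])                 # weights a_i = alpha({v_i})
lam = np.array([1.0, 2.0, 1.5])               # rates lambda(v_i)

n = 500_000
g = rng.gamma(shape=a, scale=1.0 / lam, size=(n, 3))  # independent radial parts
X = g @ V                                             # X = sum_i g_i v_i

print(X.mean(axis=0))   # should match sum_i (a_i / lam_i) v_i
print((a / lam) @ V)
```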

Intriguingly, the choice of norm \(\|\cdot\|\) doesn’t matter much; changing the norm merely perturbs \(\lambda\) in a simple fashion.

It is not clear to me from that description which families of \(\lambda\)s and \(\alpha\)s are interesting.

We *do* know which ones are admissible, at least:

Let \(\alpha\) be a finite measure on \(\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}\) and \(\beta: \mathbf{S}_{\mathbb{R}^{d},\|\cdot\|} \rightarrow \mathbb{R}_{+}\) a measurable function. Then [the previous construction] defines a Lévy measure \(\nu_{\mu}\) and thus there exists a \(\Gamma_{d}(\alpha, \beta)\) probability distribution \(\mu\) if and only if \[ \int_{\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}} \ln \left(1+\frac{1}{\beta(v)}\right) \alpha(\mathrm{d} v)<\infty . \] Moreover, \(\int_{\mathbb{R}^{d}}(\|x\| \wedge 1) \nu_{\mu}(\mathrm{d} x)<\infty\) holds true. The condition is trivially satisfied if \(\beta\) is bounded away from zero \(\alpha\)-almost everywhere.

(Note that the \(\lambda\) of the previous definition appears here as \(\beta\).)

## Vector Gamma process

How can we turn a multivariate gamma distribution into a vector-valued gamma process?

An associated Lévy process is easy. Are there any Ornstein-Uhlenbeck-type processes?

## Ornstein-Uhlenbeck Dirichlet process

TBD

## Wishart processes

Wishart distributions are commonly claimed to generalise Gamma distributions, although AFAICT they are not so similar. “Wishart processes” are indeed a thing (Pfaffel 2012; Wilson and Ghahramani 2011), although AFAICT the Wishart distribution is neither a special case of, nor the same as, the matrix Gamma distribution.

The common definition is not necessarily monotonic (element-wise, or in any other sense); I presume we have stationary marginals? It generalises the *squared Bessel process*, which is marginally \(\chi^2\)-distributed.

## Inverse Wishart

Does the Inverse Wishart Process relate? (Shah, Wilson, and Ghahramani 2014; Tracey and Wolpert 2018) TODO

## HDP Matrix Gamma Process

Matrix-valued analogue of the Lévy-Gamma process. See Meier, Kirch, and Meyer (2020, sec. 2), which uses the multivariate construction of Pérez-Abreu and Stelzer (2014). That construction is extremely general, and somewhat abstract; usually it is tractable only through its Lévy measure.

### AΓ Process

Meier, Kirch, and Meyer (2020) mention a construction, less general than the HDP Matrix Gamma process, which is nonetheless useful:

A special case of the \(\operatorname{Gamma}_{d \times d}(\alpha, \lambda)\) distribution is the so-called \(A \Gamma\) distribution, which has been considered in Pérez-Abreu and Stelzer (2014) and generalized to the Hermitian positive definite (Hpd) setting in Meier (2018, sec. 2.4). To elaborate, the \(A \Gamma(\eta, \omega, \Sigma)\) distribution is defined with the parameters \(\eta>d-1, \omega>0\) and \(\Sigma \in\) \(\mathcal{S}_{d}^{+}\) as the \(\operatorname{Gamma}_{d \times d}\left(\alpha_{\eta, \Sigma}, \lambda_{\Sigma}\right)\) distribution, with \[ \alpha_{\eta, \boldsymbol{\Sigma}}(d \boldsymbol{U})=|\boldsymbol{\Sigma}|^{-\eta} \operatorname{tr}\left(\boldsymbol{\Sigma}^{-1} \boldsymbol{U}\right)^{-d \eta} \Gamma(d \eta) \tilde{\Gamma}_{d}(\eta)^{-1}|\boldsymbol{U}|^{\eta-d} d \boldsymbol{U}, \] where \(\Gamma\) denotes the Gamma function and \(\tilde{\Gamma}_{d}\) the complex multivariate Gamma function (see Mathai and Provost 2005), and \(\lambda_{\boldsymbol{\Sigma}}(\boldsymbol{U})=\operatorname{tr}\left(\boldsymbol{\Sigma}^{-1} \boldsymbol{U}\right)\). It has the advantage that for \(\boldsymbol{X} \sim A \Gamma(\eta, \omega, \Sigma)\), the formulas for mean and covariance structure are explicitly known: \[ \mathrm{E} \boldsymbol{X}=\frac{\omega}{d} \boldsymbol{\Sigma}, \quad \operatorname{Cov} \boldsymbol{X}=\frac{\omega}{d(\eta d+1)}\left(\eta \boldsymbol{I}_{d^{2}}+\boldsymbol{H}\right)(\boldsymbol{\Sigma} \otimes \boldsymbol{\Sigma}), \] where \(\boldsymbol{H}=\sum_{i, j=1}^{d} \boldsymbol{H}_{i, j} \otimes \boldsymbol{H}_{j, i}\), with \(\boldsymbol{H}_{i, j}\) being the matrix having a one at \((i, j)\) and zeros elsewhere; see Meier (2018, Lemma 2.8). Thus the \(A\Gamma\)-distribution is particularly well suited for Bayesian prior modeling if the prior knowledge is given in terms of mean and covariance structure.

## References

*Bernoulli* 12 (1): 1–33.

*Advances in Applied Probability* 33 (1): 160–87.

*Stochastic Models* 26 (1): 1–26.

*arXiv:1502.03901 [Math, q-Fin]*, February.

*Statistics & Probability Letters* 80 (19–20): 1492–99.

*Artificial Intelligence and Statistics*, 20–28.

*Journal of Statistical Planning and Inference* 141 (11): 3648–64.

*Student’s t-Distribution and Related Stochastic Processes*. SpringerBriefs in Statistics. Berlin, Heidelberg: Springer Berlin Heidelberg.

*Bayesian Analysis* 14 (4): 1037–73.

*arXiv:2103.03200 [Math, Stat]*, July.

*Proceedings of the 26th Annual International Conference on Machine Learning*, 601–8. ICML ’09. New York, NY, USA: ACM.

*Stochastic Environmental Research and Risk Assessment* 25 (2): 235–51.

*Journal of Multivariate Analysis* 39 (1): 135–53.

*Linear Algebra and Its Applications*, Tenth Special Issue (Part 2) on Linear Algebra and Statistics, 410 (November): 198–216.

*Annals of the Institute of Statistical Mathematics* 44 (1): 97–106.

*beyondWhittle: Bayesian Spectral Inference for Stationary Time Series* (version 1.1.1).

*Journal of Multivariate Analysis* 175 (January): 104560.

*Journal of Multivariate Analysis* 130 (September): 155–75.

*arXiv:1201.3256 [Math]*, January.

*Proceedings of the 22nd International Conference on Neural Information Processing Systems*, 1554–62. NIPS’09. Red Hook, NY, USA: Curran Associates Inc.

*Lévy Processes and Infinitely Divisible Distributions*. Cambridge University Press.

*International Journal of Theoretical and Applied Finance* 11 (01): 1–18.

*Artificial Intelligence and Statistics*, 877–85. PMLR.

*Journal of Statistical Computation and Simulation* 47 (1–2): 1–10.

*Scandinavian Journal of Statistics* 20 (3): 251–61.

*Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics*, 564–71. PMLR.

*2018 AIAA Non-Deterministic Approaches Conference*, January.

*Communications in Statistics - Theory and Methods* 0 (0): 1–7.

*Stochastic Hydrology and Hydraulics* 6 (3): 183–90.

*Journal of Hydrology* 85 (1): 127–37.

*Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence*, 736–44. UAI’11. Arlington, Virginia, United States: AUAI Press.

*Biometrika* 85 (2): 251–67.

*arXiv:2106.00087 [Math]*, May.

*arXiv:1503.08542 [Cs, Stat]*, March.
