# Matrix- and vector-valued generalizations of Gamma processes

$\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}$

Processes that resemble Gamma processes but which take vector or matrix values.

We start by considering trivial processes, i.e. ones whose index set is a single point, which is to say multivariate gamma distributions. So here is the simplest multivariate case:

## Bivariate thinned Gamma RVs

There are many possible ways of generating two correlated gamma variates, but as far as I am concerned the default is the Beta thinning method.

For this and throughout, we use the fact that \begin{aligned} \rv{g}&\sim\operatorname{Gamma}(a,\lambda)\\ &\Rightarrow\\ \Ex[\rv{g}]&=\frac{a}{\lambda}\\ \var[\rv{g}]&=\frac{a}{\lambda^2}\\ \Ex[\rv{g}^2]&=\frac{a^2 +a}{\lambda^2}. \end{aligned}

Suppose we have $$\rv{g}_1\sim\operatorname{Gamma}(a_1+a_2,\lambda)$$ and $$\rv{b}\sim\operatorname{Beta}(a_1, a_2)$$ and $$\rv{g}'\sim\operatorname{Gamma}(a_2,\lambda)$$ with all rvs jointly independent. Now we generate $\rv{g}_2=\rv{b}\rv{g}_1+\rv{g}'.$ By the Beta thinning rule, $$\rv{b}\rv{g}_1\sim\operatorname{Gamma}(a_1,\lambda)$$, and it is independent of $$\rv{g}'$$, so $$\rv{g}_2\sim\operatorname{Gamma}(a_1+a_2,\lambda)$$ marginally. Then \begin{aligned}\Ex[\rv{g}_2] &=\Ex[\rv{b}\rv{g}_1]+\Ex[\rv{g}']\\ &=\frac{a_1+a_2}{\lambda} \end{aligned} and \begin{aligned}\Ex[\rv{g}_1\rv{g}_2] &=\Ex[\rv{g}_1(\rv{b}\rv{g}_1+\rv{g}')]\\ &=\Ex[\rv{b}\rv{g}_1^2]+\Ex[\rv{g}_1\rv{g}']\\ &=\Ex[\rv{b}]\Ex[\rv{g}_1^2]+\Ex[\rv{g}_1]\Ex[\rv{g}']\\ &=\frac{a_1}{a_1+a_2}\frac{(a_1+a_2)^2 + (a_1+a_2)}{\lambda^2}+\frac{a_1+a_2}{\lambda}\frac{a_2}{\lambda}\\ &=\frac{a_1(a_1+a_2) + a_1}{\lambda^2}+\frac{a_2(a_1+a_2)}{\lambda^2}\\ &=\frac{a_1^2+2a_1a_2+a_2^2 + a_1}{\lambda^2} \end{aligned} and thus \begin{aligned}\operatorname{Cov}[\rv{g}_1,\rv{g}_2] &=\Ex[\rv{g}_1\rv{g}_2]-\Ex[\rv{g}_1]\Ex[\rv{g}_2]\\ &=\frac{a_1^2+2a_1a_2+a_2^2 + a_1-(a_1+a_2)^2}{\lambda^2}\\ &=\frac{a_1}{\lambda^2}. \end{aligned} Thence \begin{aligned}\corr[\rv{g}_1,\rv{g}_2] &=\frac{\operatorname{Cov}[\rv{g}_1,\rv{g}_2]}{\sqrt{\var[\rv{g}_1]\var[\rv{g}_2]}}\\ &=\frac{a_1/\lambda^2}{(a_1+a_2)/\lambda^2}\\ &=\frac{a_1}{a_1+a_2}. \end{aligned}

We can use this to generate more than two correlated gamma variates by successive thinning, as long as we take care to remember that, in general, only independent Gamma variates can be added together to produce a new Gamma variate.
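The bivariate construction is easy to sanity-check by simulation. A minimal numpy sketch (parameter values are arbitrary; note that numpy's gamma sampler is parameterized by shape and *scale* $1/\lambda$, not rate):

```python
import numpy as np

rng = np.random.default_rng(42)
a1, a2, lam = 2.0, 3.0, 1.5
n = 200_000

g1 = rng.gamma(a1 + a2, 1.0 / lam, size=n)   # g1 ~ Gamma(a1 + a2, lam)
b = rng.beta(a1, a2, size=n)                 # Beta(a1, a2) thinning variable
g_extra = rng.gamma(a2, 1.0 / lam, size=n)   # independent Gamma(a2, lam) top-up

g2 = b * g1 + g_extra                        # marginally Gamma(a1 + a2, lam) again

# Empirical correlation should be close to a1 / (a1 + a2).
print(np.corrcoef(g1, g2)[0, 1])
```

With these parameters the theoretical correlation is $a_1/(a_1+a_2)=0.4$, and the empirical value lands within a couple of decimal places.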

## Multivariate latent thinned Gamma RVs

A different construction.

Suppose we have $$K$$ independent latent gamma variates $\rv{\zeta}_k \sim \operatorname{Gamma}(A, \lambda),\quad k=1,2,\dots,K.$ Let us also suppose we have an $$\mathbb{R}^{D\times K}$$ latent mixing matrix $$\mm{R}=[r_{d,k}]$$ whose entries are all non-negative and whose rows each sum to $$A.$$ Now, let us define new random variates $\rv{g}_d =\sum_{k=1}^K \rv{b}_{d,k}\rv{\zeta}_k,\quad d=1,2,\dots,D,$ where the $$\rv{b}_{d,k}\sim\operatorname{Beta}(r_{d,k},A-r_{d,k})$$ are jointly independent and independent of the $$\rv{\zeta}_k$$. By construction and the Beta thinning rule, each $$\rv{b}_{d,k}\rv{\zeta}_k\sim\operatorname{Gamma}(r_{d,k},\lambda)$$, so we know that $\rv{g}_d \sim \operatorname{Gamma}(A, \lambda),\quad d=1,2,\dots,D.$ What can we say about the correlation between them?

For this, we need the fact that \begin{aligned} \rv{b}&\sim\operatorname{Beta}(\alpha,\beta)\\ &\Rightarrow\\ \Ex[\rv{b}]&=\frac{\alpha}{\alpha+\beta}\\ \var[\rv{b}]&=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\\ \Ex[\rv{b}^2]&=\frac{\alpha^2(\alpha+\beta+1)+\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \end{aligned}

Expanding out the $$\rv{g}$$s, \begin{aligned}\Ex[\rv{g}_1\rv{g}_2] &=\Ex\left[\left(\sum_{k=1}^K \rv{b}_{1,k}\rv{\zeta}_k\right)\left( \sum_{j=1}^K \rv{b}_{2,j}\rv{\zeta}_j\right)\right]\\ &=\sum_{k=1}^K \sum_{j=1}^K \Ex\left[\rv{b}_{1,k}\rv{\zeta}_k\rv{b}_{2,j}\rv{\zeta}_j\right]\\ &=\sum_{k=1}^K \Ex\left[\rv{b}_{1,k}\rv{b}_{2,k}\rv{\zeta}_k^2\right] + \sum_{k\neq j} \Ex\left[\rv{b}_{1,k}\rv{\zeta}_k\rv{b}_{2,j}\rv{\zeta}_j\right]\\ &=\sum_{k=1}^K \Ex[\rv{b}_{1,k}]\Ex[\rv{b}_{2,k}]\Ex[\rv{\zeta}_k^2] + \sum_{k\neq j} \Ex[\rv{b}_{1,k}]\Ex[\rv{\zeta}_k]\Ex[\rv{b}_{2,j}]\Ex[\rv{\zeta}_j]\\ &=\frac{A^2+A}{\lambda^2 A^2}\sum_{k=1}^K r_{1,k}r_{2,k} + \sum_{k\neq j} \frac{r_{1,k}}{A} \frac{A}{\lambda} \frac{r_{2,j}}{A} \frac{A}{\lambda} \\ &=\frac{A+1}{\lambda^2 A}\sum_{k=1}^K r_{1,k}r_{2,k} + \frac{1}{\lambda^2}\sum_{k\neq j} r_{1,k}r_{2,j} \\ &=\frac{1}{\lambda^2}\left(\frac{1}{ A}\sum_{k=1}^K r_{1,k}r_{2,k} + \sum_{k=1}^K \sum_{j=1}^K r_{1,k}r_{2,j} \right)\\ &=\frac{\frac1A \vv{r}_1\cdot\vv{r}_2+A^2}{\lambda^2}, \end{aligned} since each row of $$\mm{R}$$ sums to $$A$$.

So \begin{aligned}\operatorname{Cov}[\rv{g}_1,\rv{g}_2] &=\Ex[\rv{g}_1\rv{g}_2]-\Ex[\rv{g}_1]\Ex[\rv{g}_2]\\ &=\frac{ \vv{r}_1\cdot\vv{r}_2}{A\lambda^2} \end{aligned} and \begin{aligned}\corr[\rv{g}_1,\rv{g}_2] &=\frac{\operatorname{Cov}[\rv{g}_1,\rv{g}_2]}{\sqrt{\var[\rv{g}_1]\var[\rv{g}_2]}}\\ &=\frac{\frac{ \vv{r}_1\cdot\vv{r}_2}{A\lambda^2}}{\frac{A}{\lambda^2}}\\ &=\frac{ \vv{r}_1\cdot\vv{r}_2}{A^2}. \end{aligned}
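This construction and its correlation formula are again easy to check by simulation; here is a sketch with an arbitrary mixing matrix $$\mm{R}$$ (values illustrative, rows summing to $$A$$):

```python
import numpy as np

rng = np.random.default_rng(0)
A, lam, K, D, n = 4.0, 2.0, 3, 2, 200_000

# Mixing matrix R: non-negative entries, each row summing to A.
R = np.array([[2.0, 1.0, 1.0],
              [0.5, 3.0, 0.5]])
assert np.allclose(R.sum(axis=1), A)

zeta = rng.gamma(A, 1.0 / lam, size=(n, K))    # latent Gamma(A, lam) factors
B = rng.beta(R, A - R, size=(n, D, K))         # b_{d,k} ~ Beta(r_dk, A - r_dk)
g = np.einsum('ndk,nk->nd', B, zeta)           # g_d = sum_k b_dk * zeta_k

emp = np.corrcoef(g[:, 0], g[:, 1])[0, 1]
theory = R[0] @ R[1] / A**2                    # r_1 . r_2 / A^2
print(emp, theory)
```

The empirical and theoretical correlations agree to within Monte Carlo error, and each marginal mean is close to $$A/\lambda$$ as expected.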

After all that work, we find pretty much what we would have expected for the correlation of rvs constructed by weighting shared latent factors, as in a Gaussian factor model. Was there a shortcut that could have gotten us there quicker?

(For one, we could have equivalently constructed this in terms of samples from a $$K$$-dimensional Lévy-Gamma process, but that is another story.)

## Wolpert’s spatial construction

By choosing different intersections of spatial support, different correlation structures are available in the spatial gamma processes of Wolpert and Ickstadt (1998).

## Random correlated weights

I haven’t actually checked that this works for gamma marginals, but it looks similar at least. Ranganath and Blei (2018) create random measures that, I think, can induce Gamma weights by imposing a Gaussian correlation structure on the random weights of a random selection of Poisson atoms.

## Kingman constructions

Possibly a unifying framework for the previous two.

## General multivariate Gamma distribution

How general can these be? Much effort has been spent on this question.

We use the convention that the Fourier transform $$\widehat{\mu}$$ of a measure $$\mu$$ on $$\mathbb{M}=\mathbb{R}^{d}$$ is given by $\widehat{\mu}(z)=\int_{\mathbb{M}} e^{i(z, x)} \mu(\mathrm{d} x), \quad z \in \mathbb{M}.$

With this, Pérez-Abreu and Stelzer (2014) give a (maximally?) general multivariate Gamma distribution:

Let $$\mu$$ be an infinitely divisible probability distribution on $$\mathbb{R}^{d} .$$ If there exists a finite measure $$\alpha$$ on the unit sphere $$\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}$$ with respect to the norm $$\|\cdot\|$$ equipped with the Borel $$\sigma$$-algebra and a Borel-measurable function $$\lambda: \mathbf{S}_{\mathbb{R}^{d},\|\cdot\|} \rightarrow \mathbb{R}_{+}$$ such that the Fourier transform of the measure is given by $\hat{\mu}(z)=\exp \left(\int_{\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}} \int_{\mathbb{R}_{+}}\left(e^{i r v^{\top} z}-1\right) \frac{e^{-\lambda(v) r}}{r} \mathrm{~d} r \alpha(\mathrm{d} v)\right)$ for all $$z \in \mathbb{R}^{d}$$, then $$\mu$$ is called a $$d$$-dimensional Gamma distribution with parameters $$\alpha$$ and $$\lambda$$, abbreviated $$\Gamma_{d}(\alpha, \lambda)$$ distribution. If $$\lambda$$ is constant, we call $$\mu$$ a $$\|\cdot\|$$-homogeneous $$\Gamma_{d}(\alpha, \lambda)$$-distribution.
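To get a feel for the definition, note (this is my own reduction, worth double-checking) that when $$\alpha$$ is atomic, say $$\alpha=\sum_i c_i \delta_{v_i}$$, the inner integral is the Lévy exponent of a scalar Gamma distribution, so $$\hat{\mu}$$ factorizes as $\hat{\mu}(z)=\prod_i\left(1-\frac{i v_i^{\top} z}{\lambda(v_i)}\right)^{-c_i},$ i.e. the law of $$\sum_i \rv{\gamma}_i v_i$$ with independent $$\rv{\gamma}_i\sim\operatorname{Gamma}(c_i,\lambda(v_i))$$. A throwaway sampler sketch (atoms, masses, and rates all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Atoms of alpha on the unit sphere (rows are unit vectors), with masses c
# = alpha({v_i}) and rate-function values lam = lambda(v_i) at each atom.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [np.sqrt(0.5), np.sqrt(0.5)]])
c = np.array([1.5, 2.0, 0.5])
lam = np.array([1.0, 2.0, 1.0])

def sample_gamma_d(n):
    """Draw n samples of sum_i gamma_i * v_i, gamma_i ~ Gamma(c_i, lam_i)."""
    g = rng.gamma(c, 1.0 / lam, size=(n, len(c)))  # shape (n, num atoms)
    return g @ V                                   # shape (n, d)

x = sample_gamma_d(100_000)
print(x.mean(axis=0))  # should approximate sum_i (c_i / lam_i) * v_i
```

General (non-atomic) $$\alpha$$ would need a series or shot-noise representation of the Lévy measure instead.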

Intriguingly, the choice of norm $$\|\cdot\|$$ doesn’t matter much; changing the norm merely perturbs $$\lambda$$ in a simple fashion.

It is not clear to me what families of $$\lambda$$s and $$\alpha$$s are interesting from that description.

We do know which ones are admissible, at least:

Let $$\alpha$$ be a finite measure on $$\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}$$ and $$\beta: \mathbf{S}_{\mathbb{R}^{d},\|\cdot\|} \rightarrow \mathbb{R}_{+}$$ a measurable function. Then [the previous construction] defines a Lévy measure $$\nu_{\mu}$$ and thus there exists a $$\Gamma_{d}(\alpha, \beta)$$ probability distribution $$\mu$$ if and only if $\int_{\mathbf{S}_{\mathbb{R}^{d},\|\cdot\|}} \ln \left(1+\frac{1}{\beta(v)}\right) \alpha(\mathrm{d} v)<\infty .$ Moreover, $$\int_{\mathbb{R}^{d}}(\|x\| \wedge 1) \nu_{\mu}(\mathrm{d} x)<\infty$$ holds true. The condition is trivially satisfied if $$\beta$$ is bounded away from zero $$\alpha$$-almost everywhere.

Something is weird about that integral though.

## Vector Gamma process

How can we turn a multivariate gamma distribution into a vector-valued gamma process?

An associated Lévy process is easy. Are there any Ornstein-Uhlenbeck-type processes?

TBD

## Wishart processes

Wishart distributions are commonly claimed to generalise Gamma distributions, although AFAICT they are not so similar. “Wishart processes” are indeed a thing, although the Wishart distribution is not, AFAICT, the same as a matrix Gamma distribution, nor a special case of it.

The common definition is not necessarily monotonic (element-wise, or in any other sense); I presume we at least have stationary marginals? It generalises the squared Bessel process, which is marginally $$\chi^2$$-distributed.
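For intuition, one classical construction (Bru’s, if I recall correctly; treat the details here as a sketch) takes an $$n\times d$$ matrix Brownian motion $$X_t$$ and sets $$W_t = X_t^{\top} X_t$$, which is marginally Wishart and reduces to a squared Bessel process when $$d=1$$. A crude Euler-grid simulation:

```python
import numpy as np

rng = np.random.default_rng(7)
n_rows, d, steps, dt = 5, 2, 1000, 1e-3

# Matrix Brownian motion X_t (n_rows x d); on the grid the increments are
# exact, so W_t = X_t^T X_t is marginally Wishart(n_rows, t * I_d).
X = np.zeros((n_rows, d))
W_path = []
for _ in range(steps):
    X += rng.normal(scale=np.sqrt(dt), size=(n_rows, d))
    W_path.append(X.T @ X)

W_T = W_path[-1]
print(np.linalg.eigvalsh(W_T))  # symmetric PSD by construction
```

Note this path is certainly not monotonic in any element-wise sense, consistent with the remark above.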

## Inverse Wishart

Does the Inverse Wishart Process relate? TODO

## HDP Matrix Gamma Process

Matrix-valued Lévy-Gamma process analogue. See Meier, Kirch, and Meyer (2020), which uses the multivariate construction of Pérez-Abreu and Stelzer (2014). That construction is extremely general and somewhat abstract, and is usually tractable only through its Lévy measure.

### AΓ Process

Meier, Kirch, and Meyer (2020) mention a construction, less general than the HDP Matrix Gamma, which is nonetheless useful:

A special case of the $$\operatorname{Gamma}_{d \times d}(\alpha, \lambda)$$ distribution is the so-called $$A \Gamma$$ distribution, that has been considered in Pérez-Abreu and Stelzer (2014) and generalized to the Hermitian positive definite (Hpd) setting. To elaborate, the $$A \Gamma(\eta, \omega, \Sigma)$$ distribution is defined with the parameters $$\eta>d-1, \omega>0$$ and $$\Sigma \in$$ $$\mathcal{S}_{d}^{+}$$as the $$\operatorname{Gamma}_{d \times d}\left(\alpha_{\eta, \Sigma}, \lambda_{\Sigma}\right)$$ distribution, with $\alpha_{\eta, \boldsymbol{\Sigma}}(d \boldsymbol{U})=|\boldsymbol{\Sigma}|^{-\eta} \operatorname{tr}\left(\boldsymbol{\Sigma}^{-1} \boldsymbol{U}\right)^{-d \eta} \Gamma(d \eta) \tilde{\Gamma}_{d}(\eta)^{-1}|\boldsymbol{U}|^{\eta-d} d \boldsymbol{U},$ where $$\Gamma$$ denotes the Gamma function and $$\tilde{\Gamma}_{d}$$ the complex multivariate Gamma function, and $$\lambda_{\boldsymbol{\Sigma}}(\boldsymbol{U})=\operatorname{tr}\left(\boldsymbol{\Sigma}^{-1} \boldsymbol{U}\right)$$. It has the advantage that for $$\boldsymbol{X} \sim A \Gamma(\eta, \omega, \Sigma)$$, the formulas for mean and covariance structure are explicitly known: $\mathrm{E} \boldsymbol{X}=\frac{\omega}{d} \boldsymbol{\Sigma}, \quad \operatorname{Cov} \boldsymbol{X}=\frac{\omega}{d(\eta d+1)}\left(\eta \boldsymbol{I}_{d^{2}}+\boldsymbol{H}\right)(\boldsymbol{\Sigma} \otimes \boldsymbol{\Sigma}),$ where $$\boldsymbol{H}=\sum_{i, j=1}^{d} \boldsymbol{H}_{i, j} \otimes \boldsymbol{H}_{j, i}$$ and $$\boldsymbol{H}_{i, j}$$ being the matrix having a one at $$(i, j)$$ and zeros elsewhere, see Meier (2018, Lemma 2.8). Thus the $$A\Gamma$$ distribution is particularly well suited for Bayesian prior modeling if the prior knowledge is given in terms of mean and covariance structure.
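Those moment formulas are easy to evaluate numerically; the matrix $$\boldsymbol{H}$$ above is, I believe, just the commutation matrix $$K_{d,d}$$. A numpy sketch (parameter values illustrative):

```python
import numpy as np

def commutation_matrix(d):
    """H = sum_{i,j} H_ij kron H_ji, with H_ij the (i, j) unit matrix."""
    H = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            E_ij = np.zeros((d, d)); E_ij[i, j] = 1.0
            E_ji = np.zeros((d, d)); E_ji[j, i] = 1.0
            H += np.kron(E_ij, E_ji)
    return H

def agamma_moments(eta, omega, Sigma):
    """Mean and vec-covariance of X ~ AGamma(eta, omega, Sigma) per the formulas above."""
    d = Sigma.shape[0]
    mean = (omega / d) * Sigma
    cov = (omega / (d * (eta * d + 1))) * (
        eta * np.eye(d * d) + commutation_matrix(d)
    ) @ np.kron(Sigma, Sigma)
    return mean, cov

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
m, C = agamma_moments(eta=3.0, omega=4.0, Sigma=Sigma)
print(m)
```

The resulting covariance (of $$\operatorname{vec}\boldsymbol{X}$$) comes out symmetric positive semi-definite, as it should, since the commutation matrix commutes with $$\boldsymbol{\Sigma}\otimes\boldsymbol{\Sigma}$$.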

## References

Barndorff-Nielsen, Ole E., Makoto Maejima, and Ken-Iti Sato. 2006. Bernoulli 12 (1): 1–33.
Barndorff-Nielsen, Ole E., Jan Pedersen, and Ken-Iti Sato. 2001. Advances in Applied Probability 33 (1): 160–87.
Bladt, Mogens, and Bo Friis Nielsen. 2010. Stochastic Models 26 (1): 1–26.
Buchmann, Boris, Benjamin Kaehler, Ross Maller, and Alexander Szimayer. 2015. arXiv:1502.03901 [Math, q-Fin], February.
Das, Sourish, and Dipak K. Dey. 2010. Statistics & Probability Letters 80 (19-20): 1492–99.
Foti, Nicholas, Joseph Futoma, Daniel Rockmore, and Sinead Williamson. 2013. In Artificial Intelligence and Statistics, 20–28.
Griffin, J.E. 2011. Journal of Statistical Planning and Inference 141 (11): 3648–64.
Grigelionis, Bronius. 2013. Student’s t-Distribution and Related Stochastic Processes. SpringerBriefs in Statistics. Berlin, Heidelberg: Springer Berlin Heidelberg.
Grunwald, G K, R J Hyndman, and L M Tedesco. n.d. “A Unified View of Linear AR(1) Models,” 26.
Kirch, Claudia, Matthew C. Edwards, Alexander Meier, and Renate Meyer. 2019. Bayesian Analysis 14 (4): 1037–73.
Laverny, Oskar, Esterina Masiello, Véronique Maume-Deschamps, and Didier Rullière. 2021. arXiv:2103.03200 [Math, Stat], July.
Lawrence, Neil D., and Raquel Urtasun. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning, 601–8. ICML ’09. New York, NY, USA: ACM.
Liou, Jun-Jih, Yuan-Fong Su, Jie-Lun Chiang, and Ke-Sheng Cheng. 2011. Stochastic Environmental Research and Risk Assessment 25 (2): 235–51.
Mathai, A. M., and P. G. Moschopoulos. 1991. Journal of Multivariate Analysis 39 (1): 135–53.
Mathai, A. M., and Serge B. Provost. 2005. Linear Algebra and Its Applications, Tenth Special Issue (Part 2) on Linear Algebra and Statistics, 410 (November): 198–216.
Mathal, A. M., and P. G. Moschopoulos. 1992. Annals of the Institute of Statistical Mathematics 44 (1): 97–106.
Meier, Alexander. 2018.
Meier, Alexander, Claudia Kirch, Matthew C. Edwards, and Renate Meyer. 2019. beyondWhittle: Bayesian Spectral Inference for Stationary Time Series (version 1.1.1).
Meier, Alexander, Claudia Kirch, and Renate Meyer. 2020. Journal of Multivariate Analysis 175 (January): 104560.
Pérez-Abreu, Victor, and Robert Stelzer. 2014. Journal of Multivariate Analysis 130 (September): 155–75.
Pfaffel, Oliver. 2012. arXiv:1201.3256 [Math], January.
Ranganath, Rajesh, and David M. Blei. 2018. Journal of the American Statistical Association 113 (521): 417–30.
Rao, Vinayak, and Yee Whye Teh. 2009. “Spatial Normalized Gamma Processes.” In Proceedings of the 22nd International Conference on Neural Information Processing Systems, 1554–62. NIPS’09. Red Hook, NY, USA: Curran Associates Inc.
Sato, Ken-iti. 1999. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press.
Semeraro, Patrizia. 2008. International Journal of Theoretical and Applied Finance 11 (01): 1–18.
Shah, Amar, Andrew Wilson, and Zoubin Ghahramani. 2014. In Artificial Intelligence and Statistics, 877–85. PMLR.
Sim, C.H. 1993. Journal of Statistical Computation and Simulation 47 (1-2): 1–10.
Singpurwalla, Nozer D., and Mark A. Youngren. 1993. Scandinavian Journal of Statistics 20 (3): 251–61.
Thibaux, Romain, and Michael I. Jordan. 2007. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 564–71. PMLR.
Tracey, Brendan D., and David H. Wolpert. 2018. 2018 AIAA Non-Deterministic Approaches Conference, January.
Walker, Stephen G. 2021. Communications in Statistics - Theory and Methods 0 (0): 1–7.
Warren, D. 1992. Stochastic Hydrology and Hydraulics 6 (3): 183–90.
Warren, David. 1986. Journal of Hydrology 85 (1): 127–37.
Wilson, Andrew Gordon, and Zoubin Ghahramani. 2011. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 736–44. UAI’11. Arlington, Virginia, United States: AUAI Press.
Wolpert, R., and Katja Ickstadt. 1998. Biometrika 85 (2): 251–67.
Wolpert, Robert L. 2021. arXiv:2106.00087 [Math], May.
Xuan, Junyu, Jie Lu, Guangquan Zhang, Richard Yi Da Xu, and Xiangfeng Luo. 2015. arXiv:1503.08542 [Cs, Stat], March.
