Estimating the thing that oracles hand you in statistics homework assignments: the covariance matrix. Or, if the data is indexed by some parameter, we might consider the covariance kernel. We are especially interested in this in Gaussian processes, where the covariance structure characterises the process up to its mean.

I am not introducing a complete theory of covariance estimation here, merely some notes.

Two big-data problems can arise here: large \(p\) (ambient dimension) and large \(n\) (sample size). Large \(p\) is a problem because the covariance matrix is a \(p \times p\) matrix, and we frequently need to invert it to calculate some target estimand.
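To make the inversion pain concrete, here is a minimal numpy sketch (dimensions invented for illustration): even when we can afford the \(O(p^3)\) cost, solving against a Cholesky factor is preferable to forming the explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 500, 2000
X = rng.standard_normal((n, p))
S = X.T @ X / n                    # sample covariance, p x p

b = rng.standard_normal(p)

# Naive: form the explicit inverse -- O(p^3) work and numerically fragile.
x_inv = np.linalg.inv(S) @ b

# Better: factor once, then solve against the two factor systems.
L = np.linalg.cholesky(S)
x_chol = np.linalg.solve(L.T, np.linalg.solve(L, b))
```

Both routes cost \(O(p^3)\) here, but the factorisation can be reused across right-hand sides and is better behaved numerically.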

Life can often be made not too bad for large \(n\) under Gaussian structure because, essentially, the problem has nice, nearly-low-rank structure.
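For instance, when the covariance is (nearly) low-rank-plus-diagonal, the Woodbury identity lets us solve against it without ever forming the \(p \times p\) matrix. A numpy sketch with invented dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 2000, 10
W = rng.standard_normal((p, k))        # low-rank factor
d = 0.5 + rng.random(p)                # diagonal noise variances
# Sigma = diag(d) + W W^T, but we never form it explicitly.

b = rng.standard_normal(p)

# Woodbury:
# Sigma^{-1} b = D^{-1} b - D^{-1} W (I_k + W^T D^{-1} W)^{-1} W^T D^{-1} b
Dinv_b = b / d
Dinv_W = W / d[:, None]
small = np.eye(k) + W.T @ Dinv_W       # only k x k, cheap to solve
x = Dinv_b - Dinv_W @ np.linalg.solve(small, W.T @ Dinv_b)
```

The cost is \(O(pk^2)\) rather than \(O(p^3)\), which is the sense in which low-rank structure rescues us.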

## Bayesian

Inverse Wishart priors. Others?
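A minimal conjugate-update sketch, assuming a known (zero) mean and scipy's `invwishart`; the hyperparameters here are invented for illustration:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)
p, n = 3, 200
Sigma_true = np.array([[1.0, 0.5, 0.0],
                       [0.5, 1.0, 0.3],
                       [0.0, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=n)

# Prior: Sigma ~ IW(nu0, Psi0). With a known (zero) mean the posterior is
# conjugate: IW(nu0 + n, Psi0 + S), where S = sum_i x_i x_i^T.
nu0, Psi0 = p + 2, np.eye(p)
S = X.T @ X
post = invwishart(df=nu0 + n, scale=Psi0 + S)

Sigma_mean = post.mean()   # posterior mean shrinks S toward the prior scale
```

The posterior mean is a shrinkage estimate: a convex-ish blend of the scatter matrix and the prior scale matrix, which is one pragmatic reason to bother being Bayesian here.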

## Precision estimation

The workhorse of learning graphical models under linearity and Gaussianity. See precision estimation for a more complete treatment.
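For a flavour of what that looks like in practice, here is a sketch using scikit-learn's `GraphicalLasso`, one standard implementation of \(\ell_1\)-penalised precision estimation (the penalty level is invented; in practice it would be cross-validated):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
p, n = 5, 500
# A chain graph: the true precision matrix is tridiagonal, hence sparse.
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

gl = GraphicalLasso(alpha=0.05).fit(X)
# Zeros in the estimated precision encode conditional independences,
# i.e. missing edges in the fitted Gaussian graphical model.
est_Theta = gl.precision_
```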

## Continuous

See kernel learning.

## Parametric

### Cholesky methods

### On a lattice

Estimating a stationary covariance function on a regular lattice is a whole field of its own.
Useful keywords include *circulant embedding*.
Although that machinery is strictly more general than Gaussian processes on a lattice, it is often used in that context, and some extra results live on that page for now.
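As a taster, here is a sketch of circulant embedding for exact sampling of a stationary process on a regular 1-D grid. FFT scaling conventions vary between references, so treat the constants as needing checking; the exponential covariance used here is convex and decreasing, which guarantees the minimal embedding is nonnegative definite.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 64
h = np.arange(m)
cov = np.exp(-h / 5.0)                 # exponential covariance at each lag

# Minimal circulant embedding: wrap the covariance row symmetrically.
circ = np.concatenate([cov, cov[-2:0:-1]])
N = len(circ)                          # N = 2m - 2
lam = np.fft.fft(circ).real            # eigenvalues of the circulant
assert lam.min() > -1e-10              # embedding is nonneg. definite here
lam = np.clip(lam, 0.0, None)

# Complex Gaussian noise; the real part of the FFT then has covariance
# exactly circ, at O(N log N) cost instead of an O(N^3) Cholesky.
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = np.fft.fft(np.sqrt(lam / N) * z).real[:m]
```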

## Unordered

Thanks to Rothman (2010), I now think of covariance estimators as differing between ordered data, where methods such as banding and tapering can exploit the coordinate ordering, and exchangeable data, where they cannot.
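A sketch of the contrast (assuming scikit-learn for the shrinkage side; the bandwidth is invented): with exchangeable coordinates we shrink toward a structured target, while with ordered coordinates we can band the sample covariance, zeroing long-range lags.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(6)
p, n = 50, 40                      # p > n: the sample covariance is singular
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

# Exchangeable coordinates: Ledoit-Wolf shrinkage toward a scaled identity.
lw = LedoitWolf().fit(X)

# Ordered coordinates: band the sample covariance at bandwidth k.
S = np.cov(X, rowvar=False)
k = 5
lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
banded = S * (lags <= k)
```

Banding is only sensible when coordinate \(i\) being "far" from coordinate \(j\) means something, which is exactly the ordered-versus-exchangeable distinction.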

## Sandwich estimators

For robust covariances of vector data. AKA heteroskedasticity-consistent covariance estimators. This family includes the Eicker-Huber-White sandwich estimator, the Andrews kernel HAC estimator, Newey-West, and others. For an intro see Achim Zeileis, *Open-Source Econometric Computing in R*.
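A minimal numpy sketch of the plain Eicker-Huber-White (HC0) estimator, without the small-sample corrections of HC1-HC3:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)   # heteroskedastic noise

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Sandwich: bread * meat * bread, with squared residuals in the meat.
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
V_hc0 = bread @ meat @ bread           # robust covariance of beta-hat
se_robust = np.sqrt(np.diag(V_hc0))
```

The "meat" replaces the homoskedastic \(\sigma^2 X^\top X\) with a residual-weighted version, which is what buys consistency under heteroskedasticity.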

## Incoming

- Basic inference using an Inverse Wishart prior, with a basic "process model" that increases the uncertainty of the covariance estimate.
- general moment combination tricks
- John Cook's comparison of standard deviation estimation tricks
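One of the tricks compared in Cook's posts is Welford's online update, which accumulates mean and variance in a single, numerically stable pass. A sketch:

```python
def welford(xs):
    """One-pass, numerically stable mean and (unbiased) variance."""
    mean, M2 = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        M2 += delta * (x - mean)   # uses the *updated* mean
    return mean, M2 / (k - 1)

mean, var = welford([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The naive \(\sum x_i^2 - n\bar{x}^2\) formula can catastrophically cancel when the mean is large relative to the spread; Welford's update avoids ever subtracting two large, nearly equal sums.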

## Bounding by harmonic and arithmetic means

There are some known bounds for the univariate case. Wikipedia, under "Relations with the harmonic and arithmetic means", notes that it has been shown (Mercer 2000) that for a sample \(\left\{y_i\right\}\) of positive real numbers, \[ \sigma_y^2 \leq 2 y_{\max }(A-H) \] where \(y_{\max }\) is the maximum of the sample, \(A\) is the arithmetic mean, \(H\) is the harmonic mean of the sample and \(\sigma_y^2\) is the (biased) variance of the sample. This bound has since been improved: the variance is bounded by \[ \begin{gathered} \sigma_y^2 \leq \frac{y_{\max }(A-H)\left(y_{\max }-A\right)}{y_{\max }-H}, \\ \sigma_y^2 \geq \frac{y_{\min }(A-H)\left(A-y_{\min }\right)}{H-y_{\min }}, \end{gathered} \] where \(y_{\min }\) is the minimum of the sample (Sharma 2008).
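These bounds are easy to sanity-check numerically; a quick numpy sketch with a random positive sample:

```python
import numpy as np

rng = np.random.default_rng(8)
y = rng.uniform(0.5, 10.0, size=200)    # a positive sample

A = y.mean()                            # arithmetic mean
H = 1.0 / np.mean(1.0 / y)              # harmonic mean
var = y.var()                           # biased variance, as in the bounds
ymax, ymin = y.max(), y.min()

mercer = 2 * ymax * (A - H)
sharma_hi = ymax * (A - H) * (ymax - A) / (ymax - H)
sharma_lo = ymin * (A - H) * (A - ymin) / (H - ymin)

# Sharma's bounds sandwich the variance and tighten Mercer's upper bound.
assert sharma_lo <= var <= sharma_hi <= mercer
```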

Mond and Pečarić (1996) say:

Let us define the arithmetic mean of \(A\) with weight \(w\) as \[ A_n(A ; w)=\sum_{i=1}^n w_i A_i \] and the harmonic mean of \(A\) with weight \(w\) as \[ H_n(A ; w)=\left(\sum_{i=1}^n w_i A_i^{-1}\right)^{-1} \] It is well known \([2,5]\) that \[ H_n(A ; w) \leqslant A_n(A ; w) \] Moreover, if \(A_{i j}(i, j=1, \ldots, n)\) are positive definite matrices from \(H_m\), then the following inequality is also valid [1]: \[ \frac{1}{n} \sum_{j=1}^n\left(\frac{1}{n} \sum_{i=1}^n A_{i j}^{-1}\right)^{-1} \leqslant\left[\frac{1}{n} \sum_{i=1}^n\left(\frac{1}{n} \sum_{j=1}^n A_{i j}\right)^{-1}\right]^{-1} \]

For multivariate covariance we are interested in the PSD matrix version of this.
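The basic matrix AM-HM inequality above is also easy to check numerically in the Loewner (PSD) order; a numpy sketch with random positive definite matrices:

```python
import numpy as np

def psd_leq(X, Y, tol=1e-10):
    """Loewner order: X <= Y iff Y - X is positive semidefinite."""
    return np.linalg.eigvalsh(Y - X).min() >= -tol

rng = np.random.default_rng(9)
p, n = 4, 6
As = []
for _ in range(n):
    B = rng.standard_normal((p, p))
    As.append(B @ B.T + np.eye(p))     # positive definite by construction

w = np.full(n, 1.0 / n)                # uniform weights
A_mean = sum(wi * Ai for wi, Ai in zip(w, As))
H_mean = np.linalg.inv(sum(wi * np.linalg.inv(Ai) for wi, Ai in zip(w, As)))

# Matrix AM-HM inequality: H_n(A; w) <= A_n(A; w) in the Loewner order.
assert psd_leq(H_mean, A_mean)
```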

## References

*Physica D: Nonlinear Phenomena*, Data Assimilation, 230 (1): 99–111.

*arXiv:1506.00898 [Cs, Math, Stat]*, June.

*The Annals of Probability* 33 (5): 1643–97.

*Journal of Machine Learning Research* 9 (Mar): 485–516.

*Statistica Sinica* 10 (4): 1281–311.

*Communications on Pure and Applied Mathematics* 58 (10): 1316–57.

*The Annals of Statistics* 36 (1): 199–227.

*Statistical Inference for Stochastic Processes* 5 (3): 287–306.

*The Annals of Statistics* 38 (4): 2118–44.

*The American Statistician* 37 (3): 242–47.

*Proceedings of the 34th International Conference on Neural Information Processing Systems*, 2722–33. NIPS '20. Red Hook, NY, USA: Curran Associates Inc.

*The Annals of Statistics* 41 (6).

*Annual Review of Statistics and Its Application* 5 (1): 533–59.

*Proceedings of the 25th International Conference on Machine Learning*, 192–99. ICML '08. New York, NY, USA: ACM Press.

*Environmetrics* 12 (2): 161–78.

*Journal of Multivariate Analysis* 100 (10): 2352–63.

*Learning Theory*, edited by Nader H. Bshouty and Claudio Gentile, 4539:278–92. Berlin, Heidelberg: Springer Berlin Heidelberg.

*The Econometrics Journal* 19 (1): C1–32.

*Biostatistics* 9 (3): 432–41.

*Journal of Statistical Planning and Inference* 136 (2): 447–66.

*Journal of Multivariate Analysis* 98 (2): 227–55.

*Journal of Computational and Graphical Statistics* 15 (3): 502–23.

*Journal of the American Statistical Association* 105 (491): 1167–77.

*Journal of the American Statistical Association* 55 (292): 708–13.

*Hierarchical Matrices: Algorithms and Analysis*. 1st ed. Springer Series in Computational Mathematics 49. Heidelberg New York Dordrecht London: Springer Publishing Company, Incorporated.

*Journal of Econometrics* 140 (2): 670–94.

*arXiv:1410.6764 [Math]*, October.

*Biometrika* 93 (1): 85–98.

*Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability*, 1:361–79. University of California Press.

*arXiv:1507.02061 [Math, Stat]*, July.

*Journal of the American Statistical Association* 96 (456): 1387–96.

*Computing* 84 (1-2): 49–67.

*arXiv:1206.6361 [Cs, Stat]*, June.

*WIREs Computational Statistics* 9 (6): e1415.

*Annals of Statistics* 37 (6B): 4254–78.

*Journal of Multivariate Analysis* 88 (2): 365–411.

*Journal of the American Statistical Association* 69 (348): 859–66.

*Journal of Multivariate Analysis* 36 (2): 163–74.

*Biometrika* 71 (1): 135–46.

*Journal of Multivariate Analysis* 175 (January): 104560.

*The Annals of Statistics* 34 (3): 1436–62.

*Journal of Mathematical Analysis and Applications* 243 (1): 163–73.

*Geoderma*, Pedometrics 2003, 128 (3–4): 192–207.

*Linear Algebra and Its Applications*, Linear Algebra and Statistics: In Celebration of C. R. Rao's 75th Birthday (September 10, 1995), 237-238 (April): 449–54.

*Sandia Report SAND2008-6212, Sandia National Laboratories*.

*Statistical Science* 26 (3): 369–87.

*Electronic Journal of Statistics* 12 (1): 890–940.

*arXiv:1406.1922 [Stat]*, June.

*Electronic Journal of Statistics* 5: 935–80.

*The Annals of Probability* 12 (4): 1167–80.

*Journal of the American Statistical Association* 87 (417): 108–19.

*Statistical Applications in Genetics and Molecular Biology* 4: Article 32.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 65 (3): 743–58.

*The Annals of Statistics* 35 (4): 1773–1801.

*Journal of Mathematical Inequalities*, no. 1: 109–14.

*The Annals of Statistics* 32 (2): 656–92.

*Journal of the American Statistical Association* 100 (469): 310–21.

*Journal of Computational and Graphical Statistics* 25 (1): 187–208.

*Tsukuba Journal of Mathematics* 8 (2): 367–76.

*Journal of the American Statistical Association* 103 (481): 340–49.

*Biometrika* 39 (3-4): 309–18.

*Journal of the Royal Statistical Society: Series B (Methodological)* 15 (1): 125–39.

*Arkiv För Matematik* 2 (5): 423–34.

*Scandinavian Actuarial Journal* 1952 (1-2): 48–60.

*Introduction to Variance Estimation*. 2nd ed. Statistics for Social and Behavioral Sciences. New York: Springer.

*Biometrika* 90 (4): 831–44.

*Biometrika* 94 (1): 19–35.

*Journal of Statistical Software* 11 (10).

*Computational Statistics & Data Analysis* 50 (11): 2987–3008.

*Journal of Statistical Software* 16 (1): 1–16.

*Biometrika* 101 (1): 103–20.