# Covariance estimation

Especially Gaussian

November 17, 2014 β April 26, 2023

Estimating the thing that is given to you by oracles in statistics homework assignments: the covariance matrix. Or, if the data is indexed by some parameter, we might consider the covariance kernel. We are especially interested in this in Gaussian processes, where the covariance structure characterises the process up to its mean.

I am not introducing a complete theory of covariance estimation here, merely some notes.

Two big data problems can arise here: large $$p$$ (ambient dimension) and large $$n$$ (sample size). Large $$p$$ is a problem because the covariance matrix is a $$p \times p$$ matrix, and frequently we need to invert it to calculate some target estimand.
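One standard response to the large-$$p$$ problem is to shrink the sample covariance toward a structured target, which restores invertibility even when $$n < p$$; Ledoit and Wolf (2004) give a data-driven choice of the shrinkage weight. A minimal numpy sketch with an arbitrary fixed weight (the value 0.1 here is a hypothetical choice, not the optimal one):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100                            # fewer samples than dimensions: sample covariance is singular
X = rng.standard_normal((n, p))

S = np.cov(X, rowvar=False)               # p x p sample covariance, rank < p
alpha = 0.1                               # shrinkage weight (hypothetical fixed choice)
target = np.trace(S) / p * np.eye(p)      # scaled-identity shrinkage target
S_shrunk = (1 - alpha) * S + alpha * target

# The shrunk estimate is positive definite and safely invertible,
# even though S itself is rank-deficient.
print(np.linalg.eigvalsh(S_shrunk).min() > 0)   # True
```

Here the target is a scaled identity; correlation-matrix or factor-structured targets are also common.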

With Gaussian structure, large $$n$$ can often be made tractable because, essentially, the problem frequently has nice nearly-low-rank structure that can be exploited.

## 1 Bayesian

Inverse Wishart priors. π Other?

## 2 Precision estimation

The workhorse of learning graphical models under linearity and Gaussianity. See precision estimation for a more complete treatment.

## 3 Continuous

See kernel learning.

## 4 Parametric

### 4.2 On a lattice

Estimating a stationary covariance function on a regular lattice? That is a whole field of its own. Useful keywords include circulant embedding. Although this topic is strictly more general than Gaussian processes on a lattice, it most often arises in that context, and some extra results live on that page for now.
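As a small taste of why lattices are nice: on a regular grid the empirical autocovariance of a stationary process can be computed in $$O(n \log n)$$ via the FFT (Wiener-Khinchin), rather than $$O(n^2)$$ by brute force. A numpy sketch for a 1-d lattice; note this computes the *circular* autocovariance, so zero-pad if wraparound matters:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4096
x = rng.standard_normal(n)
# smooth to induce correlation across the lattice (moving average of width 5)
x = np.convolve(x, np.ones(5) / 5, mode="same")

# Wiener-Khinchin: the circular autocovariance is the inverse FFT
# of the periodogram.
xc = x - x.mean()
acov = np.fft.ifft(np.abs(np.fft.fft(xc)) ** 2).real / n

# the lag-0 term recovers the (biased) sample variance
print(np.isclose(acov[0], xc.var()))   # True
```

The same trick extends to 2-d lattices with `np.fft.fft2`, which is what makes circulant-embedding methods fast.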

## 5 Unordered

Thanks to Rothman (2010) I now think about covariance estimates as being different in ordered versus exchangeable data.

## 6 Sandwich estimators

For robust covariances of vector data, a.k.a. heteroskedasticity-consistent covariance estimators. This family includes the Eicker-Huber-White sandwich estimator, the Andrews kernel HAC estimator, Newey-West, and others. For an intro see Achim Zeileis, Open-Source Econometric Computing in R.
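As a reminder of the mechanics, the plain HC0 sandwich for OLS is $$(X^\top X)^{-1} X^\top \operatorname{diag}(e_i^2)\, X (X^\top X)^{-1}$$. A numpy sketch on synthetic heteroskedastic data (statsmodels and R's sandwich package implement this, and more refined variants, for you):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])                     # intercept + regressor
y = 1.0 + 2.0 * x + rng.standard_normal(n) * (0.2 + x)   # noise sd grows with x

beta, *_ = np.linalg.lstsq(X, y, rcond=None)             # OLS fit
resid = y - X @ beta

bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)                   # X' diag(e_i^2) X
V_hc0 = bread @ meat @ bread                             # HC0 sandwich covariance of beta

se_hc0 = np.sqrt(np.diag(V_hc0))
print(se_hc0)
```

The "bread" is the usual OLS curvature; only the "meat" changes across the HC/HAC variants.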

## 8 Bounding by harmonic and arithmetic means

There are some known bounds for the univariate case. Wikipedia, under *Relations with the harmonic and arithmetic means*, notes that for a sample $$\left\{y_i\right\}$$ of positive real numbers, $\sigma_y^2 \leq 2 y_{\max }(A-H),$ where $$y_{\max }$$ is the maximum of the sample, $$A$$ is the arithmetic mean, $$H$$ is the harmonic mean of the sample, and $$\sigma_y^2$$ is the (biased) variance of the sample. This bound has been improved; it is known that the variance satisfies $\begin{gathered} \sigma_y^2 \leq \frac{y_{\max }(A-H)\left(y_{\max }-A\right)}{y_{\max }-H}, \\ \sigma_y^2 \geq \frac{y_{\min }(A-H)\left(A-y_{\min }\right)}{H-y_{\min }}, \end{gathered}$ where $$y_{\min }$$ is the minimum of the sample.
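These bounds are easy to sanity-check numerically; a quick numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.uniform(0.5, 3.0, size=200)   # a sample of positive reals

A = y.mean()                          # arithmetic mean
H = 1.0 / np.mean(1.0 / y)            # harmonic mean
var = y.var()                         # biased sample variance (ddof=0)
ymax, ymin = y.max(), y.min()

upper = ymax * (A - H) * (ymax - A) / (ymax - H)
lower = ymin * (A - H) * (A - ymin) / (H - ymin)

# the improved two-sided bound, and the coarser one-sided bound
assert lower <= var <= upper
assert var <= 2 * ymax * (A - H)
print(lower, var, upper)
```

For a two-point sample both improved bounds hold with equality, which is a quick way to see they are tight.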

Mond and PecΜariΔ (1996) says

Let us define the arithmetic mean of $$A$$ with weight $$w$$ as $A_n(A ; w)=\sum_{i=1}^n w_i A_i$ and the harmonic mean of $$A$$ with weight $$w$$ as $H_n(A ; w)=\left(\sum_{i=1}^n w_i A_i^{-1}\right)^{-1}.$ It is well known [2,5] that $H_n(A ; w) \leqslant A_n(A ; w).$ Moreover, if $$A_{i j}$$ $$(i, j=1, \ldots, n)$$ are positive definite matrices from $$H_m$$, then the following inequality is also valid [1]: $\frac{1}{n} \sum_{j=1}^n\left(\frac{1}{n} \sum_{i=1}^n A_{i j}^{-1}\right)^{-1} \leqslant\left[\frac{1}{n} \sum_{i=1}^n\left(\frac{1}{n} \sum_{j=1}^n A_{i j}\right)^{-1}\right]^{-1}$

For multivariate covariance we are interested in the PSD matrix version of this.
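The basic matrix AM-HM inequality $$H_n(A; w) \leqslant A_n(A; w)$$ (the harmonic mean precedes the arithmetic mean in the Loewner order) is easy to check numerically for random positive definite matrices; a numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

def random_spd(p):
    """Random symmetric positive definite p x p matrix."""
    M = rng.standard_normal((p, p))
    return M @ M.T + p * np.eye(p)

p, n = 4, 5
As = [random_spd(p) for _ in range(n)]
w = rng.dirichlet(np.ones(n))          # positive weights summing to 1

A_mean = sum(wi * Ai for wi, Ai in zip(w, As))
H_mean = np.linalg.inv(sum(wi * np.linalg.inv(Ai) for wi, Ai in zip(w, As)))

# AM-HM in the Loewner order: A_mean - H_mean should be PSD,
# i.e. all eigenvalues of the gap are nonnegative (up to float error).
gap_eigs = np.linalg.eigvalsh(A_mean - H_mean)
print(gap_eigs.min() >= -1e-10)   # True
```

Equality holds only when all the $$A_i$$ coincide, so for a random draw the gap is strictly positive definite.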

## 9 References

Abrahamsen. 1997.
Anderson. 2007. Physica D: Nonlinear Phenomena, Data Assimilation.
Azizyan, Krishnamurthy, and Singh. 2015. arXiv:1506.00898 [Cs, Math, Stat].
Banerjee, Ghaoui, and dβAspremont. 2008. Journal of Machine Learning Research.
Barnard, McCulloch, and Meng. 2000. Statistica Sinica.
Bickel, and Levina. 2008. The Annals of Statistics.
Bosq. 2002. Statistical Inference for Stochastic Processes.
Cai, Zhang, and Zhou. 2010. The Annals of Statistics.
Chan, Golub, and Leveque. 1983. The American Statistician.
Chen, Xiaohui, Xu, and Wu. 2013. The Annals of Statistics.
Chen, Hao, Zheng, Al Kontar, et al. 2020. “Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20.
Cook. 2018. Annual Review of Statistics and Its Application.
Cunningham, Shenoy, and Sahani. 2008. In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Damian, Sampson, and Guttorp. 2001. Environmetrics.
Daniels, and Pourahmadi. 2009. Journal of Multivariate Analysis.
Dasgupta, and Hsu. 2007. In Learning Theory.
Efron. 2010. Journal of the American Statistical Association.
Fan, Liao, and Liu. 2016. The Econometrics Journal.
Friedman, Hastie, and Tibshirani. 2008. Biostatistics.
Fuentes. 2006. Journal of Statistical Planning and Inference.
Furrer, R., and Bengtsson. 2007. Journal of Multivariate Analysis.
Furrer, Reinhard, Genton, and Nychka. 2006. Journal of Computational and Graphical Statistics.
Gneiting, Kleiber, and Schlather. 2010. Journal of the American Statistical Association.
Goodman. 1960. Journal of the American Statistical Association.
Hackbusch. 2015. Hierarchical Matrices: Algorithms and Analysis. Springer Series in Computational Mathematics 49.
Hansen. 2007. Journal of Econometrics.
Heinrich, and Podolskij. 2014. arXiv:1410.6764 [Math].
Huang, Liu, Pourahmadi, et al. 2006. Biometrika.
James, and Stein. 1961. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability.
Janková, and van de Geer. 2015. arXiv:1507.02061 [Math, Stat].
Kauermann, and Carroll. 2001. Journal of the American Statistical Association.
Khoromskij, Litvinenko, and Matthies. 2009. Computing.
Khoshgnauz. 2012. arXiv:1206.6361 [Cs, Stat].
Kuismin, and Sillanpää. 2017. WIREs Computational Statistics.
Lam, and Fan. 2009. Annals of Statistics.
Ledoit, and Wolf. 2004. Journal of Multivariate Analysis.
Ling. 1974. Journal of the American Statistical Association.
Loh. 1991. Journal of Multivariate Analysis.
Mardia, and Marshall. 1984. Biometrika.
Meier, Kirch, and Meyer. 2020. Journal of Multivariate Analysis.
Meinshausen, and BΓΌhlmann. 2006. The Annals of Statistics.
Mercer. 2000. Journal of Mathematical Analysis and Applications.
Minasny, and McBratney. 2005. Geoderma, Pedometrics 2003.
Mond, and PecΜariΔ. 1996. Linear Algebra and Its Applications, Linear Algebra and Statistics: In Celebration of C. R. Raoβs 75th Birthday (September 10, 1995),.
PΓ©bay. 2008. Sandia Report SAND2008-6212, Sandia National Laboratories.
Pleiss, Gardner, Weinberger, et al. 2018. In.
Prause, and Steland. 2018. Electronic Journal of Statistics.
Ramdas, and Wehbe. 2014. arXiv:1406.1922 [Stat].
Ravikumar, Wainwright, Raskutti, et al. 2011. Electronic Journal of Statistics.
Rigollet, and Hütter. 2019. High Dimensional Statistics.
Rosenblatt. 1984. The Annals of Probability.
Rothman. 2010. “Sparse Estimation of High-Dimensional Covariance Matrices.”
Sampson, and Guttorp. 1992. Journal of the American Statistical Association.
Schäfer, and Strimmer. 2005. Statistical Applications in Genetics and Molecular Biology.
Schmidt, and OβHagan. 2003. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Shao, and Wu. 2007. The Annals of Statistics.
Sharma. 2008. Journal of Mathematical Inequalities.
Shimotsu, and Phillips. 2004. The Annals of Statistics.
Stein. 2005. Journal of the American Statistical Association.
Sun, and Stein. 2016. Journal of Computational and Graphical Statistics.
Takemura. 1984. Tsukuba Journal of Mathematics.
Warton. 2008. Journal of the American Statistical Association.
Whittle, Peter. 1952. Scandinavian Actuarial Journal.
Whittle, P. 1952. “Tests of Fit in Time Series.” Biometrika.
βββ. 1953a. Journal of the Royal Statistical Society: Series B (Methodological).
βββ. 1953b. Arkiv FΓΆr Matematik.
Wolter. 2007. Introduction to Variance Estimation. Statistics for Social and Behavioral Sciences.