Covariance estimation

Especially Gaussian

November 17, 2014 — April 26, 2023

algebra
functional analysis
Hilbert space
kernel tricks
metrics
nonparametric
regression
sparser than thou
statistics
Figure 1

Estimating the thing that is given to you by oracles in statistics homework assignments: the covariance matrix. Or, if the data is indexed by some parameter, we might consider the covariance kernel. We are especially interested in this for Gaussian processes, where the covariance structure characterizes the process up to its mean.

I am not introducing a complete theory of covariance estimation here, merely some notes.

Two big data problems can arise here: large \(p\) (ambient dimension) and large \(n\) (sample size). Large \(p\) is a problem because the covariance matrix is a \(p \times p\) matrix and frequently we need to invert it to calculate some target estimand.
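When \(p\) is comparable to or larger than \(n\), the sample covariance is singular or badly conditioned, and the inversion fails. A minimal numpy sketch of shrinkage toward a scaled identity, in the spirit of Ledoit and Wolf (2004), but with a hand-picked shrinkage weight \(\alpha\) rather than their data-driven one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 80          # fewer samples than dimensions: S is singular
X = rng.standard_normal((n, p))

# Sample covariance: rank at most n - 1 < p, hence not invertible.
S = np.cov(X, rowvar=False)

# Shrink toward a scaled identity; alpha is fixed by hand here,
# not estimated from the data as Ledoit-Wolf would.
alpha = 0.2
target = (np.trace(S) / p) * np.eye(p)
Sigma_hat = (1 - alpha) * S + alpha * target

print(np.linalg.matrix_rank(S))            # < p
print(np.linalg.cond(Sigma_hat) < 1e6)     # True: safely invertible
```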

With Gaussian structure, large \(n\) is often not too bad because, essentially, the problem frequently has nearly low-rank structure that we can exploit.
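One concrete payoff of near-low-rank structure: a covariance of the form low-rank-plus-diagonal can be inverted with a \(k \times k\) solve instead of a \(p \times p\) one, via the Woodbury identity. A sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(7)
p, k = 300, 10

# Near-low-rank covariance: Sigma = U U' + sigma2 I, with k << p.
U = rng.standard_normal((p, k))
sigma2 = 0.5
Sigma = U @ U.T + sigma2 * np.eye(p)

# Woodbury identity: invert via a k x k system,
# Sigma^{-1} = (1/s) [ I - U (I_k + U'U / s)^{-1} U' / s ].
small = np.eye(k) + U.T @ U / sigma2
Sigma_inv = (np.eye(p) - U @ np.linalg.solve(small, U.T) / sigma2) / sigma2

print(np.allclose(Sigma_inv @ Sigma, np.eye(p)))  # True
```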

1 Bayesian

Inverse Wishart priors. 🏗 Other?
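A minimal sketch of the conjugate update for a zero-mean Gaussian with an inverse-Wishart prior on the covariance; the prior scale and degrees of freedom below are placeholder choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 200
true_cov = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 0.5]])
X = rng.multivariate_normal(np.zeros(p), true_cov, size=n)

# The inverse-Wishart prior IW(Psi, nu) on the covariance of a
# zero-mean Gaussian is conjugate: the posterior is IW(Psi + S, nu + n),
# where S is the scatter matrix of the data.
Psi = np.eye(p)        # prior scale (an arbitrary choice here)
nu = p + 2             # prior degrees of freedom (ditto)
S = X.T @ X

Psi_post = Psi + S
nu_post = nu + n

# Posterior mean of the covariance (defined for nu_post > p + 1).
post_mean = Psi_post / (nu_post - p - 1)
print(np.round(post_mean, 2))
```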

2 Precision estimation

The workhorse of learning graphical models under linearity and Gaussianity. See precision estimation for a more complete treatment.

3 Continuous

See kernel learning.

4 Parametric

4.1 Cholesky methods

(Huang et al. 2006; Wu and Pourahmadi 2003).
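The idea behind these methods, for ordered data: regress each coordinate on its predecessors, which factorizes the (inverse) covariance through a unit lower-triangular matrix and a diagonal of innovation variances. A sketch, assuming zero-mean data so the uncentered second-moment matrix stands in for the covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 2000, 4
A = np.tril(rng.standard_normal((p, p))) + 2 * np.eye(p)
X = rng.standard_normal((n, p)) @ A.T        # zero-mean, correlated, ordered

# Modified Cholesky: regress each coordinate on its predecessors.
# With T unit lower triangular (rows hold the negated regression
# coefficients) and D = diag(d) the innovation variances, T M T' = D,
# i.e. M^{-1} = T' D^{-1} T, where M is the second-moment matrix.
M = X.T @ X / n
T = np.eye(p)
d = np.empty(p)
d[0] = M[0, 0]
for j in range(1, p):
    coef, *_ = np.linalg.lstsq(X[:, :j], X[:, j], rcond=None)
    T[j, :j] = -coef
    resid = X[:, j] - X[:, :j] @ coef
    d[j] = resid @ resid / n

print(np.round(T @ M @ T.T, 3))   # approximately diag(d)
```

Penalizing the regression coefficients in each row of \(T\) is then one route to the sparse estimates of Huang et al. (2006).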

4.2 On a lattice

Estimating a stationary covariance function on a regular lattice is a whole field of its own. Useful keywords include circulant embedding. Although the technique is strictly more general than Gaussian processes on a lattice, it is often used in that context, and some extra results live on that page for now.
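A sketch of the basic trick in one dimension, using an exponential covariance, whose standard embedding is known to be nonnegative definite: embed the Toeplitz covariance of the grid in a circulant matrix, which the FFT diagonalizes, giving exact samples in \(O(m \log m)\).

```python
import numpy as np

rng = np.random.default_rng(3)
m, ell = 256, 10.0
lags = np.arange(m)
cov = np.exp(-lags / ell)                 # exponential (Matern-1/2) covariance

# Embed the m x m Toeplitz covariance in a circulant of size 2m - 2
# by wrapping the first row around; the FFT diagonalizes the circulant.
c = np.concatenate([cov, cov[-2:0:-1]])
M = len(c)                                # 2m - 2
lam = np.fft.fft(c).real                  # circulant eigenvalues

# Dietrich-Newsam sampling: one complex FFT yields two independent
# exact draws from the stationary Gaussian process on the grid.
eps = rng.standard_normal(M) + 1j * rng.standard_normal(M)
f = np.fft.fft(np.sqrt(np.maximum(lam, 0) / M) * eps)
draw_a, draw_b = f.real[:m], f.imag[:m]
```

For covariances whose embedding has negative eigenvalues, the usual remedies are padding the grid or clipping the spectrum, at the cost of exactness.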

5 Unordered

Thanks to Rothman (2010), I now think of covariance estimation as differing between ordered and exchangeable data.

6 Sandwich estimators

Figure 2

For robust covariances of vector data, a.k.a. heteroskedasticity-consistent covariance estimators. The family includes the Eicker-Huber-White sandwich estimator, the Andrews kernel HAC estimator, Newey-West, and others. For an intro see Achim Zeileis, Open-Source Econometric Computing in R.
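A minimal numpy sketch of the plain Eicker-Huber-White (HC0) estimator for a linear regression with heteroskedastic noise, compared against the naive OLS covariance:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])
# Heteroskedastic noise: error variance grows with x.
y = 1.0 + 2.0 * x + rng.standard_normal(n) * x

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)

# Classical OLS covariance assumes constant error variance.
V_ols = XtX_inv * (e @ e) / (n - X.shape[1])

# Eicker-Huber-White (HC0) sandwich: bread * meat * bread,
# with the meat X' diag(e^2) X built from squared residuals.
meat = X.T @ (X * e[:, None] ** 2)
V_hc0 = XtX_inv @ meat @ XtX_inv

print(np.sqrt(np.diag(V_ols)))   # naive standard errors
print(np.sqrt(np.diag(V_hc0)))   # robust standard errors
```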

7 Incoming

8 Bounding by harmonic and arithmetic means

There are some known bounds for the univariate case. Wikipedia, in Relations with the harmonic and arithmetic means, says it has been shown (Mercer 2000) that for a sample \(\left\{y_i\right\}\) of positive real numbers, \[ \sigma_y^2 \leq 2 y_{\max }(A-H) \] where \(y_{\max }\) is the maximum of the sample, \(A\) is the arithmetic mean, \(H\) is the harmonic mean of the sample and \(\sigma_y^2\) is the (biased) variance of the sample. This bound has been improved: the variance is bounded by \[ \begin{gathered} \sigma_y^2 \leq \frac{y_{\max }(A-H)\left(y_{\max }-A\right)}{y_{\max }-H}, \\ \sigma_y^2 \geq \frac{y_{\min }(A-H)\left(A-y_{\min }\right)}{H-y_{\min }}, \end{gathered} \] where \(y_{\min }\) is the minimum of the sample (Sharma 2008).
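These bounds are cheap to check numerically; a sketch on a random positive sample:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.uniform(0.5, 10.0, size=100)   # positive sample

A = y.mean()                           # arithmetic mean
H = 1.0 / (1.0 / y).mean()             # harmonic mean
var_y = y.var()                        # biased sample variance
ymax, ymin = y.max(), y.min()

mercer = 2 * ymax * (A - H)                               # Mercer (2000)
sharma_upper = ymax * (A - H) * (ymax - A) / (ymax - H)   # Sharma (2008)
sharma_lower = ymin * (A - H) * (A - ymin) / (H - ymin)

print(sharma_lower <= var_y <= sharma_upper <= mercer)    # True
```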

Mond and Pec̆arić (1996) say:

Let us define the arithmetic mean of \(A\) with weight \(w\) as \[ A_n(A ; w)=\sum_{i=1}^n w_i A_i \] and the harmonic mean of \(A\) with weight \(w\) as \[ H_n(A ; w)=\left(\sum_{i=1}^n w_i A_i^{-1}\right)^{-1} \] It is well known \([2,5]\) that \[ H_n(A ; w) \leqslant A_n(A ; w) \] Moreover, if \(A_{i j}(i, j=1, \ldots, n)\) are positive definite matrices from \(H_m\), then the following inequality is also valid [1]: \[ \frac{1}{n} \sum_{j=1}^n\left(\frac{1}{n} \sum_{i=1}^n A_{i j}^{-1}\right)^{-1} \leqslant\left[\frac{1}{n} \sum_{i=1}^n\left(\frac{1}{n} \sum_{j=1}^n A_{i j}\right)^{-1}\right]^{-1} \]

For multivariate covariance we are interested in the PSD matrix version of this.
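A numeric sanity check that the weighted harmonic mean of positive definite matrices sits below the weighted arithmetic mean in the Loewner (PSD) order, as the inequality above asserts:

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 4, 5

def rand_spd(rng, p):
    """Random symmetric positive definite p x p matrix."""
    B = rng.standard_normal((p, p))
    return B @ B.T + p * np.eye(p)

mats = [rand_spd(rng, p) for _ in range(n)]
w = np.full(n, 1.0 / n)                 # uniform weights

# Weighted arithmetic and harmonic means of the matrices.
A_mean = sum(wi * Ai for wi, Ai in zip(w, mats))
H_mean = np.linalg.inv(sum(wi * np.linalg.inv(Ai) for wi, Ai in zip(w, mats)))

# H_n(A; w) <= A_n(A; w) in the Loewner order: the gap is PSD.
gap = A_mean - H_mean
print(np.linalg.eigvalsh(gap).min() >= -1e-10)  # True
```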

9 References

Abrahamsen. 1997. A Review of Gaussian Random Fields and Correlation Functions.”
Anderson. 2007. Exploring the Need for Localization in Ensemble Data Assimilation Using a Hierarchical Ensemble Filter.” Physica D: Nonlinear Phenomena, Data Assimilation.
Azizyan, Krishnamurthy, and Singh. 2015. Extreme Compressive Sampling for Covariance Estimation.” arXiv:1506.00898 [Cs, Math, Stat].
Baik, Arous, and Péché. 2005. Phase Transition of the Largest Eigenvalue for Nonnull Complex Sample Covariance Matrices.” The Annals of Probability.
Banerjee, Ghaoui, and d’Aspremont. 2008. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine Learning Research.
Barnard, McCulloch, and Meng. 2000. Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, with Application to Shrinkage.” Statistica Sinica.
Ben Arous, and Péché. 2005. Universality of Local Eigenvalue Statistics for Some Sample Covariance Matrices.” Communications on Pure and Applied Mathematics.
Bickel, and Levina. 2008. Regularized Estimation of Large Covariance Matrices.” The Annals of Statistics.
Bosq. 2002. Estimation of Mean and Covariance Operator of Autoregressive Processes in Banach Spaces.” Statistical Inference for Stochastic Processes.
Cai, Zhang, and Zhou. 2010. Optimal Rates of Convergence for Covariance Matrix Estimation.” The Annals of Statistics.
Chan, Golub, and Leveque. 1983. Algorithms for Computing the Sample Variance: Analysis and Recommendations.” The American Statistician.
Chen, Xiaohui, Xu, and Wu. 2013. Covariance and Precision Matrix Estimation for High-Dimensional Time Series.” The Annals of Statistics.
Chen, Hao, Zheng, Al Kontar, et al. 2020. “Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20.
Cook. 2018. Principal Components, Sufficient Dimension Reduction, and Envelopes.” Annual Review of Statistics and Its Application.
Cunningham, Shenoy, and Sahani. 2008. Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Damian, Sampson, and Guttorp. 2001. Bayesian Estimation of Semi-Parametric Non-Stationary Spatial Covariance Structures.” Environmetrics.
Daniels, and Pourahmadi. 2009. Modeling Covariance Matrices via Partial Autocorrelations.” Journal of Multivariate Analysis.
Dasgupta, and Hsu. 2007. On-Line Estimation with the Multivariate Gaussian Distribution.” In Learning Theory.
Efron. 2010. Correlated z-Values and the Accuracy of Large-Scale Statistical Estimates.” Journal of the American Statistical Association.
Fan, Liao, and Liu. 2016. An Overview of the Estimation of Large Covariance and Precision Matrices.” The Econometrics Journal.
Friedman, Hastie, and Tibshirani. 2008. Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics.
Fuentes. 2006. Testing for Separability of Spatial–Temporal Covariance Functions.” Journal of Statistical Planning and Inference.
Furrer, R., and Bengtsson. 2007. Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants.” Journal of Multivariate Analysis.
Furrer, Reinhard, Genton, and Nychka. 2006. Covariance Tapering for Interpolation of Large Spatial Datasets.” Journal of Computational and Graphical Statistics.
Gneiting, Kleiber, and Schlather. 2010. Matérn Cross-Covariance Functions for Multivariate Random Fields.” Journal of the American Statistical Association.
Goodman. 1960. On the Exact Variance of Products.” Journal of the American Statistical Association.
Hackbusch. 2015. Hierarchical Matrices: Algorithms and Analysis. Springer Series in Computational Mathematics 49.
Hansen. 2007. Generalized Least Squares Inference in Panel and Multilevel Models with Serial Correlation and Fixed Effects.” Journal of Econometrics.
Heinrich, and Podolskij. 2014. On Spectral Distribution of High Dimensional Covariation Matrices.” arXiv:1410.6764 [Math].
Huang, Liu, Pourahmadi, et al. 2006. Covariance Matrix Selection and Estimation via Penalised Normal Likelihood.” Biometrika.
James, and Stein. 1961. Estimation with Quadratic Loss.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability.
Janková, and van de Geer. 2015. Honest Confidence Regions and Optimality in High-Dimensional Precision Matrix Estimation.” arXiv:1507.02061 [Math, Stat].
Kauermann, and Carroll. 2001. A Note on the Efficiency of Sandwich Covariance Matrix Estimation.” Journal of the American Statistical Association.
Khoromskij, Litvinenko, and Matthies. 2009. Application of Hierarchical Matrices for Computing the Karhunen–Loève Expansion.” Computing.
Khoshgnauz. 2012. Learning Markov Network Structure Using Brownian Distance Covariance.” arXiv:1206.6361 [Cs, Stat].
Kuismin, and Sillanpää. 2017. Estimation of Covariance and Precision Matrix, Network Structure, and a View Toward Systems Biology.” WIREs Computational Statistics.
Lam, and Fan. 2009. Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” Annals of Statistics.
Ledoit, and Wolf. 2004. A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices.” Journal of Multivariate Analysis.
Ling. 1974. Comparison of Several Algorithms for Computing Sample Means and Variances.” Journal of the American Statistical Association.
Liu, Zhan, and Niu. 2021. Hilbert–Schmidt Independence Criterion Regularization Kernel Framework on Symmetric Positive Definite Manifolds.” Mathematical Problems in Engineering.
Loh. 1991. Estimating Covariance Matrices II.” Journal of Multivariate Analysis.
Mardia, and Marshall. 1984. Maximum Likelihood Estimation of Models for Residual Covariance in Spatial Regression.” Biometrika.
Meier. 2018. A matrix Gamma process and applications to Bayesian analysis of multivariate time series.”
Meier, Kirch, and Meyer. 2020. Bayesian Nonparametric Analysis of Multivariate Time Series: A Matrix Gamma Process Approach.” Journal of Multivariate Analysis.
Meinshausen, and Bühlmann. 2006. High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics.
Mercer. 2000. Bounds for A–G, A–H, G–H, and a Family of Inequalities of Ky Fan’s Type, Using a General Method.” Journal of Mathematical Analysis and Applications.
Minasny, and McBratney. 2005. The Matérn Function as a General Model for Soil Variograms.” Geoderma, Pedometrics 2003.
Mond, and Pec̆arić. 1996. A Mixed Arithmetic-Mean-Harmonic-Mean Matrix Inequality.” Linear Algebra and Its Applications, Linear Algebra and Statistics: In Celebration of C. R. Rao’s 75th Birthday (September 10, 1995).
Pébay. 2008. Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments.” Sandia Report SAND2008-6212, Sandia National Laboratories.
Pleiss, Gardner, Weinberger, et al. 2018. Constant-Time Predictive Distributions for Gaussian Processes.” In.
Pourahmadi. 2011. Covariance Estimation: The GLM and Regularization Perspectives.” Statistical Science.
Prause, and Steland. 2018. Estimation of the Asymptotic Variance of Univariate and Multivariate Random Fields and Statistical Inference.” Electronic Journal of Statistics.
Ramdas, and Wehbe. 2014. Stein Shrinkage for Cross-Covariance Operators and Kernel Independence Testing.” arXiv:1406.1922 [Stat].
Ravikumar, Wainwright, Raskutti, et al. 2011. High-Dimensional Covariance Estimation by Minimizing ℓ1-Penalized Log-Determinant Divergence.” Electronic Journal of Statistics.
Rigollet, and Hütter. 2019. High Dimensional Statistics.
Rosenblatt. 1984. Asymptotic Normality, Strong Mixing and Spectral Density Estimates.” The Annals of Probability.
Rothman. 2010. “Sparse Estimation of High-Dimensional Covariance Matrices.”
Sampson, and Guttorp. 1992. Nonparametric Estimation of Nonstationary Spatial Covariance Structure.” Journal of the American Statistical Association.
Schäfer, and Strimmer. 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics.” Statistical Applications in Genetics and Molecular Biology.
Schmidt, and O’Hagan. 2003. Bayesian Inference for Non-Stationary Spatial Covariance Structure via Spatial Deformations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Shao, and Wu. 2007. Asymptotic Spectral Theory for Nonlinear Time Series.” The Annals of Statistics.
Sharma. 2008. Some More Inequalities for Arithmetic Mean, Harmonic Mean and Variance.” Journal of Mathematical Inequalities.
Shimotsu, and Phillips. 2004. Local Whittle Estimation in Nonstationary and Unit Root Cases.” The Annals of Statistics.
Stein. 2005. Space-Time Covariance Functions.” Journal of the American Statistical Association.
Sun, and Stein. 2016. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets.” Journal of Computational and Graphical Statistics.
Takemura. 1984. An Orthogonally Invariant Minimax Estimator of the Covariance Matrix of a Multivariate Normal Population.” Tsukuba Journal of Mathematics.
Warton. 2008. Penalized Normal Likelihood and Ridge Regularization of Correlation and Covariance Matrices.” Journal of the American Statistical Association.
Whittle, Peter. 1952. Some Results in Time Series Analysis.” Scandinavian Actuarial Journal.
Whittle, P. 1952. Tests of Fit in Time Series.” Biometrika.
———. 1953a. The Analysis of Multiple Stationary Time Series.” Journal of the Royal Statistical Society: Series B (Methodological).
———. 1953b. Estimation and Information in Stationary Time Series.” Arkiv För Matematik.
Wolter. 2007. Introduction to Variance Estimation. Statistics for Social and Behavioral Sciences.
Wu, and Pourahmadi. 2003. Nonparametric Estimation of Large Covariance Matrices of Longitudinal Data.” Biometrika.
Yuan, and Lin. 2007. Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika.
Zeileis. 2004. Econometric Computing with HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software.
———. 2006a. Implementing a Class of Structural Change Tests: An Econometric Computing Approach.” Computational Statistics & Data Analysis.
———. 2006b. Object-Oriented Computation of Sandwich Estimators.” Journal of Statistical Software.
Zhang, and Zou. 2014. Sparse Precision Matrix Estimation via Lasso Penalized D-Trace Loss.” Biometrika.