Large sample theory

Delta methods, influence functions, and so on. Convolution theorems, local asymptotic minimax theorems.

A convenient feature of M-estimation, and especially maximum likelihood estimation, is the simple behaviour of estimators in the large-sample limit, which can give you, e.g., variance estimates, or motivate information criteria, robust statistics, optimisation, etc.
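As a rough illustration of how the large-sample limit yields a variance estimate, here is a minimal simulation sketch. The exponential model, the function name `simulate_mle_rate`, and all parameter values are assumptions chosen for illustration. It compares the empirical variance of \(\sqrt{n}(\hat{\lambda}-\lambda)\) for the rate MLE with the inverse Fisher information \(\lambda^2\).

```python
import numpy as np

def simulate_mle_rate(true_rate=2.0, n=500, reps=4000, seed=0):
    """Return sqrt(n) * (rate_hat - true_rate) over many replications."""
    rng = np.random.default_rng(seed)
    # Each row is one i.i.d. exponential sample of size n.
    samples = rng.exponential(scale=1.0 / true_rate, size=(reps, n))
    # The MLE of the exponential rate is the reciprocal of the sample mean.
    rate_hat = 1.0 / samples.mean(axis=1)
    return np.sqrt(n) * (rate_hat - true_rate)

true_rate = 2.0
scaled_errors = simulate_mle_rate(true_rate=true_rate)
empirical_var = scaled_errors.var()
# Inverse Fisher information per observation for the exponential rate is rate**2.
asymptotic_var = true_rate ** 2

print(f"empirical variance of scaled errors: {empirical_var:.3f}")
print(f"asymptotic variance (rate**2):       {asymptotic_var:.3f}")
```

The two printed numbers should be close but not identical, since the first carries both Monte Carlo noise and a small finite-sample effect.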

In the most celebrated and convenient cases, the asymptotic bounds are about normally distributed errors, and these are typically derived through Local Asymptotic Normality theorems. A simple and general introduction is given in Andersen et al. (1997), page 594, which applies to both i.i.d. data and dependent data in the form of point processes. For all that it is widely applied, its conditions are still stringent.
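For orientation, the usual textbook form of the local asymptotic normality condition in the i.i.d., finite-dimensional case can be sketched as follows. The notation is assumed here for illustration and is not taken from Andersen et al.: \(P^n_\theta\) is the law of \(n\) observations, \(h\) is a local perturbation, \(\Delta_n\) is the rescaled score, and \(I(\theta_0)\) is the Fisher information.

```latex
\[
\log \frac{dP^n_{\theta_0 + h/\sqrt{n}}}{dP^n_{\theta_0}}
  = h^\top \Delta_n - \tfrac{1}{2}\, h^\top I(\theta_0)\, h + o_{P}(1),
\qquad
\Delta_n \xrightarrow{d} \mathcal{N}\bigl(0, I(\theta_0)\bigr)
\quad \text{under } P^n_{\theta_0}.
\]
```

In words: in shrinking neighbourhoods of \(\theta_0\), the log-likelihood ratio looks like that of a Gaussian location model, which is where the normally distributed errors come from.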

Fisher Information

Used in ML theory and kinda-sorta in robust estimation. A matrix that tells you how much a new datum affects your parameter estimates. (It is related, I am told, to garden variety Shannon information, and when that non-obvious fact is more clear to me I shall expand how precisely this is so.) 🏗
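For concreteness, here is the standard definition, sketched under the usual regularity conditions (in particular, that differentiation and integration can be exchanged). The density notation \(f(x;\theta)\) and the exponential example are assumptions added for illustration.

```latex
\[
I(\theta)
  = \mathbb{E}_\theta\!\left[
      \nabla_\theta \log f(X;\theta)\,
      \nabla_\theta \log f(X;\theta)^\top
    \right]
  = -\,\mathbb{E}_\theta\!\left[ \nabla_\theta^2 \log f(X;\theta) \right].
\]
% Example: X ~ Exponential(rate \lambda), with f(x;\lambda) = \lambda e^{-\lambda x}.
\[
\partial_\lambda \log f(X;\lambda) = \frac{1}{\lambda} - X,
\qquad
I(\lambda) = \operatorname{Var}_\lambda(X) = \frac{1}{\lambda^2}.
\]
```

In regular models the inverse \(I(\theta)^{-1}\) is the asymptotic variance of \(\sqrt{n}(\hat{\theta}-\theta)\) for the maximum likelihood estimator, which is one way the matrix ends up in variance estimates.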

Convolution Theorem

The unhelpfully-named convolution theorem of Hájek (1970).

Suppose \(\hat{\theta}\) is an efficient estimator of \(\theta\) and \(\tilde{\theta}\) is another, not fully efficient, estimator. The convolution theorem says that, if you rule out stupid exceptions, asymptotically \(\tilde{\theta} = \hat{\theta} + \varepsilon\), where \(\varepsilon\) is pure noise, independent of \(\hat{\theta}\).

The reason that’s almost obvious is that if it weren’t true, there would be some information about \(\theta\) in \(\tilde{\theta}-\hat{\theta}\), and you could use this information to get a better estimator than \(\hat{\theta}\), which (by assumption) can’t happen. The stupid exceptions are things like the Hodges superefficient estimator, which do better at a few values of \(\theta\) but much worse at neighbouring values.
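Both halves of this argument can be checked numerically. The following is a minimal sketch under assumed choices, none of which come from the text above: normal data, the sample mean as the efficient estimator, the sample median as the inefficient one, and the \(n^{-1/4}\) threshold version of Hodges' estimator. The function names are hypothetical.

```python
import numpy as np

def mean_and_median(n=400, reps=5000, seed=0):
    """Simulate the sample mean and sample median of N(0, 1) data."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
    return x.mean(axis=1), np.median(x, axis=1)

def hodges_scaled_risk(theta, n=10_000, reps=20_000, seed=1):
    """Return n * MSE of Hodges' estimator when the true mean is theta."""
    rng = np.random.default_rng(seed)
    # For N(theta, 1) data the sample mean is exactly N(theta, 1/n),
    # so we draw it directly instead of simulating raw samples.
    xbar = rng.normal(loc=theta, scale=1.0 / np.sqrt(n), size=reps)
    # Hodges' estimator: report 0 whenever the mean is within n**(-1/4) of 0.
    hodges = np.where(np.abs(xbar) >= n ** -0.25, xbar, 0.0)
    return n * np.mean((hodges - theta) ** 2)

n = 400
efficient, inefficient = mean_and_median(n=n)
noise = inefficient - efficient  # plays the role of epsilon

# Convolution picture: noise uncorrelated with the efficient estimator,
# and the variances add up.
corr = np.corrcoef(efficient, noise)[0, 1]
scaled_var_mean = n * efficient.var()
scaled_var_median = n * inefficient.var()
scaled_var_noise = n * noise.var()
print(f"corr(mean, median - mean): {corr:.3f}")
print(f"n*var(median):             {scaled_var_median:.3f}")
print(f"n*var(mean) + n*var(noise): {scaled_var_mean + scaled_var_noise:.3f}")

# Superefficiency: tiny scaled risk at theta = 0, large risk just next to it.
risk_at_zero = hodges_scaled_risk(theta=0.0)
risk_nearby = hodges_scaled_risk(theta=10_000 ** -0.25)
print(f"Hodges n*MSE at theta=0:        {risk_at_zero:.3f}")
print(f"Hodges n*MSE at theta=n^(-1/4): {risk_nearby:.3f}")
```

For comparison, the sample mean has scaled risk \(n \cdot \mathrm{MSE} = 1\) at every \(\theta\), so the Hodges estimator beats it at zero and pays for that in the neighbourhood. Zero correlation is only a necessary symptom of the independence the theorem asserts, not a proof of it.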

Andersen, Per Kragh, Ørnulf Borgan, Richard D. Gill, and Niels Keiding. 1997. Statistical Models Based on Counting Processes. Corr. 2. print. Springer Series in Statistics. New York, NY: Springer.

Athreya, K. B., and Niels Keiding. 1977. “Estimation Theory for Continuous-Time Branching Processes.” Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 39 (2): 101–23. http://www.jstor.org/stable/25050084.

Barndorff-Nielsen, O. E., and M. Sørensen. 1994. “A Review of Some Aspects of Asymptotic Likelihood Theory for Stochastic Processes.” International Statistical Review / Revue Internationale de Statistique 62 (1): 133–65. https://doi.org/10.2307/1403550.

Becker-Kern, Peter, Mark M. Meerschaert, and Hans-Peter Scheffler. 2004. “Limit Theorems for Coupled Continuous Time Random Walks.” The Annals of Probability 32 (1): 730–56. http://www.stt.msu.edu/~mcubed/CoupleCTRW.pdf.

Bibby, Bo Martin, and Michael Sørensen. 1995. “Martingale Estimation Functions for Discretely Observed Diffusion Processes.” Bernoulli 1 (1/2): 17–39. https://doi.org/10.2307/3318679.

DasGupta, Anirban. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. New York: Springer New York. http://link.springer.com/10.1007/978-0-387-75971-5.

Duembgen, Moritz, and Mark Podolskij. 2015. “High-Frequency Asymptotics for Path-Dependent Functionals of Itô Semimartingales.” Stochastic Processes and Their Applications 125 (4): 1195–1217. https://doi.org/10.1016/j.spa.2014.08.007.

Feigin, Paul David. 1976. “Maximum Likelihood Estimation for Continuous-Time Stochastic Processes.” Advances in Applied Probability 8 (4): 712–36. https://doi.org/10.2307/1425931.

Gribonval, Rémi, Gilles Blanchard, Nicolas Keriven, and Yann Traonmilin. 2017. “Compressive Statistical Learning with Random Feature Moments,” June. http://arxiv.org/abs/1706.07180.

Hájek, Jaroslav. 1970. “A Characterization of Limiting Distributions of Regular Estimates.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 14 (4): 323–30. https://doi.org/10.1007/BF00533669.

———. 1972. “Local Asymptotic Minimax and Admissibility in Estimation.” In. The Regents of the University of California. https://projecteuclid.org/euclid.bsmsp/1200514092.

Heyde, C. C., and E. Seneta. 2010. “Estimation Theory for Growth and Immigration Rates in a Multiplicative Process.” In Selected Works of C.C. Heyde, edited by Ross Maller, Ishwar Basawa, Peter Hall, and Eugene Seneta, 214–35. Selected Works in Probability and Statistics. Springer New York. http://link.springer.com/chapter/10.1007/978-1-4419-5823-5_31.

Jacod, Jean, Mark Podolskij, and Mathias Vetter. 2010. “Limit Theorems for Moving Averages of Discretized Processes Plus Noise.” The Annals of Statistics 38 (3): 1478–1545. https://doi.org/10.1214/09-AOS756.

Jacod, Jean, and Albert N. Shiryaev. 1987. Limit Theorems for Stochastic Processes. Vol. 288. Grundlehren Der Mathematischen Wissenschaften. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-662-02514-7.

Janková, Jana, and Sara van de Geer. 2016. “Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity,” October. http://arxiv.org/abs/1610.01353.

Konishi, Sadanori, and Genshiro Kitagawa. 1996. “Generalised Information Criteria in Model Selection.” Biometrika 83 (4): 875–90. https://doi.org/10.1093/biomet/83.4.875.

———. 2003. “Asymptotic Theory for Information Criteria in Model Selection—Functional Approach.” Journal of Statistical Planning and Inference, C.R. Rao 80th Birthday Felicitation vol., Part IV, 114 (1–2): 45–61. https://doi.org/10.1016/S0378-3758(02)00462-7.

Kraus, Andrea, and Victor M. Panaretos. 2014. “Frequentist Estimation of an Epidemic’s Spreading Potential When Observations Are Scarce.” Biometrika 101 (1): 141–54. https://doi.org/10.1093/biomet/ast049.

LeCam, L. 1970. “On the Assumptions Used to Prove Asymptotic Normality of Maximum Likelihood Estimates.” The Annals of Mathematical Statistics 41 (3): 802–28. https://doi.org/10.1214/aoms/1177696960.

———. 1972. “Limits of Experiments.” In. The Regents of the University of California. https://projecteuclid.org/euclid.bsmsp/1200514095.

Lederer, Johannes, and Sara van de Geer. 2014. “New Concentration Inequalities for Suprema of Empirical Processes.” Bernoulli 20 (4): 2020–38. https://doi.org/10.3150/13-BEJ549.

Ogata, Yoshiko. 1978. “The Asymptotic Behaviour of Maximum Likelihood Estimators for Stationary Point Processes.” Annals of the Institute of Statistical Mathematics 30 (1): 243–61. https://doi.org/10.1007/BF02480216.

Puri, Madan L., and Pham D. Tuan. 1986. “Maximum Likelihood Estimation for Stationary Point Processes.” Proceedings of the National Academy of Sciences of the United States of America 83 (3): 541–45. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC322899/.

Sørensen, Michael. 2000. “Prediction-Based Estimating Functions.” The Econometrics Journal 3 (2): 123–47. http://www.jstor.org/stable/23114885.

Tropp, Joel A. 2015. “An Introduction to Matrix Concentration Inequalities,” January. http://arxiv.org/abs/1501.01571.