Large sample theory



Many things are similar in the eventual limit.

under construction ⚠️: I merged two notebooks here. The seams are showing.

We use asymptotic approximations all the time in statistics, most frequently via asymptotic pivots that justify classical hypothesis tests, or via information penalties for model selection. We use the asymptotic delta method to motivate robust statistics, or infinite neural networks. There are various specialised mechanisms; I am fond of the Stein methods. Also fun: Feynman-Kac formulae give us central limit theorems for all manner of weird processes.
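To make the pivot idea concrete, here is a minimal simulation sketch (the model, the `z_pivot` name and all the numbers are my illustrative choices, not from any of the cited references): the studentised mean of exponential data is approximately standard normal for large samples, which is exactly what licenses the usual z/t-style tests.

```python
import numpy as np

rng = np.random.default_rng(0)

def z_pivot(n, reps=10_000):
    """Studentised mean of Exp(1) samples, an asymptotic pivot.

    By the CLT, sqrt(n) * (xbar - 1) / s is approximately N(0, 1) for
    large n, even though the data themselves are far from Gaussian.
    """
    x = rng.exponential(1.0, size=(reps, n))
    xbar = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)
    return np.sqrt(n) * (xbar - 1.0) / s

for n in (5, 50, 500):
    z = z_pivot(n)
    # Coverage of the nominal 95% normal interval improves as n grows.
    print(n, np.mean(np.abs(z) < 1.96))
```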

There is much to be said on the various central limit theorems, but I will not be the one to say it right this minute, because this is a placeholder.

A convenient feature of M-estimation, and especially maximum likelihood estimation, is the simple behaviour of estimators in the asymptotic large-sample limit, which can give you, e.g., variance estimates, or motivate information criteria, robust statistics, optimisation methods and so on.
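As a sanity check on that claim, a small self-contained sketch (the exponential model and the variable names are my own illustrative choices): the Monte Carlo variance of the MLE of an exponential rate should be close to the variance predicted by asymptotic theory, \(I(\lambda)^{-1}/n\).

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 200, 5_000

# MLE of an exponential rate is 1 / sample mean. Asymptotic theory says
# sqrt(n) * (lam_hat - lam) -> N(0, 1 / I(lam)) with Fisher information
# I(lam) = 1 / lam**2, i.e. Var(lam_hat) is approximately lam**2 / n.
x = rng.exponential(1.0 / lam, size=(reps, n))
lam_hat = 1.0 / x.mean(axis=1)

print("Monte Carlo variance of the MLE:", lam_hat.var())
print("Asymptotic variance lam^2 / n  :", lam**2 / n)
```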

In the most celebrated and convenient cases, asymptotic bounds concern normally-distributed errors, and these are typically derived through Local Asymptotic Normality theorems. A simple and general introduction is given in Andersen et al. (1997), page 594, which applies both to i.i.d. data and to dependent data in the form of point processes. For all that it is widely applied, it is still a stringent set of assumptions.

For many nice distributions, central limit theorems lead asymptotically to Gaussian limits, and we can treat uncertainty in terms of transformations of Gaussians.
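The workhorse for those transformations is the delta method. In its simplest univariate form, if

\[
\sqrt{n}\,(\hat{\theta}_n - \theta) \rightsquigarrow \mathcal{N}(0, \sigma^2)
\]

and \(g\) is differentiable at \(\theta\) with \(g'(\theta) \neq 0\), then

\[
\sqrt{n}\,\bigl(g(\hat{\theta}_n) - g(\theta)\bigr) \rightsquigarrow \mathcal{N}\bigl(0, g'(\theta)^2 \sigma^2\bigr).
\]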

Bayesian posteriors are kinda Gaussian

The Bayesian large-sample result of note is the Bernstein–von Mises theorem, which gives conditions under which the posterior distribution is asymptotically Gaussian.
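Very roughly, and sweeping the regularity conditions under the rug, the statement is that

\[
\Pi\bigl(\cdot \mid X_{1:n}\bigr) \approx \mathcal{N}\!\left(\hat{\theta}_n,\ \tfrac{1}{n} I(\theta_0)^{-1}\right)
\]

in total variation, where \(\hat{\theta}_n\) is an efficient estimator (the MLE will do) and \(I(\theta_0)\) is the Fisher information at the true parameter, so that in the large-sample limit the influence of the prior washes out.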

Fisher Information

Used in ML theory and kinda-sorta in robust estimation, and in natural gradient methods. A matrix that tells us how much a new datum affects our parameter estimates. (It is related, I am told, to garden-variety Shannon information, and when that non-obvious fact is clearer to me I shall expand on how precisely this is so.) 🏗
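To pin down the definition: for a parametric model with density \(p(x;\theta)\),

\[
I(\theta) = \mathbb{E}_\theta\bigl[\nabla_\theta \log p(X;\theta)\,\nabla_\theta \log p(X;\theta)^\top\bigr]
          = -\mathbb{E}_\theta\bigl[\nabla_\theta^2 \log p(X;\theta)\bigr],
\]

the second equality holding under the usual regularity conditions. The asymptotic variance of the MLE from \(n\) i.i.d. observations is then \(I(\theta)^{-1}/n\).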

Convolution Theorem

The unhelpfully-named convolution theorem of Hájek (1970); is that related to the familiar Fourier convolution theorem? Not really, as the statement below makes clear.

Suppose \(\hat{\theta}\) is an efficient estimator of \(\theta\) and \(\tilde{\theta}\) is another, not fully efficient, estimator. The convolution theorem says that, if you rule out stupid exceptions, asymptotically \(\tilde{\theta} = \hat{\theta} + \varepsilon\) where \(\varepsilon\) is pure noise, independent of \(\hat{\theta}.\)

The reason that’s almost obvious is that if it weren’t true, there would be some information about \(\theta\) in \(\tilde{\theta}-\hat{\theta}\), and you could use this information to get a better estimator than \(\hat{\theta}\), which (by assumption) can’t happen. The stupid exceptions are things like the Hodges superefficient estimator, which does better at a few values of \(\theta\) but much worse at neighbouring values.
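In symbols, and glossing the regularity ("regular estimator") conditions: for any regular estimator \(\tilde{\theta}_n\),

\[
\sqrt{n}\,(\tilde{\theta}_n - \theta) \rightsquigarrow Z + W, \qquad Z \sim \mathcal{N}\bigl(0, I(\theta)^{-1}\bigr), \quad W \perp\!\!\!\perp Z,
\]

so the limiting law is a convolution of the efficient Gaussian limit with some independent extra noise, which is where the name comes from.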

References

Akaike, Hirotugu. 1973. Information Theory and an Extension of the Maximum Likelihood Principle.” In Proceedings of the Second International Symposium on Information Theory, edited by B. N. Petrov and F. Csáki, 199–213. Budapest: Akademiai Kiado.
Akaike, Hirotugu. 1973. Maximum Likelihood Identification of Gaussian Autoregressive Moving Average Models.” Biometrika 60 (2): 255–65.
Andersen, Per Kragh, Ornulf Borgan, Richard D. Gill, and Niels Keiding. 1997. Statistical models based on counting processes. Corr. 2. print. Springer series in statistics. New York, NY: Springer.
Athreya, K. B., and Niels Keiding. 1977. Estimation Theory for Continuous-Time Branching Processes.” Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 39 (2): 101–23.
Athreya, Krishna B, and S. N Lahiri. 2006. Measure theory and probability theory. New York: Springer.
Bacry, E., S. Delattre, M. Hoffmann, and J. F. Muzy. 2013. Some Limit Theorems for Hawkes Processes and Application to Financial Statistics.” Stochastic Processes and Their Applications, A Special Issue on the Occasion of the 2013 International Year of Statistics, 123 (7): 2475–99.
Barbour, A. D., and Louis H. Y. Chen, eds. 2005. An Introduction to Stein’s Method. Vol. 4. Lecture Notes Series / Institute for Mathematical Sciences, National University of Singapore, v. 4. Singapore : Hackensack, N.J: Singapore University Press ; World Scientific.
Barndorff-Nielsen, O. E., and M. Sørensen. 1994. A Review of Some Aspects of Asymptotic Likelihood Theory for Stochastic Processes.” International Statistical Review / Revue Internationale de Statistique 62 (1): 133–65.
Barrio, Eustasio del, Paul Deheuvels, and Sara van de Geer. 2006. Lectures on Empirical Processes: Theory and Statistical Applications. European Mathematical Society.
Barron, Andrew R. 1986. Entropy and the Central Limit Theorem.” The Annals of Probability 14 (1): 336–42.
Becker-Kern, Peter, Mark M. Meerschaert, and Hans-Peter Scheffler. 2004. Limit Theorems for Coupled Continuous Time Random Walks.” The Annals of Probability 32 (1): 730–56.
Bibby, Bo Martin, and Michael Sørensen. 1995. Martingale Estimation Functions for Discretely Observed Diffusion Processes.” Bernoulli 1 (1/2): 17–39.
Bishop, Adrian N., and Pierre Del Moral. 2016. On the Stability of Kalman-Bucy Diffusion Processes.” SIAM Journal on Control and Optimization 55 (6): 4015–47.
———. 2023. On the Mathematical Theory of Ensemble (Linear-Gaussian) Kalman-Bucy Filtering.” Mathematics of Control, Signals, and Systems, May.
Bréhier, Charles-Edouard, Ludovic Goudenège, and Loïc Tudela. 2016. Central Limit Theorem for Adaptive Multilevel Splitting Estimators in an Idealized Setting.” In Monte Carlo and Quasi-Monte Carlo Methods, edited by Ronald Cools and Dirk Nuyens, 163:245–60. Springer Proceedings in Mathematics & Statistics. Cham: Springer International Publishing.
Cantoni, Eva, and Elvezio Ronchetti. 2001. Robust Inference for Generalized Linear Models.” Journal of the American Statistical Association 96 (455): 1022–30.
Cérou, Frédéric, François Le Gland, Pierre Del Moral, and Pascal Lezaud. 2005. Limit Theorems for the Multilevel Splitting Algorithm in the Simulation of Rare Events.” In Proceedings of the Winter Simulation Conference, 2005.
Chikuse, Yasuko. 2003. High Dimensional Asymptotic Theorems.” In Statistics on Special Manifolds, by Yasuko Chikuse, 174:187–230. New York, NY: Springer New York.
Claeskens, Gerda, Tatyana Krivobokova, and Jean D. Opsomer. 2009. Asymptotic Properties of Penalized Spline Estimators.” Biometrika 96 (3): 529–44.
DasGupta, Anirban. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. New York: Springer New York.
Del Moral, Pierre. 2004. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. 2004 edition. Latheronwheel, Caithness: Springer.
Del Moral, Pierre, and Arnaud Doucet. 2009. “Particle Methods: An Introduction with Applications.” INRIA.
Del Moral, Pierre, Peng Hu, and Liming Wu. 2011. On the Concentration Properties of Interacting Particle Processes. Vol. 3. Now Publishers.
Del Moral, P., A. Kurtzmann, and J. Tugaut. 2017. On the Stability and the Uniform Propagation of Chaos of a Class of Extended Ensemble Kalman-Bucy Filters.” SIAM Journal on Control and Optimization 55 (1): 119–55.
Duembgen, Moritz, and Mark Podolskij. 2015. High-Frequency Asymptotics for Path-Dependent Functionals of Itô Semimartingales.” Stochastic Processes and Their Applications 125 (4): 1195–1217.
Feigin, Paul David. 1976. Maximum Likelihood Estimation for Continuous-Time Stochastic Processes.” Advances in Applied Probability 8 (4): 712–36.
Feller, William. 1951. The Asymptotic Distribution of the Range of Sums of Independent Random Variables.” The Annals of Mathematical Statistics 22 (3): 427–32.
Fernholz, Luisa Turrin. 1983. von Mises calculus for statistical functionals. Lecture Notes in Statistics 19. New York: Springer.
Gine, Evarist, and Joel Zinn. 1990. Bootstrapping General Empirical Measures.” Annals of Probability 18 (2): 851–69.
Giraitis, L, and D Surgailis. 1999. “Central Limit Theorem for the Empirical Process of a Linear Sequence with Long Memory.” Journal of Statistical Planning and Inference 80 (1-2): 81–93.
Gribonval, Rémi, Gilles Blanchard, Nicolas Keriven, and Yann Traonmilin. 2017. Compressive Statistical Learning with Random Feature Moments.” arXiv:1706.07180 [Cs, Math, Stat], June.
Hájek, Jaroslav. 1970. A Characterization of Limiting Distributions of Regular Estimates.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 14 (4): 323–30.
———. 1972. Local Asymptotic Minimax and Admissibility in Estimation.” In. The Regents of the University of California.
Heyde, C. C., and E. Seneta. 2010. Estimation Theory for Growth and Immigration Rates in a Multiplicative Process.” In Selected Works of C.C. Heyde, edited by Ross Maller, Ishwar Basawa, Peter Hall, and Eugene Seneta, 214–35. Selected Works in Probability and Statistics. Springer New York.
Huber, Peter J. 1964. Robust Estimation of a Location Parameter.” The Annals of Mathematical Statistics 35 (1): 73–101.
Jacob, Pierre E., John O’Leary, and Yves F. Atchadé. 2019. Unbiased Markov Chain Monte Carlo with Couplings.” arXiv:1708.03625 [Stat], July.
Jacod, Jean, Mark Podolskij, and Mathias Vetter. 2010. Limit Theorems for Moving Averages of Discretized Processes Plus Noise.” The Annals of Statistics 38 (3): 1478–1545.
Jacod, Jean, and Albert N. Shiryaev. 1987a. Limit Theorems for Stochastic Processes. Vol. 288. Grundlehren Der Mathematischen Wissenschaften. Berlin, Heidelberg: Springer Berlin Heidelberg.
———. 1987b. The General Theory of Stochastic Processes, Semimartingales and Stochastic Integrals.” In Limit Theorems for Stochastic Processes, edited by Jean Jacod and Albert N. Shiryaev, 1–63. Grundlehren Der Mathematischen Wissenschaften. Berlin, Heidelberg: Springer Berlin Heidelberg.
Janková, Jana, and Sara van de Geer. 2016. Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity.” arXiv:1610.01353 [Math, Stat], October.
Karabash, Dmytro, and Lingjiong Zhu. 2012. Limit Theorems for Marked Hawkes Processes with Application to a Risk Model.” arXiv:1211.4039 [Math], November.
Konishi, Sadanori, and Genshiro Kitagawa. 1996. Generalised Information Criteria in Model Selection.” Biometrika 83 (4): 875–90.
———. 2003. Asymptotic Theory for Information Criteria in Model Selection—Functional Approach.” Journal of Statistical Planning and Inference, C.R. Rao 80th Birthday Felicitation Volume, Part IV, 114 (1–2): 45–61.
Kraus, Andrea, and Victor M. Panaretos. 2014. Frequentist Estimation of an Epidemic’s Spreading Potential When Observations Are Scarce.” Biometrika 101 (1): 141–54.
Le Gland, François, Valerie Monbet, and Vu-Duc Tran. 2009. Large Sample Asymptotics for the Ensemble Kalman Filter,” 25.
LeCam, L. 1970. On the Assumptions Used to Prove Asymptotic Normality of Maximum Likelihood Estimates.” The Annals of Mathematical Statistics 41 (3): 802–28.
———. 1972. Limits of Experiments.” In. The Regents of the University of California.
Lederer, Johannes, and Sara van de Geer. 2014. New Concentration Inequalities for Suprema of Empirical Processes.” Bernoulli 20 (4): 2020–38.
Lorsung, Cooper. 2021. Understanding Uncertainty in Bayesian Deep Learning.” arXiv.
Maronna, Ricardo Antonio. 1976. Robust M-Estimators of Multivariate Location and Scatter.” The Annals of Statistics 4 (1): 51–67.
Mueller, Ulrich K. 2018. Refining the Central Limit Theorem Approximation via Extreme Value Theory.” arXiv:1802.00762 [Math], February.
Oertel, Frank. 2020. Grothendieck’s Inequality and Completely Correlation Preserving Functions – a Summary of Recent Results and an Indication of Related Research Problems.” arXiv.
Ogata, Yosihiko. 1978. The Asymptotic Behaviour of Maximum Likelihood Estimators for Stationary Point Processes.” Annals of the Institute of Statistical Mathematics 30 (1): 243–61.
Pollard, David. 1990. Empirical Processes: Theory and Applications. IMS.
Prause, Annabel, and Ansgar Steland. 2018. Estimation of the Asymptotic Variance of Univariate and Multivariate Random Fields and Statistical Inference.” Electronic Journal of Statistics 12 (1): 890–940.
Puri, Madan L., and Pham D. Tuan. 1986. Maximum Likelihood Estimation for Stationary Point Processes.” Proceedings of the National Academy of Sciences of the United States of America 83 (3): 541–45.
Raginsky, Maxim, and Igal Sason. 2014. Concentration of Measure Inequalities in Information Theory, Communications, and Coding: Second Edition. Now Publishers.
Ross, Nathan. 2011. Fundamentals of Stein’s Method.” Probability Surveys 8 (0): 210–93.
Scornet, Erwan. 2014. On the Asymptotics of Random Forests.” arXiv:1409.2090 [Math, Stat], September.
Shiga, Tokuzo, and Hiroshi Tanaka. 1985. Central Limit Theorem for a System of Markovian Particles with Mean Field Interactions.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 69 (3): 439–59.
Sørensen, Michael. 2000. Prediction-Based Estimating Functions.” The Econometrics Journal 3 (2): 123–47.
Stam, A. J. 1982. Limit Theorems for Uniform Distributions on Spheres in High-Dimensional Euclidean Spaces.” Journal of Applied Probability 19 (1): 221–28.
Stein, Charles. 1986. Approximate Computation of Expectations. Vol. 7. IMS.
Tropp, Joel A. 2015. An Introduction to Matrix Concentration Inequalities.
Vaart, Aad W. van der. 2007. Asymptotic statistics. 1. paperback ed., 8. printing. Cambridge series in statistical and probabilistic mathematics. Cambridge: Cambridge Univ. Press.
