# Large sample theory

February 16, 2015 — August 2, 2023

Gaussian
likelihood
optimization
probability
statistics

Many things are similar in the eventual limit.

under construction ⚠️: I merged two notebooks here. The seams are showing.

We use asymptotic approximations all the time in statistics, most frequently in asymptotic pivots that motivate classical hypothesis tests or information penalties. We use the asymptotic delta method to motivate robust statistics, or infinite neural networks. There are various specialised mechanisms; I am fond of the Stein methods. Also fun: Feynman-Kac formulae give us central limit theorems for all manner of weird processes.

There is much to be said on the various central limit theorems, but I will not be the one to say it right this minute, because this is a placeholder.

A convenient feature of M-estimation, and especially maximum likelihood estimation, is the simple behaviour of estimators in the asymptotic large-sample-size limit, which can give us, e.g., variance estimates, or motivate information criteria, robust statistics, optimisation, etc.
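To make the "variance estimates" claim concrete, here is a minimal simulation sketch. The exponential model, the sample sizes, and the helper names `exponential_mle` and `simulate_mle_spread` are my own choices for illustration, not anything from the references. It checks that the spread of the maximum likelihood estimate of an exponential rate is close to the value predicted by the inverse Fisher information.

```python
import math
import random
import statistics


def exponential_mle(sample):
    # MLE of the rate of an exponential distribution: 1 / sample mean.
    return len(sample) / sum(sample)


def simulate_mle_spread(rate=2.0, n=200, reps=2000, seed=1):
    # Repeatedly draw n observations and record the MLE each time.
    rng = random.Random(seed)
    estimates = [
        exponential_mle([rng.expovariate(rate) for _ in range(n)])
        for _ in range(reps)
    ]
    empirical_sd = statistics.stdev(estimates)
    # Fisher information per observation is 1 / rate**2, so the
    # asymptotic standard deviation of the MLE is rate / sqrt(n).
    asymptotic_sd = rate / math.sqrt(n)
    return empirical_sd, asymptotic_sd


empirical_sd, asymptotic_sd = simulate_mle_spread()
print(f"empirical sd {empirical_sd:.4f}, asymptotic sd {asymptotic_sd:.4f}")
```

The two numbers should agree to within a few percent at this sample size; the remaining gap is a mix of Monte Carlo noise and finite-sample bias.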

In the most celebrated and convenient cases, asymptotic bounds are about normally-distributed errors, and these are typically derived through Local Asymptotic Normality theorems. A simple and general introduction is given in Andersen et al. (1997), page 594, which applies to both i.i.d. data and dependent data in the form of point processes. For all that it is widely applied, it is still stringent.

For many nice distributions, central limit theorems lead (asymptotically) to Gaussian distributions, and we can treat uncertainty in terms of transformations of Gaussians.
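One standard way to make "transformations of Gaussians" concrete is the delta method. As a sketch, assume a scalar estimator $$T_n$$ of $$\theta$$ with asymptotic variance $$\sigma^2$$ and a function $$g$$ that is differentiable at $$\theta$$ (these symbols are mine, chosen for illustration). Then

$$\sqrt{n}\,(T_n - \theta) \xrightarrow{d} \mathcal{N}(0, \sigma^2) \quad\Longrightarrow\quad \sqrt{n}\,\bigl(g(T_n) - g(\theta)\bigr) \xrightarrow{d} \mathcal{N}\bigl(0,\, g'(\theta)^2 \sigma^2\bigr).$$

To first order, a smooth transformation of an asymptotically Gaussian estimator is again asymptotically Gaussian, with the variance rescaled by the squared derivative.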

## 1 Bayesian posteriors are kinda Gaussian

The Bayesian large-sample result of note is the Bernstein–von Mises theorem, which provides some conditions under which the posterior distribution is asymptotically Gaussian.
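A small numerical sketch of what this looks like in the simplest case I can think of: a Bernoulli likelihood with a uniform prior, so that the posterior is an exact Beta distribution. The model, the fixed success fraction of 0.3, and the function names are assumptions for illustration only. The code computes the total variation distance between the posterior and a Gaussian centred at the MLE with variance given by the inverse Fisher information.

```python
import math


def beta_logpdf(x, a, b):
    # Log density of Beta(a, b) at x in (0, 1).
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x)


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def tv_posterior_vs_gaussian(n, success_fraction=0.3, grid=20000):
    k = round(n * success_fraction)      # observed number of successes
    a, b = 1 + k, 1 + n - k              # Beta posterior under a uniform prior
    mle = k / n
    sd = math.sqrt(mle * (1 - mle) / n)  # inverse Fisher information scaling
    h = 1.0 / grid
    l1 = 0.0
    for i in range(grid):                # midpoint rule on (0, 1)
        x = (i + 0.5) * h
        l1 += abs(math.exp(beta_logpdf(x, a, b)) - normal_pdf(x, mle, sd)) * h
    # The Gaussian also puts a little mass outside (0, 1), where the Beta has none.
    outside = normal_cdf(0.0, mle, sd) + 1.0 - normal_cdf(1.0, mle, sd)
    return 0.5 * (l1 + outside)


for n in (10, 100, 1000):
    print(n, round(tv_posterior_vs_gaussian(n), 4))
```

The distance should shrink as the sample size grows, which is the qualitative content of the theorem in this toy setting.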

## 3 Fisher Information

Used in ML theory and kinda-sorta in robust estimation, and natural gradient methods. A matrix that tells us how much a new datum affects our parameter estimates. (It is related, I am told, to garden-variety Shannon information, and when that non-obvious fact is more clear to me I shall expand on how precisely this is so.) 🏗
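As a placeholder illustration, here is a sketch using the usual definition of Fisher information as the expected squared score (the derivative of the log-likelihood with respect to the parameter). The Bernoulli model and the function names are my assumptions for the example. For a Bernoulli parameter $$p$$ the closed form is $$1/(p(1-p))$$, which the Monte Carlo estimate should reproduce.

```python
import random


def bernoulli_score(x, p):
    # Derivative with respect to p of the log-likelihood
    # x * log(p) + (1 - x) * log(1 - p).
    return x / p - (1 - x) / (1 - p)


def fisher_information_mc(p, n_draws=200_000, seed=0):
    # Monte Carlo estimate of E[score**2] under the model.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = 1 if rng.random() < p else 0
        total += bernoulli_score(x, p) ** 2
    return total / n_draws


p = 0.3
fisher_mc = fisher_information_mc(p)
fisher_exact = 1.0 / (p * (1.0 - p))
print(f"Monte Carlo {fisher_mc:.3f}, closed form {fisher_exact:.3f}")
```

In the scalar case the "matrix" is a single number; with a vector parameter the same recipe gives the expected outer product of the score vector.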

## 4 Convolution Theorem

The unhelpfully-named convolution theorem of Hájek (1970): is that related?

Suppose $$\hat{\theta}$$ is an efficient estimator of $$\theta$$ and $$\tilde{\theta}$$ is another, not fully efficient, estimator. The convolution theorem says that, if you rule out stupid exceptions, asymptotically $$\tilde{\theta} = \hat{\theta} + \varepsilon$$ where $$\varepsilon$$ is pure noise, independent of $$\hat{\theta}.$$

The reason that’s almost obvious is that if it weren’t true, there would be some information about $$\theta$$ in $$\tilde{\theta}-\hat{\theta}$$, and you could use this information to get a better estimator than $$\hat{\theta}$$, which (by assumption) can’t happen. The stupid exceptions are things like the Hodges superefficient estimator that do better at a few values of $$\theta$$ but much worse at neighbouring values.
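The decomposition can be checked by simulation in a textbook case. This is a sketch under my own assumptions: Gaussian data with unknown location, the sample mean playing the role of the efficient $$\hat{\theta}$$, and the sample median playing the less efficient $$\tilde{\theta}$$. The gap between the two should be uncorrelated with the mean, and the variances should add.

```python
import math
import random
import statistics


def simulate_mean_and_gap(n=101, reps=4000, seed=2):
    # For each replicate, record the sample mean and (median - mean).
    rng = random.Random(seed)
    means, gaps = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        means.append(mean)
        gaps.append(statistics.median(sample) - mean)
    return means, gaps


def correlation(xs, ys):
    # Plain Pearson correlation, written out to avoid version-specific helpers.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


means, gaps = simulate_mean_and_gap()
medians = [m + g for m, g in zip(means, gaps)]

corr = correlation(means, gaps)
var_mean = statistics.variance(means)
var_gap = statistics.variance(gaps)
var_median = statistics.variance(medians)

print(f"corr(mean, median - mean) = {corr:.3f}")
print(f"var(median) = {var_median:.5f}, var(mean) + var(gap) = {var_mean + var_gap:.5f}")
```

Zero correlation is weaker than the independence the theorem asserts, so this is a sanity check rather than a demonstration of the full result.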

## 5 References

Akaike, Hirotugu. 1973. In Proceedings of the Second International Symposium on Information Theory.
Akaike, Hirotugu. 1973. Biometrika.
Andersen, Borgan, Gill, et al. 1997. Statistical models based on counting processes. Springer series in statistics.
Athreya, K. B., and Keiding. 1977. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002).
Athreya, Krishna B, and Lahiri. 2006. Measure theory and probability theory.
Bacry, Delattre, Hoffmann, et al. 2013. Stochastic Processes and Their Applications, A Special Issue on the Occasion of the 2013 International Year of Statistics.
Barbour, and Chen, eds. 2005. An Introduction to Stein’s Method. Lecture Notes Series / Institute for Mathematical Sciences, National University of Singapore, v. 4.
Barndorff-Nielsen, and Sørensen. 1994. International Statistical Review / Revue Internationale de Statistique.
Barrio, Deheuvels, and van de Geer. 2006. Lectures on Empirical Processes: Theory and Statistical Applications.
Barron. 1986. The Annals of Probability.
Becker-Kern, Meerschaert, and Scheffler. 2004. The Annals of Probability.
Bibby, and Sørensen. 1995. Bernoulli.
Bishop, and Del Moral. 2016. SIAM Journal on Control and Optimization.
———. 2023. Mathematics of Control, Signals, and Systems.
Bréhier, Goudenège, and Tudela. 2016. In Monte Carlo and Quasi-Monte Carlo Methods. Springer Proceedings in Mathematics & Statistics.
Cantoni, and Ronchetti. 2001. Journal of the American Statistical Association.
Cérou, Le Gland, François, Del Moral, et al. 2005. In Proceedings of the Winter Simulation Conference, 2005.
Chikuse. 2003. In Statistics on Special Manifolds.
Claeskens, Krivobokova, and Opsomer. 2009. Biometrika.
DasGupta. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics.
Del Moral, Pierre. 2004. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications.
Del Moral, Pierre, and Doucet. 2009. “Particle Methods: An Introduction with Applications.”
Del Moral, Pierre, Hu, and Wu. 2011. On the Concentration Properties of Interacting Particle Processes.
Del Moral, P., Kurtzmann, and Tugaut. 2017. SIAM Journal on Control and Optimization.
Duembgen, and Podolskij. 2015. Stochastic Processes and Their Applications.
Feigin. 1976. Advances in Applied Probability.
Feller. 1951. The Annals of Mathematical Statistics.
Fernholz. 1983. von Mises calculus for statistical functionals. Lecture Notes in Statistics 19.
Gine, and Zinn. 1990. Annals of Probability.
Giraitis, and Surgailis. 1999. “Central Limit Theorem for the Empirical Process of a Linear Sequence with Long Memory.” Journal of Statistical Planning and Inference.
Gribonval, Blanchard, Keriven, et al. 2017. arXiv:1706.07180 [Cs, Math, Stat].
Hájek. 1970. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete.
———. 1972. In.
Heyde, and Seneta. 2010. In Selected Works of C.C. Heyde. Selected Works in Probability and Statistics.
Huber. 1964. The Annals of Mathematical Statistics.
Jacob, O’Leary, and Atchadé. 2019. arXiv:1708.03625 [Stat].
Jacod, Podolskij, and Vetter. 2010. The Annals of Statistics.
Jacod, and Shiryaev. 1987a. Limit Theorems for Stochastic Processes. Grundlehren Der Mathematischen Wissenschaften.
———. 1987b. In Limit Theorems for Stochastic Processes. Grundlehren Der Mathematischen Wissenschaften.
Janková, and van de Geer. 2016. arXiv:1610.01353 [Math, Stat].
Karabash, and Zhu. 2012. arXiv:1211.4039 [Math].
Konishi, and Kitagawa. 1996. Biometrika.
———. 2003. Journal of Statistical Planning and Inference, C.R. Rao 80th Birthday Felicitation Volume, Part IV.
Kraus, and Panaretos. 2014. Biometrika.
Le Gland, Monbet, and Tran. 2009. Report.
LeCam. 1970. The Annals of Mathematical Statistics.
———. 1972. In.
Lederer, and van de Geer. 2014. Bernoulli.
Lorsung. 2021.
Maronna. 1976. The Annals of Statistics.
Mueller. 2018. arXiv:1802.00762 [Math].
Ogata. 1978. Annals of the Institute of Statistical Mathematics.
Pollard. 1990. Empirical Processes: Theory and Applications.
Prause, and Steland. 2018. Electronic Journal of Statistics.
Puri, and Tuan. 1986. Proceedings of the National Academy of Sciences of the United States of America.
Raginsky, and Sason. 2014. Concentration of Measure Inequalities in Information Theory, Communications, and Coding: Second Edition.
Ross. 2011. Probability Surveys.
Scornet. 2014. arXiv:1409.2090 [Math, Stat].
Shiga, and Tanaka. 1985. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete.
Sørensen. 2000. The Econometrics Journal.
Stam. 1982. Journal of Applied Probability.
Stein. 1986. Approximate Computation of Expectations.
Tropp. 2015. An Introduction to Matrix Concentration Inequalities.
van der Vaart. 2007. Asymptotic statistics. Cambridge series in statistical and probabilistic mathematics.