# Large sample theory

Many things are similar in the eventual limit. Under construction ⚠️: I merged two notebooks here. The seams are showing.

We use asymptotic approximations all the time in statistics, most frequently in asymptotic pivots that motivate classical hypothesis tests or information penalties. We use the asymptotic delta method to motivate robust statistics, or infinite neural networks. There are various specialised mechanisms; I am fond of the Stein methods. Also fun: Feynman-Kac formulae give us central limit theorems for all manner of weird processes.

There is much to be said on the various central limit theorems, but I will not be the one to say it right this minute, because this is a placeholder.

A convenient feature of M-estimation, and especially maximum likelihood estimation, is the simple behaviour of estimators in the asymptotic, large-sample-size limit, which can give you, e.g., variance estimates, or motivate information criteria, robust statistics, optimisation tricks, etc.
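
To make that concrete, here is a minimal NumPy sketch (the exponential model and all the numbers are mine, purely for illustration): the MLE of an exponential rate has asymptotic variance given by the inverse Fisher information, and a quick simulation agrees with it.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, n_rep = 2.0, 500, 2000

# MLE of the exponential rate is 1 / sample mean; the Fisher information per
# observation is 1 / lam^2, so the asymptotic sd of the MLE is lam / sqrt(n).
samples = rng.exponential(scale=1.0 / lam, size=(n_rep, n))
lam_hat = 1.0 / samples.mean(axis=1)

print("empirical sd of MLE:        ", lam_hat.std())
print("asymptotic sd, lam / sqrt(n):", lam / np.sqrt(n))
```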

In the most celebrated and convenient cases, asymptotic bounds are about normally-distributed errors, and these are typically derived through Local Asymptotic Normality theorems. A simple and general introduction is given in Andersen et al. (1997), page 594, which applies both to i.i.d. data and to dependent data in the form of point processes. For all that it appears in an applied text, it is still fairly stringent.
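
For orientation, the i.i.d. version of the LAN property says that the local log likelihood ratio behaves like that of a Gaussian shift experiment:

$$\log \frac{d P^n_{\theta + h/\sqrt{n}}}{d P^n_{\theta}} = h^\top \Delta_{n,\theta} - \tfrac{1}{2} h^\top I_\theta h + o_{P_\theta}(1),$$

where $$\Delta_{n,\theta}$$ converges in distribution to $$\mathcal{N}(0, I_\theta)$$ and $$I_\theta$$ is the Fisher information.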

For many nice distributions, central limit theorems lead (asymptotically) to Gaussian distributions, and we can treat uncertainty in terms of transformations of Gaussians.
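
The delta method is the workhorse for those transformations. A small sketch (the log transform and the constants are my own toy choices): if $$\sqrt{n}(\bar{X}-\mu)$$ is approximately $$\mathcal{N}(0,\sigma^2)$$, then $$g(\bar{X})$$ is approximately $$\mathcal{N}(g(\mu), g'(\mu)^2\sigma^2/n)$$ for smooth $$g$$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, n_rep = 3.0, 1.5, 400, 2000

# Delta method: for g = log, g'(mu) = 1/mu, so the approximate sd of
# log(xbar) is sigma / (mu * sqrt(n)).
xbar = rng.normal(mu, sigma, size=(n_rep, n)).mean(axis=1)
g_xbar = np.log(xbar)

print("empirical sd of log(xbar):", g_xbar.std())
print("delta-method sd:          ", sigma / (mu * np.sqrt(n)))
```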

## Fisher Information

Used in ML theory and kinda-sorta in robust estimation and natural gradient methods. A matrix that tells us how much a new datum affects our parameter estimates. (It is related, I am told, to garden-variety Shannon information, and when that non-obvious fact is clearer to me I shall expand on how precisely this is so.) 🏗
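
One concrete way to see it, as a Monte Carlo sketch with a Bernoulli model of my own choosing: the Fisher information is the variance of the score, which for Bernoulli($$p$$) should come out to $$1/(p(1-p))$$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n_rep = 0.3, 200_000

# For one Bernoulli(p) observation the score is
# d/dp log f(x; p) = x/p - (1 - x)/(1 - p),
# and the Fisher information is its variance, namely 1 / (p (1 - p)).
x = rng.binomial(1, p, size=n_rep)
score = x / p - (1 - x) / (1 - p)

print("Monte Carlo Fisher information:", score.var())
print("analytic 1 / (p (1 - p)):      ", 1.0 / (p * (1 - p)))
```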

## Convolution Theorem

The unhelpfully-named convolution theorem of Hájek (1970); it is unrelated to the Fourier convolution theorem, the name referring instead to the limiting law of a regular estimator being a convolution of the efficient Gaussian limit with independent noise.

Suppose $$\hat{\theta}$$ is an efficient estimator of $$\theta$$ and $$\tilde{\theta}$$ is another, not fully efficient, estimator. The convolution theorem says that, if you rule out stupid exceptions, asymptotically $$\tilde{\theta} = \hat{\theta} + \varepsilon$$ where $$\varepsilon$$ is pure noise, independent of $$\hat{\theta}.$$

The reason that’s almost obvious is that if it weren’t true, there would be some information about $$\theta$$ in $$\tilde{\theta}-\hat{\theta}$$, and you could use this information to get a better estimator than $$\hat{\theta}$$, which (by assumption) can’t happen. The stupid exceptions are things like the Hodges superefficient estimator, which does better at a few values of $$\theta$$ but much worse at neighbouring values.
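
A quick simulation sketch of the decomposition, with a toy setup of my own (Gaussian location, sample mean as the efficient estimator, sample median as the inefficient one): the difference between the two estimators should be roughly uncorrelated with the efficient one, and the extra variance of the median should be roughly the variance of that difference.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_rep = 200, 5000

# For Gaussian data the sample mean is the efficient location estimator;
# the sample median is regular but inefficient. The convolution theorem
# predicts median ~ mean + independent noise, so median - mean should be
# (nearly) uncorrelated with the mean.
x = rng.normal(0.0, 1.0, size=(n_rep, n))
mean_hat = x.mean(axis=1)
median_hat = np.median(x, axis=1)

print("corr(median - mean, mean):", np.corrcoef(median_hat - mean_hat, mean_hat)[0, 1])
print("var(median):              ", median_hat.var())
print("var(mean) + var(diff):    ", mean_hat.var() + (median_hat - mean_hat).var())
```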
