This page exists mostly to collect a selection of overview introductions to statistics that are not terrible.
I’m especially interested in modern approaches that harmonise what we would call *statistics* and *machine learning* methods, and in cutting through the unnecessary terminological confusion between those two traditions.

Here are some recommended courses to get started if you don’t know what you’re doing.

- Larry Wasserman’s stats course
- Cosma Shalizi’s regression lectures
- Moritz Hardt and Benjamin Recht, *Patterns, Predictions, and Actions: A Story About Machine Learning*

See also the recommended texts below. May I draw your attention especially to Kroese et al. (2019), which I proofread for my supervisor Zdravko Botev, and enjoyed greatly? It smoothly bridges mathematicians without statistical training into applied statistics, without being excruciating the way layperson introductions are. It is now freely available online, and has fewer typos.

There are also statistics podcasts.

## Taxonomies

Boaz Barak’s *ML Theory with bad drawings* attempts one division of labour:

> However, what we actually do is at least thrice-removed from this ideal:
>
> - **The model gap:** We do not optimize over all possible systems, but rather a small subset of such systems (e.g., ones that belong to a certain family of models).
> - **The metric gap:** In almost all cases, we do not optimize the actual measure of success we care about, but rather another metric that is at best correlated with it.
> - **The algorithm gap:** We don’t even optimize the latter metric, since it will almost always be non-convex, and hence the system we end up with depends on our starting point and the particular algorithms we use.
>
> The magic of machine learning is that sometimes (though not always!) we can still get good results despite these gaps. Much of the theory of machine learning is about understanding under what conditions we can bridge some of these gaps.

The above discussion explains the “machine learning is just X” takes. The expressivity of our models falls under *approximation theory*. The gap between the success we want to achieve and the metric we can measure often corresponds to the difference between *population* and *sample* performance, which becomes a question of *statistics*. The study of our algorithms’ performance falls under *optimization*.
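The metric gap is easy to demonstrate concretely. Here is a minimal sketch (toy data and hyperparameters are illustrative, not from Barak’s post): we fit a logistic classifier by gradient descent on the cross-entropy surrogate, then report the 0–1 accuracy we actually care about — a metric we never optimized directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian blobs in the plane.
n = 200
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, 2)),
               rng.normal(+1.0, 1.0, (n // 2, 2))])
y = np.repeat([0.0, 1.0], n // 2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = np.zeros(2), 0.0, 0.1
for step in range(200):
    p = sigmoid(X @ w + b)
    # We descend the gradient of the cross-entropy *surrogate*,
    # not of the 0-1 loss (whose gradient is zero almost everywhere).
    w -= lr * (X.T @ (p - y) / n)
    b -= lr * np.mean(p - y)

p = sigmoid(X @ w + b)
cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
accuracy = np.mean((p > 0.5) == y)
print(f"surrogate (cross-entropy): {cross_entropy:.3f}")
print(f"metric we care about (accuracy): {accuracy:.3f}")
```

Because cross-entropy is merely correlated with accuracy, driving the surrogate down usually (but not provably always) drives the true metric up — which is the metric gap in miniature.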

## Gotchas

Greenland (1995a); Greenland (1995b)

How important is normality of the data? In many circumstances, not very, because what we care about is normality of the sampling distribution, which is a different thing. Lumley et al. (2002)
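A quick simulation (mine, not from Lumley et al.) makes the distinction concrete: raw exponential data are heavily skewed, yet the sampling distribution of the mean of samples of size 50 is already close to normal, as a rough skewness check shows.

```python
import numpy as np

rng = np.random.default_rng(42)

# Heavily skewed raw data: exponential, nothing like normal.
population = rng.exponential(scale=1.0, size=100_000)

# Sampling distribution of the mean: draw many samples of size 50
# and record each sample's mean.
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

def skewness(x):
    # Standardized third moment; 0 for a normal distribution.
    x = np.asarray(x)
    return np.mean(((x - x.mean()) / x.std()) ** 3)

# Roughly 2 for the raw exponential data, much closer to 0
# for the sample means (CLT at work).
print(f"skewness of raw data:     {skewness(population):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

So a normality test on the raw data answers the wrong question; what inference about the mean relies on is the approximate normality of the estimator’s distribution.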

## References

*Data Mining*. 1st edition. Cham: Springer International Publishing.

*Theoretical Statistics*. Boca Raton: Chapman & Hall/CRC.

*Foundations of Mathematical and Computational Economics*.

*A Probabilistic Theory of Pattern Recognition*. New York: Springer.

*Computer Age Statistical Inference: Algorithms, Evidence, and Data Science*. Institute of Mathematical Statistics Monographs. New York, NY: Cambridge University Press.

*Statistical Models and Causal Inference: A Dialogue with the Social Sciences*, edited by David Collier, Jasjeet S. Sekhon, and Philip B. Stark. Cambridge: Cambridge University Press.

*Bayesian Data Analysis*. 3rd edition. Chapman & Hall/CRC Texts in Statistical Science. Boca Raton: Chapman and Hall/CRC.

*Epidemiology* 6 (4): 356–65.

*Epidemiology* 6 (5): 563–65.

*Journal of the Royal Statistical Society. Series D (The Statistician)* 26 (2): 81–107.

*Stochastic modeling of scientific data*. 1. ed. Stochastic modeling series. London: Chapman & Hall.

*arXiv:2102.05242 [cs, stat]*, February.

*The Elements of Statistical Learning: Data Mining, Inference and Prediction*. Springer.

*Probability, Random Processes, and Statistical Analysis: Applications to Communications, Signal Processing, Queueing Theory and Mathematical Finance*. Cambridge University Press.

*Mathematical and Statistical Methods for Data Science and Machine Learning*. First edition. Chapman & Hall/CRC Machine Learning & Pattern Recognition. Boca Raton: CRC Press.

*Theory of point estimation*. 2nd ed. Springer texts in statistics. New York: Springer.

*Testing statistical hypotheses*. 3. ed. Springer texts in statistics. New York, NY: Springer.

*Annual Review of Public Health* 23 (1): 151–69.

*Foundations of Machine Learning*. Second edition. Adaptive Computation and Machine Learning. Cambridge, Massachusetts: The MIT Press.

*Machine learning: a probabilistic perspective*. 1 edition. Adaptive computation and machine learning series. Cambridge, MA: MIT Press.

*Monte Carlo Statistical Methods*. 2nd ed. Springer Texts in Statistics. New York: Springer.

*Theory of Statistics*. Springer Series in Statistics. New York, NY: Springer Science & Business Media.

*Asymptotic statistics*. 1. paperback ed., 8. printing. Cambridge series in statistical and probabilistic mathematics. Cambridge: Cambridge Univ. Press.

*All of Statistics: A Concise Course in Statistical Inference*. Springer.
