Tests, statistical

The mathematics of the last century's worth of experiment design.

Compare with multiple testing

Probably the least sexy thing in statistics and as such, usually taught by the least interesting professor in the department, or at least one who couldn’t find an interesting enough excuse to get out of it, which is a fair indication. Said professor will then teach it to you as if you were in turn the least interesting student in the school, and so it goes on.

This is unfair, because it turns out to be an elegant and powerful tool if you can move past block- and combinatorial-design stamp collecting, which few classes do, because stamp collecting is the easiest way to fill those long lecture hours.


  • Daniel Lakens, Do You Really Want to Test a Hypothesis?

  • tea-lang is a compiler for statistical tests.

    Tea is a domain specific programming language that automates statistical test selection and execution… Users provide 5 pieces of information:

    • the dataset of interest,
    • the variables in the dataset they want to analyze,
    • the study design (e.g., independent, dependent variables),
    • the assumptions they make about the data based on domain knowledge (e.g., a variable is normally distributed), and
    • a hypothesis.

    Tea then "compiles" these into logical constraints to select valid statistical tests. Tests are considered valid if and only if all the assumptions they make about the data (e.g., normal distribution, equal variance between groups, etc.) hold. Tea then finally executes the valid tests.

  • Jonas Kristoffer Lindeløv’s anti-stamp-collecting prescription unifies many of the classic tests: Common statistical tests are linear models

  • Lucile L, Robert Chang and Dmitriy Ryaboy of Twitter have a practical guide to risky testing at scale: Power, minimal detectable effect, and bucket size estimation in A/B tests

  • Bob Sturm’s neat take, from Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge; New York: Cambridge University Press.
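
To make the "common statistical tests are linear models" idea above concrete, here is a minimal sketch of the simplest case: Student's two-sample t-test (equal variances assumed) gives the same t statistic as the slope in an ordinary least squares regression of the outcome on a 0/1 group indicator. The function names and the data are made up for illustration and are not taken from any of the linked sources; only the Python standard library is used.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / std_err


def dummy_regression_t_statistic(a, b):
    """t statistic of the slope in y = b0 + b1 * x, with x = 0 for a and 1 for b."""
    x = [0.0] * len(a) + [1.0] * len(b)
    y = list(a) + list(b)
    n = len(y)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx  # equals mean(b) - mean(a)
    intercept = y_bar - slope * x_bar  # equals mean(a)
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / slope_std_err


# Hypothetical measurements for two independent groups.
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
treated = [5.6, 5.1, 6.0, 5.8, 5.2, 6.3]

t_classic = pooled_t_statistic(control, treated)
t_regression = dummy_regression_t_statistic(control, treated)
print(t_classic, t_regression)  # the two values agree up to rounding error
```

The agreement is not a coincidence of this data: with a single 0/1 regressor the fitted slope is the difference in group means, and the residual variance is the pooled variance, so the two formulas are algebraically the same. The other classic tests need more work (ranks, more dummy variables), which this sketch does not attempt.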