How do I introduce people to statistics/data science/analytics? What is the punchiest, most efficient modern curriculum?
I do not mean “measure theoretic probability” but rather “intuition-building introductions to the data-driven project.”
One incredible project here is Hubbard (2014), Douglas Hubbard's book How to Measure Anything, which reframes traditional statistics in terms of measuring things. He then compresses an incredible amount of medium-to-advanced methodology into some Excel spreadsheets. The art here is that he gets lots of mileage out of statistical tricks that are usually de-emphasised because they are not mathematically lavish enough to make good exam questions.
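A taste of the kind of trick Hubbard favours is his "Rule of Five": for a random sample of five from any continuous population, there is a 93.75% chance that the population median lies between the smallest and largest sampled values. The arithmetic is one line. The median falls outside the sample range only if all five draws land on the same side of it, which has probability 2 × 0.5^5 = 1/16. A minimal sketch that checks this by simulation follows. The exponential population is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import math
import random


def rule_of_five_exact(n: int = 5) -> float:
    """Probability that the population median lies strictly between the
    min and max of n independent draws from a continuous distribution."""
    # The median is missed only if all n draws fall on the same side of it.
    return 1.0 - 2.0 * 0.5 ** n


def rule_of_five_simulated(trials: int = 100_000, n: int = 5, seed: int = 0) -> float:
    """Monte Carlo check using an Exponential(rate=1) population.
    The choice of population is an illustrative assumption; any
    continuous distribution should give the same answer."""
    rng = random.Random(seed)
    true_median = math.log(2)  # median of Exponential(rate=1)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if min(sample) < true_median < max(sample):
            hits += 1
    return hits / trials


print(rule_of_five_exact())      # 0.9375
print(rule_of_five_simulated())  # should land close to 0.9375
```

The point for teaching is that the exact answer needs only "coin flips on either side of the median", and the simulation lets students confirm it without trusting the algebra.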
The Curious Journalist’s Guide to Data, by Jonathan Stray.
This is a book about the principles behind data journalism. Not what visualization software to use and how to scrape a website, but the fundamental ideas that underlie the human use of data. This isn’t “how to use data” but “how data works.”
This gets into some of the mathy parts of statistics, but also the difficulty of taking a census of race and the cognitive psychology of probabilities. It traces where data comes from, what journalists do with it, and where it goes after—and tries to understand the possibilities and limitations. Data journalism is as interdisciplinary as it gets, which can make it difficult to assemble all the pieces you need. This is one attempt. This is a technical book, and uses standard technical language, but all mathematical concepts are explained through pictures and examples rather than formulas.
The life of data has three parts: quantification, analysis, and communication. Quantification is the process that creates data. Analysis involves rearranging the data or combining it with other information to produce new knowledge. And none of this is useful without communicating the result.
Carl T. Bergstrom and Jevin West, in Calling Bullshit: Data Reasoning in a Digital World, have excellent framing and a wide syllabus curating different types of bullshit.
See statistical tests. My question here is: do I need to teach classical hypothesis testing at all? Is it ever what my students actually need?
Jonas Kristoffer Lindeløv explains classic statistical tests as linear regressions: Common statistical tests are linear models.
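Lindeløv's central example is that Student's two-sample t-test is the same thing as regressing the outcome on a 0/1 group indicator and testing whether the slope is zero. A minimal stdlib-only sketch of that equivalence computes the t statistic both ways; the function names and data are made up for illustration, and in practice you would use a statistics library.

```python
import math
from statistics import mean


def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances) for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Pooled within-group sum of squares and variance (na + nb - 2 df).
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (na + nb - 2)
    return (mb - ma) / math.sqrt(sp2 * (1 / na + 1 / nb))


def regression_t(a, b):
    """t statistic for the slope b1 in y = b0 + b1 * group + noise,
    with group = 0 for sample a and 1 for sample b (ordinary least squares)."""
    x = [0.0] * len(a) + [1.0] * len(b)
    y = list(a) + list(b)
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # works out to mean(b) - mean(a)
    b0 = my - b1 * mx       # works out to mean(a)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # residual variance, identical to the pooled variance
    return b1 / math.sqrt(sigma2 / sxx)


# Made-up example data for two groups.
a = [5.1, 4.9, 6.2, 5.8, 5.5]
b = [6.4, 6.9, 5.9, 7.2, 6.6, 7.0]
print(pooled_t(a, b), regression_t(a, b))  # equal up to floating-point rounding
```

The two functions agree because the fitted values of the indicator regression are just the group means, so the residual sum of squares is the pooled within-group sum of squares.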
If so, what kind of hypothesis testing do we want? The Wilcoxon-Mann-Whitney and Kruskal-Wallis tests are neat. Are they simpler than t-tests?
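One argument for rank tests is that the Mann-Whitney U statistic is easy to explain: over every pair with one value from each group, count how often the second group wins (ties count as half). A minimal stdlib-only sketch follows, with illustrative function names and made-up data. It also checks the identity linking U to the Wilcoxon rank-sum, U_b = R_b - n_b(n_b + 1)/2. Turning U into a p-value is left to a library such as SciPy (`scipy.stats.mannwhitneyu`) and is not shown here.

```python
def mann_whitney_u(a, b):
    """U statistic for sample b: the number of (a_i, b_j) pairs in which
    b_j is larger, counting ties as half a win."""
    u = 0.0
    for x in a:
        for y in b:
            if y > x:
                u += 1.0
            elif y == x:
                u += 0.5
    return u


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def u_from_rank_sum(a, b):
    """The same U computed from the Wilcoxon rank-sum of sample b."""
    r = average_ranks(list(a) + list(b))
    rank_sum_b = sum(r[len(a):])
    return rank_sum_b - len(b) * (len(b) + 1) / 2


# Made-up data, including ties across the groups.
a = [1.2, 3.4, 2.2, 5.0]
b = [3.4, 4.1, 6.0, 2.2, 7.3]
print(mann_whitney_u(a, b), u_from_rank_sum(a, b))  # the two agree
```

The pair-counting version is the one to show students first; the rank-sum version explains where the "rank" in "rank test" comes from.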
Can I simply teach everything via the bootstrap?
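The appeal of the bootstrap for teaching is that one recipe covers many statistics: resample the data with replacement, recompute the statistic, and read an interval off the spread. A minimal sketch of the percentile bootstrap, one of several bootstrap interval methods, follows. The data and function name are made up, and percentile intervals can behave poorly for very small samples or strongly skewed statistics.

```python
import random
from statistics import mean, median


def bootstrap_ci(data, stat=mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data) at level 1 - alpha."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up measurements.
sample = [12.1, 9.8, 11.4, 10.2, 13.5, 9.1, 10.9, 12.7, 11.0, 10.4]
print(bootstrap_ci(sample))               # interval for the mean
print(bootstrap_ci(sample, stat=median))  # same code, different statistic
```

Swapping `stat=mean` for `stat=median` needs no new formula, which is exactly the pedagogical selling point.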
Memes, puns and cartoons
Actual stats courses
Teaching for “hackers” is one school, in which we attempt to give coders statistical skills by leveraging their coding skills. I think there is some interesting stuff to be done here, because coding can get you to many of the same places as mathematics. Cameron Davidson-Pilon, Probabilistic Programming & Bayesian Methods for Hackers (source)
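As a small illustration of "coding instead of mathematics", here is a posterior for a coin's bias computed by brute force on a grid. This is an illustrative sketch, not code from Davidson-Pilon's book, which uses MCMC sampling via PyMC rather than a grid. With a flat prior and 7 heads in 10 flips (made-up data), the exact posterior is a Beta(8, 4) distribution with mean 8/12, so we can check that the computer arrives there without any calculus.

```python
def grid_posterior(heads, flips, grid_size=1001):
    """Posterior over a coin's bias p on an evenly spaced grid, flat prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalised posterior = prior (constant) * binomial likelihood.
    unnorm = [p ** heads * (1 - p) ** (flips - heads) for p in grid]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]


def posterior_mean(grid, weights):
    return sum(p * w for p, w in zip(grid, weights))


def prob_greater(grid, weights, threshold):
    """Posterior probability that the bias exceeds threshold."""
    return sum(w for p, w in zip(grid, weights) if p > threshold)


grid, post = grid_posterior(heads=7, flips=10)
print(posterior_mean(grid, post))     # close to 8/12
print(prob_greater(grid, post, 0.5))  # chance the coin favours heads
```

Questions like "how likely is it that the coin favours heads?" become a sum over the grid instead of an integral.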
There are some more classical ones online, freely available.
Mine Çetinkaya-Rundel and Johanna Hardin, Introduction to Modern Statistics
publishes university-level texts in statistics, data science, modeling, and scientific computing.
There are also some topic-specific guides I think are worth looking at:
Philosophical / general
- Classic book on measurement: Douglas Hubbard, How to Measure Anything.
- Carl T. Bergstrom and Jevin West, in Calling Bullshit: Data Reasoning in a Digital World, have excellent framing and a wide syllabus curating different types of bullshit.
- Jonathan Stray, The Curious Journalist’s Guide to Data
- Cathy O’Neil, Weapons of Math Destruction, is a guide to how the methods we are learning are being abused.
- Daniel T. Kaplan’s guide to computational calculus teaches you how to cheat at calculus.
- A/A Testing: How I increased conversions 300% by doing absolutely nothing.
- Lucile Lu, Robert Chang and Dmitriy Ryaboy of Twitter have a practical guide to risky testing at scale: Power, minimal detectable effect, and bucket size estimation in A/B tests
- Jonas Kristoffer Lindeløv: Common statistical tests are linear models. tl;dr: classic statistical tests are linear regressions where the goal is to decide whether a coefficient should be regarded as non-zero.
- Daniel Lakens offers a free short online course: Improving your statistical questions
- Daniel T. Kaplan’s Statistical Modeling: A Fresh Approach has nice illustrations of resampling.
- Cosma Rohilla Shalizi, Advanced Data Analysis from an Elementary Point of View (entire book free online).
- Bradley Efron and Trevor Hastie, Computer Age Statistical Inference (entire book free online).
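The power and minimal-detectable-effect trade-off discussed in the Twitter A/B testing guide above can be sketched with the standard normal-approximation formula for comparing two proportions: n per bucket ≈ (z_(1-alpha/2) + z_(power))^2 × [p1(1 - p1) + p2(1 - p2)] / (p2 - p1)^2. This textbook approximation is my assumption here, not necessarily the exact procedure in that post. The function names and example rates are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_bucket(p_control, mde, alpha=0.05, power=0.8):
    """Approximate users per bucket for a two-sided two-proportion z-test
    to detect an absolute lift of mde over the baseline rate p_control."""
    p_treat = p_control + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    var = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_power) ** 2 * var / mde ** 2)


def mde_for_bucket_size(p_control, n, alpha=0.05, power=0.8):
    """Approximate minimal detectable absolute effect with n users per bucket.
    Simplification: uses the baseline variance for both arms."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 * p_control * (1 - p_control) / n)


# Hypothetical: 10% baseline conversion, hoping to detect a 1-point lift.
n = sample_size_per_bucket(0.10, 0.01)
print(n)                             # roughly fifteen thousand users per bucket
print(mde_for_bucket_size(0.10, n))  # close to the 0.01 we asked for
```

The useful intuition for students is the inverse-square law: halving the effect you want to detect roughly quadruples the bucket size you need.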
There are various tools and tutorials here:
- Looking for best ways in teaching R to absolute beginners (RStudio Community forum thread)
- learnr: Interactive Tutorials for R
One can also simply use one of the pre-made courses.
- Cosma’s links, targeted more at students committed to becoming statisticians.