⚠️ TODO: I play fast and loose with language about subgroups and interaction terms here. We can define each in terms of the other often, but they are not quite the same thing. Maybe this would benefit from me making that clearer.
Estimating interaction effects is hard, but it is probably the most important thing to do in any complex and/or human system. So how do we optimally trade off answering the most specific questions against the rapidly growing expense and difficulty of experiments large enough to detect them, and against the rapidly growing number of possible interactions as problems grow?
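To put numbers on that explosion: with $p$ covariates there are $\binom{p}{2}$ candidate pairwise interaction terms and $\binom{p}{3}$ three-way terms, so the candidate set outgrows any plausible sample size quickly. A minimal illustration:

```python
from math import comb

# Candidate interaction terms among p covariates:
# pairwise grows as O(p^2), three-way as O(p^3).
for p in (10, 50, 100):
    print(p, comb(p, 2), comb(p, 3))
# 10 45 120
# 50 1225 19600
# 100 4950 161700
```

At 100 covariates we would need to entertain over 160,000 three-way terms before even worrying about higher orders.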
There is a connection with problematic methodology, where the need for specificity manifests through researcher degrees of freedom, i.e. choosing which interactions to model post hoc.
That is, the world is probably built of hierarchical models but we do not always have the right data to identify them, or enough of it when we do.
Lots of ill-connected notes ATM.
Review of limits of heterogeneous treatment effects literature
Data requirements, false discovery. If we want to learn interaction effects from observational studies then we need heroic amounts of data, both to eliminate confounders and to estimate the explosion of possible terms. Does this mean that by attempting to operate this way we are implicitly demanding a surveillance state?
Classic experimental practice tries to estimate an effect, then either
- faces a thicket of onerous multiple testing challenges to do model selection to work out who it applies to, or
- applies for new funding to identify relevant subgroups with new data in a new experiment.
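To get a feel for how onerous the first branch is: a crude Bonferroni correction divides the significance threshold by the number of subgroup hypotheses tested, so the per-test threshold collapses as the candidate set grows. A sketch (real subgroup analyses would use less conservative corrections, but the scaling is the point):

```python
from math import comb

alpha = 0.05
# Suppose we test the treatment effect in every subgroup defined by
# one of p binary covariates, plus every pairwise combination of them.
for p in (5, 10, 20):
    n_tests = p + comb(p, 2)
    print(p, n_tests, alpha / n_tests)
```

With only 20 binary covariates we are already down to a per-test threshold of roughly 0.0002, which demands correspondingly enormous samples to detect anything.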
Can we estimate subgroups and effects simultaneously? How bad is our degrees-of-freedom situation in this case? It is not clear, and I could not see an easy answer from skimming the references (Foster, Taylor, and Ruberg 2011; Imai and Ratkovic 2013; Lipkovich, Dmitrienko, and B 2017; Su et al. 2009).
Conditional average treatment effect
Working out what to condition on is the bread and butter of causal inference, and that literature offers a bunch of ways to analyse it.
If we know what interactions our model has, then we are closer to learning the correct conditioning set. See external validity.
- Science in a High-Dimensional World
- The “It’s really complicated and sad” theory of obesity.
- Interactions are probably always present; they just might be small. See Gwern’s Everything Is Correlated for a roundup on this theme.
Note the fixation on finding average effects, when the structure of effect differences is what we ought to be interested in.
See Slime Mold Time Mold, Reality is Very Weird and You Need to be Prepared for That
But as we see from the history of scurvy, sometimes splitting is the right answer! In fact, there were meaningful differences in different kinds of citrus, and meaningful differences in different animals. Making a splitting argument to save a theory — “maybe our supplier switched to a different kind of citrus, we should check that out” — is a reasonable thing to do, especially if the theory was relatively successful up to that point.
Splitting is perfectly fair game, at least to an extent — doing it a few times is just prudent, though if you have gone down a dozen rabbitholes with no luck, then maybe it is time to start digging elsewhere.
Much commentary from Andrew Gelman et al. on this theme, e.g. You need 16 times the sample size to estimate an interaction than to estimate a main effect (Gelman, Hill, and Vehtari 2021, ch. 16.4).
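The factor of 16 decomposes into two factors of 4: the difference-in-differences estimator of an interaction has twice the standard error of the main-effect estimator (so 4× the sample for the same precision), and interactions are typically assumed to be half the size of main effects (another 4×). The first factor is easy to check by simulation; a minimal sketch with a pure-noise outcome and four equal cells (treatment × subgroup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma = 400, 2000, 1.0

main_ests, inter_ests = [], []
for _ in range(reps):
    # Four equal cells (control/treated x subgroup A/B), noise-only outcome.
    y00, y01, y10, y11 = rng.normal(0.0, sigma, size=(4, n // 4)).mean(axis=1)
    # Main effect of treatment: treated mean minus control mean.
    main_ests.append((y10 + y11) / 2 - (y00 + y01) / 2)
    # Interaction: difference-in-differences across the subgroup.
    inter_ests.append((y11 - y01) - (y10 - y00))

ratio = np.std(inter_ests) / np.std(main_ests)
print(round(ratio, 2))  # ≈ 2: the interaction SE is double the main-effect SE
```

Doubling the standard error means quadrupling the sample size to recover it, before we even account for interactions being smaller than main effects.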
The Big Data Paradox in Clinical Practice (Msaouel 2022)
> The big data paradox is a real-world phenomenon whereby as the number of patients enrolled in a study increases, the probability that the confidence intervals from that study will include the truth decreases. This occurs in both observational and experimental studies, including randomized clinical trials, and should always be considered when clinicians are interpreting research data. Furthermore, as data quantity continues to increase in today’s era of big data, the paradox is becoming more pernicious. Herein, I consider three mechanisms that underlie this paradox, as well as three potential strategies to mitigate it: (1) improving data quality; (2) anticipating and modeling patient heterogeneity; (3) including the systematic error, not just the variance, in the estimation of error intervals.