Interaction effects and subgroups are probably what we want to estimate

January 25, 2022 — November 6, 2022

algebra
graphical models
how do science
machine learning
meta learning
networks
probability
statistics

I play fast and loose with language about subgroups and interaction terms here. We can define each in terms of the other often, but they are not quite the same thing.

Estimating interaction effects is hard, but it is probably the most important thing to do in any complex and/or human system. So how do we optimally trade off answering the most specific questions against the rapidly growing expense and difficulty of experiments large enough to detect them, and against the combinatorial growth in the number of possible interactions as problems grow?

There is a connection with problematic methodology here: the need for specificity often manifests through researcher degrees of freedom, i.e. choosing which interactions to model post hoc.

That is, the world is probably built of hierarchical models but we do not always have the right data to identify them, or enough of it when we do.

Lots of ill-connected notes ATM.


1 Review of limits of heterogeneous treatment effects literature

Data requirements, false discovery. If we want to learn interaction effects from observational studies then we need heroic amounts of data to eliminate confounders and estimate the explosion of possible terms. Does this mean that by attempting to operate this way we are implicitly demanding a surveillance state?
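The explosion of possible terms is easy to make concrete: among p measured covariates there are C(p, k) candidate k-way interaction terms. A minimal sketch (p = 30 is an arbitrary choice):

```python
from math import comb

# With p measured covariates there are C(p, k) candidate k-way
# interaction terms, so the model space explodes combinatorially.
p = 30  # a modest number of covariates
for k in (2, 3, 4):
    print(f"{k}-way interactions among {p} covariates: {comb(p, k)}")
# 435, 4060, and 27405 terms respectively
```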

2 Subgroup identification

Classic experimental practice tries to estimate an effect, then either

  1. faces a thicket of onerous multiple-testing challenges doing the model selection to work out whom the effect applies to, or
  2. applies for new funding to identify relevant subgroups with new data in a new experiment.

Can we estimate subgroups and effects simultaneously? How bad is our degrees-of-freedom situation in this case? Not clear, and I could not see an easy answer skimming the references (Foster, Taylor, and Ruberg 2011; Imai and Ratkovic 2013; Lipkovich, Dmitrienko, and D'Agostino 2017; Su et al. 2009).
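The multiple-testing thicket bites quickly. A toy simulation (subgroup count, sample sizes, and the 1.96 threshold are my illustrative choices): in a null world where the treatment does nothing, scanning 20 post-hoc subgroups finds a "significant" effect about 1 − 0.95²⁰ ≈ 64% of the time.

```python
import numpy as np

rng = np.random.default_rng(3)

# Null world: the treatment does nothing anywhere, but we scan 20
# post-hoc subgroups for a significant effect anyway.
subgroups, n, reps = 20, 200, 2000
false_alarms = 0
for _ in range(reps):
    found = False
    for _ in range(subgroups):
        treated = rng.normal(size=n)   # outcomes, treatment arm
        control = rng.normal(size=n)   # outcomes, control arm
        # z-statistic for the difference in means (known unit variance)
        z = (treated.mean() - control.mean()) / np.sqrt(2 / n)
        found = found or abs(z) > 1.96
    false_alarms += found

print(false_alarms / reps)  # family-wise false-alarm rate, ~0.64
```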

3 Conditional average treatment effect

Working out how to condition on stuff is the bread and butter of causal inference, and there are a bunch of ways to analyse it there.
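To pin down notation (standard potential-outcomes symbols, not anything defined above), the conditional average treatment effect is

```latex
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right],
\qquad
\text{ATE} = \mathbb{E}\left[\tau(X)\right].
```

Interaction effects and subgroup effects are both claims that τ(x) is not constant in x.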

4 As transferability

If we know what interacts with our model, then we are closer to learning the correct conditioning set. See external validity.

5 Ontological context

6 Scientific context

Over at social psychology, I’ve wondered about Peter Dorman’s comment:

the fixation on finding average effects when the structure of effect differences is what we ought to be interested in.

See Slime Mold Time Mold, Reality is Very Weird and You Need to be Prepared for That

But as we see from the history of scurvy, sometimes splitting is the right answer! In fact, there were meaningful differences in different kinds of citrus, and meaningful differences in different animals. Making a splitting argument to save a theory — “maybe our supplier switched to a different kind of citrus, we should check that out” — is a reasonable thing to do, especially if the theory was relatively successful up to that point.

Splitting is perfectly fair game, at least to an extent — doing it a few times is just prudent, though if you have gone down a dozen rabbitholes with no luck, then maybe it is time to start digging elsewhere.

Much commentary from Andrew Gelman et al. on this theme, e.g. "You need 16 times the sample size to estimate an interaction than to estimate a main effect" (Gelman, Hill, and Vehtari 2021, ch. 16.4).
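Gelman's arithmetic can be checked by simulation (the ±1/2 coding and the particular effect sizes are my choices, not from the book): with a binary treatment z and a binary moderator x both coded ±1/2, the interaction coefficient is estimated with twice the standard error of the main effect. If the interaction is also half the size of the main effect, matching power then requires (2 × 2)² = 16 times the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Balanced design: treatment z and moderator x coded +/- 1/2, noise sd 1.
# Main effect 0.5, interaction 0.25 (half the size).
n, reps = 1000, 2000
main_hat, inter_hat = [], []
for _ in range(reps):
    z = rng.choice([-0.5, 0.5], size=n)
    x = rng.choice([-0.5, 0.5], size=n)
    y = 0.5 * z + 0.25 * z * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), z, x, z * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    main_hat.append(beta[1])   # main-effect estimate
    inter_hat.append(beta[3])  # interaction estimate

print(np.std(main_hat))   # ~ 2 / sqrt(n)
print(np.std(inter_hat))  # ~ 4 / sqrt(n): twice the standard error
```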

C&C Epstein-Barr and the Cause of Cause

Miller (2013) writes about basic data hygiene in this light for data journalists etc.

7 Spicy take: actually, how about optimal int

8 Social context

8.1 Is this what intersectionality means?

A real question. If we are concerned with inequality, then there is an implied graphical model that produces different outcomes depending on who is being modelled, and this has implications for fairness.

It turns out people have engaged meaningfully in this. Bright, Malinsky, and Thompson (2016) suggests some testable models:

The first claim from within intersectional theory we explicate is what we call 'nonadditive intersectionality', which is the claim that somebody's intersectional identity can influence his or her life more than one would realize if one considered each of the identity categories separately (Weldon 2006; Hancock 2007; Bowleg 2008). We interpret this as meaning that some causal effects of belonging to multiple identity categories are stronger than one might have predicted from information about the causal effect of belonging to each identity category considered separately. Take, for instance, the claim made here: "In some cases the negative effects of racism and sexism might multiply each other, rendering women of color most disadvantaged on a dependent variable (e.g., income)" (Cole 2009, 177). There is already a causal effect of being a woman on one's income, and likewise there is already a causal effect of being a person of color. The intersectional phenomenon Cole reports is that occupying the intersectional identity of being a woman of color serves to amplify these causal processes.

The second claim to be explicated is what we call ‘switch intersectionality’. Such claims describe causal relationships that are activated only for individuals who occupy the intersection of certain identity positions. Consider, for instance, the following point from Dotson 2014, 52: “There exists a tendency to theoretically erase the experiences of oppression that are invoked as a result of being black women and not merely being black or a woman.” We believe it is consistent with the author’s intentions to say that combating this tendency involves acknowledging that the fact that a person is a black woman, rather than black or a woman considered singularly, causes her to undergo certain experiences. We will provide an analysis of switch intersectionality, which is to say causal processes that are activated only when the individuals under study occupy particular intersections of demographic categories.
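The two claims can be written as toy data-generating processes (the coefficients, sign conventions, and variable names below are illustrative assumptions of mine, not from the paper). Nonadditive intersectionality shows up as an interaction term on top of two main effects; switch intersectionality as a pathway that exists only at the intersection.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
a = rng.integers(0, 2, size=n)  # membership of identity category A
b = rng.integers(0, 2, size=n)  # membership of identity category B

# 'Nonadditive': each category has its own effect, and the
# combination amplifies them beyond the sum of the parts.
y_nonadd = -1.0 * a - 1.0 * b - 0.5 * a * b + rng.normal(size=n)

# 'Switch': a causal pathway active only at the intersection;
# neither category alone has any effect.
y_switch = -1.5 * (a & b) + rng.normal(size=n)

def interaction(y):
    """Difference-in-differences estimate of the a:b interaction."""
    cell = lambda i, j: y[(a == i) & (b == j)].mean()
    return cell(1, 1) - cell(1, 0) - cell(0, 1) + cell(0, 0)

print(interaction(y_nonadd))  # close to -0.5, the a*b coefficient
print(interaction(y_switch))  # close to -1.5, the switch effect
```

Note that the same difference-in-differences contrast recovers both: the distinction is in the data-generating story, not the estimator.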

8.2 The advice you found is probably not for you

Every pundit has a model of what the typical member of the public thinks, and directs their advice accordingly. For many reasons, the pundit's model is likely to be wrong: the readers of any given pundit are a self-selected sample, the pundit's intuitive model of society is distorted, and even if they surveyed their readership, it would be hard to learn anything true about the readership from that.

So all advice of the form "people should do more X" is suspect. It rests on the author's assumption that the readers are in class A, but they could easily be in class B, who should perhaps do less X: possibly because X does not work for class B people in general, or because class B people are likely to have already done too much X and need to lay off it for a while. See adverse advice selection.

9 Incoming

Kernel tricks for detecting 2-way interactions: Agrawal et al. (2019); Agrawal and Broderick (2021). See Tamara Broderick present this.
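The SKIM-FA machinery is more involved, but the underlying kernel-trick idea is standard: a degree-2 polynomial kernel implicitly evaluates all pairwise interaction features without ever constructing them. A minimal sketch of that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
x, z = rng.normal(size=3), rng.normal(size=3)

def phi(v):
    """Explicit feature map for the kernel k(x, z) = (1 + x.z)^2:
    constant, sqrt(2)*v_i, v_i^2, and sqrt(2)*v_i*v_j pairwise
    interaction features."""
    feats = [1.0]
    feats += [np.sqrt(2) * vi for vi in v]
    feats += [vi * vi for vi in v]
    feats += [np.sqrt(2) * v[i] * v[j]
              for i in range(len(v)) for j in range(i + 1, len(v))]
    return np.array(feats)

k_implicit = (1 + x @ z) ** 2   # one inner product, O(p)
k_explicit = phi(x) @ phi(z)    # O(p^2) interaction features
print(k_implicit, k_explicit)   # identical up to float error
```

The kernel side costs O(p) per evaluation while the explicit feature space has O(p²) interaction coordinates, which is the hook that makes interaction discovery in high dimensions tractable.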

  • The Big Data Paradox in Clinical Practice (Msaouel 2022)

    The big data paradox is a real-world phenomenon whereby as the number of patients enrolled in a study increases, the probability that the confidence intervals from that study will include the truth decreases. This occurs in both observational and experimental studies, including randomised clinical trials, and should always be considered when clinicians are interpreting research data. Furthermore, as data quantity continues to increase in today’s era of big data, the paradox is becoming more pernicious. Herein, I consider three mechanisms that underlie this paradox, as well as three potential strategies to mitigate it: (1) improving data quality; (2) anticipating and modelling patient heterogeneity; (3) including the systematic error, not just the variance, in the estimation of error intervals.

  • The Limits Of Medicine - Part 1 - Small Molecules
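Msaouel's big-data paradox is easy to illustrate with a toy simulation (the bias magnitude 0.1 and the sample sizes are arbitrary choices of mine): a fixed systematic error eventually dominates the shrinking sampling error, so nominal 95% intervals stop covering the truth.

```python
import numpy as np

rng = np.random.default_rng(2)

# The true mean is 0, but a selection effect biases every observation
# by +0.1. Confidence intervals shrink like 1/sqrt(n) while the bias
# stays fixed, so coverage of the truth collapses as n grows.
bias, reps = 0.1, 500
coverage = {}
for n in (100, 10_000):
    hits = 0
    for _ in range(reps):
        sample = rng.normal(loc=bias, size=n)
        m = sample.mean()
        se = sample.std(ddof=1) / np.sqrt(n)
        hits += (m - 1.96 * se) <= 0.0 <= (m + 1.96 * se)
    coverage[n] = hits / reps
    print(n, coverage[n])  # coverage falls far below the nominal 95%
```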

10 References

Agrawal, and Broderick. 2021. “The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time.” arXiv:2106.12408 [Stat].
Agrawal, Trippe, Huggins, et al. 2019. “The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions.” In Proceedings of the 36th International Conference on Machine Learning.
Athey, Tibshirani, and Wager. 2019. “Generalized Random Forests.” Annals of Statistics.
Bright, Malinsky, and Thompson. 2016. “Causally Interpreting Intersectionality Theory.” Philosophy of Science.
DiTraglia, Garcia-Jimeno, O’Keeffe-O’Donovan, et al. 2020. “Identifying Causal Effects in Experiments with Social Interactions and Non-Compliance.” arXiv:2011.07051 [Econ, Stat].
Foster, Taylor, and Ruberg. 2011. “Subgroup Identification from Randomized Clinical Trial Data.” Statistics in Medicine.
Gelman, Hill, and Vehtari. 2021. Regression and Other Stories.
Gigerenzer. n.d. “We Need to Think More about How We Conduct Research.” Behavioral and Brain Sciences.
Imai, and Ratkovic. 2013. “Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation.” The Annals of Applied Statistics.
Kemp, Tenenbaum, Niyogi, et al. 2010. “A Probabilistic Model of Theory Formation.” Cognition.
Knaus, Lechner, and Strittmatter. 2021. “Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence.” The Econometrics Journal.
Lipkovich, Dmitrienko, and D’Agostino. 2017. “Tutorial in Biostatistics: Data-Driven Subgroup Identification and Analysis in Clinical Trials.” Statistics in Medicine.
McElreath, and Boyd. 2007. Mathematical Models of Social Evolution: A Guide for the Perplexed.
Miller. 2013. The Chicago Guide to Writing about Multivariate Analysis. Chicago Guides to Writing, Editing, and Publishing.
Msaouel. 2022. “The Big Data Paradox in Clinical Practice.” Cancer Investigation.
O’Connor, Bright, and Bruner. 2019. “The Emergence of Intersectional Disadvantage.” Social Epistemology.
Su, Tsai, Wang, et al. 2009. “Subgroup Analysis via Recursive Partitioning.” Journal of Machine Learning Research.