Interaction effects and subgroups are probably what we want to estimate

⚠️ TODO: I play fast and loose with language about subgroups and interaction terms here. We can define each in terms of the other often, but they are not quite the same thing. Maybe this would benefit from me making that clearer.

Estimating interaction effects is hard, but it is probably also the important thing to do in any complex and/or human system. So how do we optimally trade off answering the most specific questions against the rapidly growing expense and difficulty of experiments large enough to detect them? There is also the rapidly growing number of possible interactions as problems grow.
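To put a rough number on "rapidly growing": a minimal sketch, assuming we count one candidate product term per subset of covariates of a given size. The function name `n_interaction_terms` is mine, purely for illustration.

```python
from math import comb


def n_interaction_terms(n_covariates: int, max_order: int) -> int:
    """Count candidate interaction terms of order 2..max_order.

    Assumes one product term per distinct subset of covariates,
    so the count of k-way terms is "n_covariates choose k".
    """
    return sum(comb(n_covariates, k) for k in range(2, max_order + 1))


if __name__ == "__main__":
    for p in (5, 10, 20, 50):
        pairwise = n_interaction_terms(p, 2)
        up_to_three_way = n_interaction_terms(p, 3)
        print(f"{p} covariates: {pairwise} pairwise, {up_to_three_way} up to 3-way")
```

Even restricting to pairwise terms the count grows quadratically in the number of covariates, and allowing higher orders makes it worse.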

There is a connection with problematic methodology here: the need for specificity can manifest through researcher degrees of freedom, i.e. choosing which interactions to model post hoc.

That is, the world is probably built of hierarchical models but we do not always have the right data to identify them, or enough of it when we do.

Lots of ill-connected notes ATM.

Review of limits of heterogeneous treatment effects literature

Data requirements, false discovery. If we want to learn interaction effects from observational studies then we need heroic amounts of data, to eliminate confounders and estimate the explosion of possible terms. Does this mean that by attempting to operate this way we are implicitly demanding a surveillance state?

Subgroup identification

Classic experimental practice tries to estimate an effect, then either

  1. faces a thicket of onerous multiple testing challenges to do model selection to work out who it applies to, or
  2. applies for new funding to identify relevant subgroups with new data in a new experiment.
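How onerous is option 1? A minimal sketch, assuming the subgroup tests are independent (they usually are not, so treat this as a cartoon). It shows the chance of at least one false positive across many subgroup tests, and the Bonferroni-corrected per-test threshold. Function names are hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across n_tests independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(
            f"{m} subgroup tests: FWER = {familywise_error_rate(m):.3f}, "
            f"Bonferroni threshold = {bonferroni_alpha(m):.5f}"
        )
```

The corrected threshold shrinks in proportion to the number of subgroups we inspect, which is one way of seeing why post hoc subgroup hunting eats statistical power so quickly.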

Can we estimate subgroups and effects simultaneously? How bad is our degrees-of-freedom situation in this case? Not clear, and I could not see an easy answer skimming the references (Foster, Taylor, and Ruberg 2011; Imai and Ratkovic 2013; Lipkovich, Dmitrienko, and B 2017; Su et al. 2009).

Conditional average treatment effect

Working out how to condition on stuff is the bread and butter of causal inference, and there are a bunch of ways to analyse it there.
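For concreteness, the usual definition is τ(x) = E[Y(1) − Y(0) | X = x]. Here is a toy sketch, assuming a randomised treatment and a single binary covariate, in which case the within-stratum difference in means estimates the conditional effect. The data-generating process (only the x = 1 subgroup responds, with effect 2) is invented for illustration.

```python
import random
from statistics import mean


def simulate(n: int, seed: int = 0) -> list:
    """Simulate a randomised experiment where only subgroup x=1 responds."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)  # binary covariate defining the subgroup
        t = rng.randint(0, 1)  # randomised treatment assignment
        tau = 2.0 if x == 1 else 0.0  # assumed true effect per subgroup
        y = 1.0 + tau * t + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def diff_in_means(rows: list) -> float:
    """Treated mean minus control mean."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def cate_by_group(rows: list) -> dict:
    """Difference in means within each stratum of the covariate."""
    return {g: diff_in_means([r for r in rows if r[0] == g]) for g in (0, 1)}


rows = simulate(20_000)
ate = diff_in_means(rows)
cate = cate_by_group(rows)

if __name__ == "__main__":
    print(f"average effect: {ate:.2f}")
    print(f"effect in x=0: {cate[0]:.2f}, effect in x=1: {cate[1]:.2f}")
```

The average effect lands near 1, which describes nobody in this toy population: half the units have an effect near 0 and half an effect near 2.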

As transferability

If we know which interactions our model has then we are closer to learning the correct conditioning set. See external validity.


Kernel tricks for detecting 2-way interactions: Agrawal et al. (2019); Agrawal and Broderick (2021). See Tamara Broderick present this.

Ontological context

Scientific context

Over at social psychology, I’ve wondered about Peter Dorman’s comment:

the fixation on finding average effects when the structure of effect differences is what we ought to be interested in.

See Slime Mold Time Mold, Reality is Very Weird and You Need to be Prepared for That

But as we see from the history of scurvy, sometimes splitting is the right answer! In fact, there were meaningful differences in different kinds of citrus, and meaningful differences in different animals. Making a splitting argument to save a theory — “maybe our supplier switched to a different kind of citrus, we should check that out” — is a reasonable thing to do, especially if the theory was relatively successful up to that point.

Splitting is perfectly fair game, at least to an extent — doing it a few times is just prudent, though if you have gone down a dozen rabbitholes with no luck, then maybe it is time to start digging elsewhere.

Much commentary from Andrew Gelman et al. on this theme, e.g. You need 16 times the sample size to estimate an interaction than to estimate a main effect (Gelman, Hill, and Vehtari 2021, ch. 16.4).
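As I understand the arithmetic behind the factor of 16, it comes from two assumptions: a balanced two-by-two design with equal noise in every cell, and an interaction half the size of the main effect. A minimal sketch of that back-of-envelope calculation follows; the function names are mine.

```python
from math import sqrt


def se_main_effect(sigma: float, n: float) -> float:
    """SE of a difference between two group means, n/2 units per group."""
    return sigma * sqrt(1.0 / (n / 2) + 1.0 / (n / 2))


def se_interaction(sigma: float, n: float) -> float:
    """SE of a difference between two treatment effects.

    Each effect is estimated on half the sample, and the two halves
    are independent, so their variances add.
    """
    se_within_half = se_main_effect(sigma, n / 2)
    return sqrt(2.0) * se_within_half


def sample_size_ratio(relative_effect_size: float) -> float:
    """Sample size multiplier needed to estimate the interaction with the
    same signal-to-noise ratio as the main effect.

    relative_effect_size is the interaction size divided by main effect size.
    """
    se_ratio = se_interaction(1.0, 1000.0) / se_main_effect(1.0, 1000.0)
    return (se_ratio / relative_effect_size) ** 2


if __name__ == "__main__":
    print(f"interaction as big as main effect: {sample_size_ratio(1.0):.0f}x")
    print(f"interaction half as big: {sample_size_ratio(0.5):.0f}x")
```

The standard error of the interaction is twice that of the main effect, which costs a factor of 4 in sample size. Halving the effect size costs another factor of 4.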

C&C Epstein Barr and the Cause of Cause

Miller (2013) writes about basic data hygiene in this light for data journalists etc.

Social context

Is this what intersectionality means?

A real question. If we are concerned with inequality, then there is an implied graphical model which produces different outcomes depending on who is being modelled, and these will have implications for fairness.

It turns out people have engaged meaningfully with this. Bright, Malinsky, and Thompson (2016) suggest some testable models:

The first claim from within intersectional theory we explicate is what we call ‘nonadditive intersectionality’, which is the claim that somebody’s intersectional identity can influence his or her life more than one would realize if one considered each of the identity categories separately (Weldon 2006; Hancock 2007; Bowleg 2008). We interpret this as meaning that some causal effects of belonging to multiple identity categories are stronger than one might have predicted from information about the causal effect of belonging to each identity category considered separately. Take, for instance, the claim made here: “In some cases the negative effects of racism and sexism might multiply each other, rendering women of color most disadvantaged on a dependent variable (e.g., income)” (Cole 2009, 177). There is already a causal effect of being a woman on one’s income, and likewise there is already a causal effect of being a person of color. The intersectional phenomenon Cole reports is that occupying the intersectional identity of being a woman of color serves to amplify these causal processes.

The second claim to be explicated is what we call ‘switch intersectionality’. Such claims describe causal relationships that are (de)activated only for individuals who occupy the intersection of certain identity positions. Consider, for instance, the following point from Dotson (2014, 52): “[There exists a] tendency to theoretically erase the experiences of oppression that are invoked as a result of being black women and not merely being black or a woman.” We believe it is consistent with the author’s intentions to say that combating this tendency involves acknowledging that the fact that a person is a black woman, rather than black or a woman considered singularly, causes her to undergo certain experiences. We will provide an analysis of switch intersectionality, which is to say causal processes that are activated only when the individuals under study occupy particular intersections of demographic categories.
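To connect this back to interaction terms, here is my own cartoon of the two claims as toy outcome models over two binary indicators. This is not the formalisation in Bright, Malinsky, and Thompson (2016), who work with causal graphical models. The coefficients are arbitrary and all names are hypothetical.

```python
def additive(woman: int, poc: int, b_woman: float = -1.0, b_poc: float = -1.0) -> float:
    """Baseline: each category shifts the outcome separately."""
    return b_woman * woman + b_poc * poc


def nonadditive(woman: int, poc: int, b_both: float = -1.0) -> float:
    """Separate effects exist and are amplified at the intersection."""
    return additive(woman, poc) + b_both * woman * poc


def switch(woman: int, poc: int, effect: float = -1.0) -> float:
    """An effect that is active only at the intersection."""
    return effect if (woman == 1 and poc == 1) else 0.0


def interaction_contrast(model) -> float:
    """Difference-in-differences over the 2x2 table; zero iff additive."""
    return model(1, 1) - model(1, 0) - model(0, 1) + model(0, 0)


if __name__ == "__main__":
    for name, model in [("additive", additive), ("nonadditive", nonadditive), ("switch", switch)]:
        print(f"{name}: interaction contrast = {interaction_contrast(model)}")
```

In this cartoon both intersectional claims show up as a nonzero interaction contrast. They differ in whether the single-category effects exist at all: in the switch model, belonging to only one category changes nothing.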

Reverse advice as necessary

Every pundit has a model of what the typical member of the public thinks, and directs their advice accordingly. For many reasons, the pundit’s model is likely to be wrong. The readers of any given pundit are a self-selecting sample, and the pundit’s intuitive model of society is distorted. Even if they surveyed their readership, it would be hard to use the results to learn anything reliable about it.

So all advice like “People should do more X” is suspect, because the advice is based on the author’s assumption that the readers are in class A, when they could easily be in class B, who maybe should do less X, possibly because X does not work for class B people in general, or because class B people are generally likely to have done too much X and maybe need to lay off the X for a while.

This is a problem for general advice. The living world, and especially the social world, is made of adaptive systems, which are characterised by complicated control problems where the answer is often not “always more X” but rather “Hit the sweet spot of the perfect amount of X, not too little, not too much” where the “perfect amount” is a function of the context.
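A toy version of the sweet-spot problem, assuming a quadratic loss around a context-dependent optimum. The classes, numbers, and function names are all made up for illustration.

```python
def benefit(x: float, optimum: float) -> float:
    """Payoff peaks when x hits the context-specific optimum."""
    return -((x - optimum) ** 2)


def effect_of_more_x(current: float, optimum: float, step: float = 1.0) -> float:
    """Change in payoff from following the advice 'do more X'."""
    return benefit(current + step, optimum) - benefit(current, optimum)


# Both classes currently do the same amount of X, but their optima differ.
effect_class_a = effect_of_more_x(current=5.0, optimum=8.0)  # under-doing X
effect_class_b = effect_of_more_x(current=5.0, optimum=3.0)  # over-doing X
average_effect = (effect_class_a + effect_class_b) / 2

if __name__ == "__main__":
    print(f"class A: {effect_class_a:+.1f}")
    print(f"class B: {effect_class_b:+.1f}")
    print(f"average: {average_effect:+.1f}")
```

With these numbers the advice helps class A and harms class B by the same amount, so its average effect is zero even though it matters to everyone. That is the interaction-effects problem in miniature.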


Agrawal, Raj, and Tamara Broderick. 2021. “The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time.” arXiv:2106.12408 [Stat], October.
Agrawal, Raj, Brian Trippe, Jonathan Huggins, and Tamara Broderick. 2019. “The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions.” In Proceedings of the 36th International Conference on Machine Learning, 141–50. PMLR.
Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. “Generalized Random Forests.” Annals of Statistics 47 (2): 1148–78.
Bright, Liam Kofi, Daniel Malinsky, and Morgan Thompson. 2016. “Causally Interpreting Intersectionality Theory.” Philosophy of Science 83 (1): 60–81.
DiTraglia, Francis J., Camilo Garcia-Jimeno, Rossa O’Keeffe-O’Donovan, and Alejandro Sanchez-Becerra. 2020. “Identifying Causal Effects in Experiments with Social Interactions and Non-Compliance.” arXiv:2011.07051 [Econ, Stat], November.
Foster, Jared C., Jeremy M.G. Taylor, and Stephen J. Ruberg. 2011. “Subgroup Identification from Randomized Clinical Trial Data.” Statistics in Medicine 30 (24): 10.1002/sim.4322.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2021. Regression and Other Stories. Cambridge, UK: Cambridge University Press.
Gigerenzer, Gerd. n.d. “We Need to Think More about How We Conduct Research.” Behavioral and Brain Sciences 45.
Imai, Kosuke, and Marc Ratkovic. 2013. “Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation.” The Annals of Applied Statistics 7 (1): 443–70.
Knaus, Michael C., Michael Lechner, and Anthony Strittmatter. 2021. “Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence.” The Econometrics Journal 24 (1): 134–61.
Lipkovich, Ilya, Alex Dmitrienko, and Ralph B. 2017. “Tutorial in Biostatistics: Data-Driven Subgroup Identification and Analysis in Clinical Trials.” Statistics in Medicine 36 (1): 136–96.
McElreath, Richard, and Robert Boyd. 2007. Mathematical Models of Social Evolution: A Guide for the Perplexed. University of Chicago Press.
Miller, Jane E. 2013. The Chicago Guide to Writing about Multivariate Analysis. Second edition. Chicago Guides to Writing, Editing, and Publishing. Chicago: University of Chicago Press.
O’Connor, Cailin, Liam Kofi Bright, and Justin P. Bruner. 2019. “The Emergence of Intersectional Disadvantage.” Social Epistemology 33 (1): 23–41.
Su, Xiaogang, Chih-Ling Tsai, Hansheng Wang, David M. Nickerson, and Bogong Li. 2009. “Subgroup Analysis via Recursive Partitioning.” In Journal of Machine Learning Research. Vol. 10. Rochester, NY.
