Causality via potential outcomes

Neyman-Rubin, counterfactuals, conditional treatment effects, and related tricks

October 26, 2016 — December 10, 2021

algebra
graphical models
how do science
machine learning
networks
probability
statistics

A sister(?) field to DAG-centric causal inference. I say sister field because the patriarch of the DAG school, Judea Pearl, seems to regard potential outcomes as a special case of causal DAG reasoning, claiming a proud lineage going back to Sewall Wright. OTOH, proponents of potential outcomes, especially Rubin, seem to regard it as the actually-practical way to do causal reasoning, and claim a proud lineage going back to Jerzy Neyman. In practice I suspect that we, as users, do not need to worry excessively about the border disputes.

Rubin and Waterman (2006) comes recommended by Shalizi as:

A good description of Rubin et al.’s methods for causal inference, adapted to the meanest understanding. […] Rubin and Waterman do a very good job of explaining, in a clear and concrete problem, just how and why the newer techniques of causal inference are valuable, with just enough technical detail that it doesn’t seem like magic.


1 Relationship to Pearl-style do-calculus

Uri Shalit argues:

Rubin and Pearl are kind of “academic enemies”. Though neither completely dismisses the other, they both make snide remarks about the other’s work. Pearl shows in his book exactly how Neyman-Rubin potential outcomes can be derived from causal graphs. As far as I know Rubin never really makes an attempt to address Pearl’s ideas directly. However, Rubin, being a statistician, made significant contributions to the practice of real-world causal inference, which go beyond Pearl’s interests. Jamie Robins also made seminal contributions to this subject. You can read some of the debate on Andrew Gelman’s blog here. Pearl writes in the comment section and in that blog post there are links to follow up posts.

I am more familiar with the Pearl-style approach. The two approaches can be connected via, e.g., Single World Intervention Graphs (Richardson and Robins 2013).
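As a rough guide to how the two notations line up, here is the standard textbook correspondence for a binary treatment. The symbols are assumptions introduced for illustration, not notation from the references: $A$ is the treatment, $Y$ the observed outcome, $Y(a)$ the potential outcome under treatment level $a$, and $X$ a set of observed covariates.

$$
\begin{aligned}
Y &= A\,Y(1) + (1-A)\,Y(0) && \text{(consistency)}\\
\mathbb{E}[Y(a)] &= \mathbb{E}[Y \mid \operatorname{do}(A=a)] && \text{(same estimand, two notations)}\\
\mathbb{E}[Y(a)] &= \mathbb{E}_X\bigl[\mathbb{E}[Y \mid A=a, X]\bigr] && \text{if } Y(a) \perp A \mid X \text{ and } 0 < P(A=1 \mid X) < 1
\end{aligned}
$$

The last line is the adjustment formula. The potential-outcomes school calls its conditions ignorability and overlap. In graphical terms, the first roughly corresponds to $X$ satisfying the back-door criterion.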

2 Heterogeneous treatment effects

See interaction effects for now.

3 Instrumental variables

See instrumental variables.

4 External validity

Dataset shift etc. See external validity.

5 Use in ML

See causality in ML.

6 Propensity matching

TODO
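In the meantime, a minimal sketch of what propensity score matching looks like in a toy simulation with a single confounder. Everything in it is made up for illustration and is not taken from Rubin and Waterman (2006) or any other reference: the simulated data, the true effect of 2, and the hypothetical helper names `simulate`, `fit_propensity`, `matched_att` and `naive_difference`. It fits a logistic propensity model by plain gradient ascent, then matches each treated unit to the control with the nearest propensity score, with replacement.

```python
import bisect
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, effect, rng):
    """Toy data: one confounder x drives both treatment t and outcome y."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = 1 if rng.random() < sigmoid(x) else 0
        y = effect * t + x + rng.gauss(0.0, 0.5)
        rows.append((x, t, y))
    return rows


def fit_propensity(rows, steps=300, lr=1.0):
    """Logistic regression of t on x, fitted by plain gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t, _ in rows:
            resid = t - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def matched_att(rows, b0, b1):
    """Effect on the treated via 1-nearest-neighbour matching on the score."""
    controls = sorted((sigmoid(b0 + b1 * x), y) for x, t, y in rows if t == 0)
    scores = [s for s, _ in controls]
    diffs = []
    for x, t, y in rows:
        if t != 1:
            continue
        s = sigmoid(b0 + b1 * x)
        i = bisect.bisect_left(scores, s)
        # The nearest control is one of the two neighbours of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(scores)]
        j = min(candidates, key=lambda k: abs(scores[k] - s))
        diffs.append(y - controls[j][1])
    return sum(diffs) / len(diffs)


def naive_difference(rows):
    """Raw difference in mean outcomes, ignoring the confounder."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rng = random.Random(0)
rows = simulate(2000, effect=2.0, rng=rng)
b0, b1 = fit_propensity(rows)
naive = naive_difference(rows)
att = matched_att(rows, b0, b1)
print(f"naive difference: {naive:.2f}, matched estimate: {att:.2f}, truth: 2.00")
```

In this setup the naive difference in means is biased upwards, because units with large `x` are both more likely to be treated and have higher outcomes. Matching on the estimated score should remove most of that bias. Real analyses would also check overlap and covariate balance after matching, which this sketch skips.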

7 Causal forests

To follow up: proximity matrix in causal random forest.

The GRF Algorithm. Haaya Naushan: Causal Machine Learning for Econometrics: Causal Forests.

8 Double learning

See causality and ML.

9 References

Athey, and Wager. 2019. “Estimating Treatment Effects with Causal Forests: An Application.” arXiv:1902.07409 [Stat].
Bareinboim, and Pearl. 2016. “Causal Inference and the Data-Fusion Problem.” Proceedings of the National Academy of Sciences.
Bloniarz, Liu, Zhang, et al. 2015. “Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments.” arXiv:1507.03652 [Math, Stat].
Brodersen, Gallusser, Koehler, et al. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” The Annals of Applied Statistics.
Bühlmann. 2020. “Invariance, Causality and Robustness.” Statistical Science.
Chau, Ton, González, et al. 2021. “BayesIMP: Uncertainty Quantification for Causal Data Fusion.”
Chernozhukov, Chetverikov, Demirer, et al. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal.
Chernozhukov, Hansen, and Spindler. 2015. “Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach.” Annual Review of Economics.
Dahlhaus, and Eichler. 2003. “Causality and Graphical Models in Time Series Analysis.” Oxford Statistical Science Series.
Dawid. 2021. “Decision-Theoretic Foundations for Statistical Causality.” Journal of Causal Inference.
De Luna, Waernbaum, and Richardson. 2011. “Covariate Selection for the Nonparametric Estimation of an Average Treatment Effect.” Biometrika.
Gelman. 2010. “Causality and Statistical Learning.” American Journal of Sociology.
Gelman, and Meng. 2004. Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives.
Gelman, and Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology.
Greenland, and Robins. 2009. “Identifiability, Exchangeability and Confounding Revisited.” Epidemiologic Perspectives & Innovations: EP+I.
Heinze-Deml, Maathuis, and Meinshausen. 2018. “Causal Structure Learning.” Annual Review of Statistics and Its Application.
Imbens, Guido W. 2014. “Instrumental Variables: An Econometrician’s Perspective.” Statistical Science.
Imbens, Guido, and Menzel. 2021. “A Causal Bootstrap.” The Annals of Statistics.
Kennedy, Mauro, Daniels, et al. 2019. “Handling Missing Data in Instrumental Variable Methods for Causal Inference.” Annual Review of Statistics and Its Application.
Kohler, Kreuter, and Stuart. 2019. “Nonprobability Sampling and Causal Analysis.” Annual Review of Statistics and Its Application.
Kuang, Sala, Sohoni, et al. 2020. “Ivy: Instrumental Variable Synthesis for Causal Inference.” In International Conference on Artificial Intelligence and Statistics.
Künzel, Sekhon, Bickel, et al. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences.
Lattimore. 2017. “Learning How to Act: Making Good Decisions with Machine Learning.”
Malinsky, Shpitser, and Richardson. 2019. “A Potential Outcomes Calculus for Identifying Conditional Path-Specific Effects.” arXiv:1903.03662 [Stat].
Manski. 2011. “Choosing Treatment Policies Under Ambiguity.” Annual Review of Economics.
Meinshausen. 2018. “Causality from a Distributional Robustness Point of View.” In 2018 IEEE Data Science Workshop (DSW).
Mishler, and Kennedy. 2021. “FADE: FAir Double Ensemble Learning for Observable and Counterfactual Outcomes.” arXiv:2109.00173 [Cs, Stat].
Morgan, and Winship. 2015. Counterfactuals and Causal Inference.
Pearl. 2009. “Causal Inference in Statistics: An Overview.” Statistics Surveys.
Pearl, and Bareinboim. 2014. “External Validity: From Do-Calculus to Transportability Across Populations.” Statistical Science.
Richardson, and Robins. 2013. “Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality.”
Rothenhäusler, Meinshausen, Bühlmann, et al. 2020. “Anchor Regression: Heterogeneous Data Meets Causality.” arXiv:1801.06229 [Stat].
Rubin, and Waterman. 2006. “Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology.” Statistical Science.
Schulam, and Saria. 2017. “Reliable Decision Support Using Counterfactual Models.” In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17.
Shalit, Johansson, and Sontag. 2017. “Estimating Individual Treatment Effect: Generalization Bounds and Algorithms.” arXiv:1606.03976 [Cs, Stat].
Shalizi. n.d. “Advanced Data Analysis from an Elementary Point of View.”
Sharma, Hofman, and Watts. 2015. “Estimating the Causal Impact of Recommendation Systems from Observational Data.” Proceedings of the Sixteenth ACM Conference on Economics and Computation - EC ’15.
Shpitser, Mohan, and Pearl. 2015. “Missing Data as a Causal and Probabilistic Problem.”
Shpitser, and Tchetgen. 2014. “Causal Inference with a Graphical Hierarchy of Interventions.” arXiv:1411.2127 [Stat].
van der Zander, Textor, and Liskiewicz. 2015. “Efficiently Finding Conditional Instruments for Causal Inference.” In Proceedings of the 24th International Conference on Artificial Intelligence. IJCAI’15.
Vansteelandt, Bekaert, and Claeskens. 2012. “On Model Selection and Model Misspecification in Causal Inference.” Statistical Methods in Medical Research.
Yadav, Prunelli, Hoff, et al. 2016. “Causal Inference in Observational Data.” arXiv:1611.04660 [Cs, Stat].
Zhang, Imaizumi, Schölkopf, et al. 2021. “Maximum Moment Restriction for Instrumental Variable Regression.” arXiv:2010.07684 [Cs].