Doubly robust learning for causal inference

TMLE, debiased ML, X-learners, Neyman learning, Targeted learning

September 18, 2020 — May 27, 2024

algebra
graphical models
hidden variables
hierarchical models
how do science
machine learning
networks
neural nets
probability
statistics
Figure 1: Double learning for effect estimation.

An area of causal learning, in particular ML-style causal learning, about which I should learn more. It looks a lot like instrumental variables regression, except that the latter is usually presented in a strictly linear setting.

I was introduced to this area by Künzel et al. (2019) (thanks to Mike McKenna). That paper introduces generic treatment-effect estimators that can wrap arbitrary ML methods.

We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms—such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks—to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favourably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light on underlying mechanisms.
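The X-learner recipe from the abstract above can be written out in a few lines. Here is a minimal numpy sketch on synthetic randomized data, with plain least squares standing in for the RF / BART / neural-net base learners; the data-generating process and all variable names are my own illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    """Least-squares base learner, standing in for RF / BART / a neural net."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta

# Synthetic randomized data with a linear CATE: tau(x) = 1 + 2 * x[0]
n, d = 2000, 3
X = rng.normal(size=(n, d))
e = 0.5                                   # known propensity score
T = rng.binomial(1, e, size=n)
tau_true = 1.0 + 2.0 * X[:, 0]
Y = X @ np.array([1.0, -1.0, 0.5]) + T * tau_true + rng.normal(size=n)

# Stage 1: separate outcome models for each arm
mu0 = fit_linear(X[T == 0], Y[T == 0])
mu1 = fit_linear(X[T == 1], Y[T == 1])

# Stage 2: impute individual effects, then regress them on covariates
D1 = Y[T == 1] - mu0(X[T == 1])           # treated: observed minus imputed control outcome
D0 = mu1(X[T == 0]) - Y[T == 0]           # control: imputed treated outcome minus observed
tau1 = fit_linear(X[T == 1], D1)
tau0 = fit_linear(X[T == 0], D0)

# Stage 3: blend the two CATE estimates; the propensity score is a natural weight
cate = e * tau0(X) + (1 - e) * tau1(X)
```

The stage-2 imputation step is what lets the learner exploit one treatment arm being much larger than the other: each arm's effect regression borrows the other arm's outcome model.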

Since then, many papers have been recommended to me. Probably I should start from recent reviews such as Guo et al. (2020), Kennedy (2023), Funk et al. (2011) or Chernozhukov et al. (2017).
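The "doubly robust" idea those reviews revolve around fits in a few lines: combine an outcome model and a propensity model so the estimator stays consistent if either one is correctly specified. A toy numpy sketch of the AIPW estimator of the average treatment effect, on synthetic confounded data (for brevity I plug in the true propensity, which a real analysis would of course estimate):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic observational data: treatment assignment is confounded by x0
n = 5000
X = rng.normal(size=(n, 2))
e_true = 1.0 / (1.0 + np.exp(-X[:, 0]))        # true propensity score
T = rng.binomial(1, e_true)
Y = X[:, 0] + 2.0 * T + rng.normal(size=n)     # true ATE is 2

def lstsq_predict(Xtr, ytr, Xte):
    """Fit a least-squares outcome model and predict at new points."""
    aug = lambda Z: np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(aug(Xtr), ytr, rcond=None)
    return aug(Xte) @ beta

# Outcome models fitted separately on each arm, predicted for everyone
mu0 = lstsq_predict(X[T == 0], Y[T == 0], X)
mu1 = lstsq_predict(X[T == 1], Y[T == 1], X)

# Propensity model: here the truth, for brevity; in practice, fit one
e_hat = e_true

# AIPW estimator: consistent if *either* the outcome or propensity model is right
psi = (mu1 - mu0
       + T * (Y - mu1) / e_hat
       - (1 - T) * (Y - mu0) / (1 - e_hat))
ate_hat = psi.mean()
```

The correction terms reweight each arm's residuals by inverse propensity, which is what cancels the bias of a misspecified outcome model (and vice versa).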

See also Mishler and Kennedy (2021). Maybe related: Shalit, Johansson, and Sontag (2017), Shi, Blei, and Veitch (2019).

1 Tooling

1.1 “Generalized” random forests

The generalized random forests package (Athey, Tibshirani, and Wager 2019) (implementation) describes itself thus:

GRF extends the idea of a classic random forest to allow for estimating other statistical quantities besides the expected outcome. Each forest type, for example quantile_forest, trains a random forest targeted at a particular problem, like quantile estimation. The most common use of GRF is in estimating treatment effects through the function causal_forest.

1.2 EconML

  • py-why/EconML

    ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) \(t\) on an outcome variable \(y\), controlling for a set of features \(x\).

  • EconML - Microsoft Research
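The double machine learning recipe the EconML blurb alludes to (Chernozhukov et al. 2017) boils down to cross-fitted residualization. A self-contained numpy sketch on a partially linear model, with a hand-rolled feature-map regression as the nuisance learner (the data-generating process and names are illustrative inventions, not EconML's API):

```python
import numpy as np

rng = np.random.default_rng(2)

# Partially linear model: Y = theta * T + g(X) + noise, with T = m(X) + noise
n = 4000
X = rng.normal(size=(n, 3))
g = np.sin(X[:, 0]) + X[:, 1] ** 2
m = 0.5 * X[:, 0]
T = m + rng.normal(size=n)
theta_true = 1.5
Y = theta_true * T + g + rng.normal(size=n)

def fit_predict(Xtr, ytr, Xte):
    """Nuisance learner: least squares on simple nonlinear features."""
    feats = lambda Z: np.column_stack([np.ones(len(Z)), Z, np.sin(Z), Z ** 2])
    beta, *_ = np.linalg.lstsq(feats(Xtr), ytr, rcond=None)
    return feats(Xte) @ beta

# Cross-fitting: learn nuisances on one fold, residualize the held-out fold
folds = np.array_split(rng.permutation(n), 2)
res_Y, res_T = np.empty(n), np.empty(n)
for k in (0, 1):
    tr, te = folds[1 - k], folds[k]
    res_Y[te] = Y[te] - fit_predict(X[tr], Y[tr], X[te])
    res_T[te] = T[te] - fit_predict(X[tr], T[tr], X[te])

# Neyman-orthogonal ("partialling out") estimate of theta
theta_hat = (res_T @ res_Y) / (res_T @ res_T)
```

Regressing residual on residual is the orthogonalization trick: first-order errors in either nuisance fit cancel, which is what buys back root-n rates despite the flexible nuisance learners.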

2 References

Arjovsky, Bottou, Gulrajani, et al. 2020. “Invariant Risk Minimization.”
Athey. 2017. “Beyond Prediction: Using Big Data for Policy Problems.” Science.
Athey, and Imbens. 2019. “Machine Learning Methods That Economists Should Know About.” Annual Review of Economics.
Athey, Tibshirani, and Wager. 2019. “Generalized Random Forests.” Annals of Statistics.
Athey, and Wager. 2019. “Estimating Treatment Effects with Causal Forests: An Application.” arXiv:1902.07409 [Stat].
———. 2021. “Policy Learning With Observational Data.” Econometrica.
Chernozhukov, Chetverikov, Demirer, et al. 2017. “Double/Debiased/Neyman Machine Learning of Treatment Effects.” American Economic Review.
Chernozhukov, Chetverikov, Demirer, et al. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal.
Chernozhukov, Escanciano, Ichimura, et al. 2022. “Locally Robust Semiparametric Estimation.” Econometrica.
Dudík, Erhan, Langford, et al. 2014. “Doubly Robust Policy Evaluation and Optimization.” Statistical Science.
Foster, and Syrgkanis. 2023. “Orthogonal Statistical Learning.” The Annals of Statistics.
Funk, Westreich, Wiesen, et al. 2011. “Doubly Robust Estimation of Causal Effects.” American Journal of Epidemiology.
Guo, Cheng, Li, et al. 2020. “A Survey of Learning Causality with Data: Problems and Methods.” ACM Computing Surveys.
Hartford, Lewis, Leyton-Brown, et al. 2017. “Deep IV: A Flexible Approach for Counterfactual Prediction.” In Proceedings of the 34th International Conference on Machine Learning.
Hines, Dukes, Diaz-Ordaz, et al. 2022. “Demystifying Statistical Learning Based on Efficient Influence Functions.” The American Statistician.
Jordan, Wang, and Zhou. 2022. “Empirical Gateaux Derivatives for Causal Inference.”
Kennedy. 2023. “Semiparametric Doubly Robust Targeted Double Machine Learning: A Review.”
Kennedy, Ma, McHugh, et al. 2017. “Non-Parametric Methods for Doubly Robust Estimation of Continuous Treatment Effects.” Journal of the Royal Statistical Society Series B: Statistical Methodology.
Künzel, Sekhon, Bickel, et al. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences.
Louizos, Shalit, Mooij, et al. 2017. “Causal Effect Inference with Deep Latent-Variable Models.” In Advances in Neural Information Processing Systems 30.
Melnychuk, Frauen, and Feuerriegel. 2022. “Causal Transformer for Estimating Counterfactual Outcomes.” In Proceedings of the 39th International Conference on Machine Learning.
Mishler, and Kennedy. 2021. “FADE: FAir Double Ensemble Learning for Observable and Counterfactual Outcomes.” arXiv:2109.00173 [Cs, Stat].
Nekipelov, Semenova, and Syrgkanis. 2021. “Regularized Orthogonal Machine Learning for Nonlinear Semiparametric Models.”
Nie, and Wager. 2021. “Quasi-Oracle Estimation of Heterogeneous Treatment Effects.” Biometrika.
Oprescu, Syrgkanis, and Wu. 2019. “Orthogonal Random Forest for Causal Inference.” In Proceedings of the 36th International Conference on Machine Learning.
Prosperi, Guo, Sperrin, et al. 2020. “Causal Inference and Counterfactual Prediction in Machine Learning for Actionable Healthcare.” Nature Machine Intelligence.
Schuler, and Rose. 2017. “Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies.” American Journal of Epidemiology.
Shalit, Johansson, and Sontag. 2017. “Estimating Individual Treatment Effect: Generalization Bounds and Algorithms.” arXiv:1606.03976 [Cs, Stat].
Shi, Blei, and Veitch. 2019. “Adapting Neural Networks for the Estimation of Treatment Effects.” In Proceedings of the 33rd International Conference on Neural Information Processing Systems.
Syrgkanis, Lei, Oprescu, et al. 2019. “Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments.”
van der Laan, Polley, and Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology.
van der Laan, and Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics.
van der Laan, and Rubin. 2006. “Targeted Maximum Likelihood Learning.” The International Journal of Biostatistics.