Doubly robust learning for causal inference

TMLE, debiased ML, X-learners, Neyman learning

September 18, 2020 — March 18, 2024

graphical models
hidden variables
hierarchical models
how do science
machine learning
neural nets
Figure 1: Double learning for effect estimation.

An area of causal inference, in particular ML-style causal inference, that I should learn more about. It looks a lot like instrumental-variables regression, except that the latter is usually presented in a strictly linear setting.

In the meantime, there are many papers that have been recommended to me. Probably I should start from recent reviews such as Guo et al. (2020), Kennedy (2023), Funk et al. (2011) or Chernozhukov et al. (2017).
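
The "doubly robust" idea in those reviews can be shown compactly via the augmented inverse-propensity-weighted (AIPW) estimator of the average treatment effect. A minimal numpy sketch (names and the toy simulation are mine, not from any of the cited papers):

```python
import numpy as np

def aipw_ate(y, t, mu0, mu1, e):
    """Augmented IPW (doubly robust) estimate of the average treatment effect.

    y        : observed outcomes
    t        : binary treatment indicator (0/1)
    mu0, mu1 : outcome-model predictions of E[Y | X, T=0] and E[Y | X, T=1]
    e        : propensity-score predictions of P(T=1 | X)

    The estimate stays consistent if *either* the outcome model or the
    propensity model is correctly specified -- hence "doubly robust".
    """
    psi = (mu1 - mu0
           + t * (y - mu1) / e
           - (1 - t) * (y - mu0) / (1 - e))
    return psi.mean()

# Toy check: confounded treatment, deliberately useless outcome model.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))          # treatment probability depends on x
t = rng.binomial(1, e_true)
y = 2.0 * t + x + rng.normal(size=n)       # true ATE = 2
ate = aipw_ate(y, t, mu0=np.zeros(n), mu1=np.zeros(n), e=e_true)
```

Even with the outcome model set to a constant zero, the correct propensities keep the estimate near the true effect of 2; symmetrically, a good outcome model rescues bad propensities.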

I was introduced to this area in the form of Künzel et al. (2019) by Mike McKenna. That paper introduces generic meta-estimators of treatment effects that can wrap arbitrary ML regression methods.

We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms—such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks—to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light on underlying mechanisms.
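
The X-learner recipe in that abstract is short enough to sketch directly: fit per-arm response surfaces, impute individual effects by crossing arms, regress the imputed effects on covariates, and blend. Here is a minimal sketch with OLS standing in for the base learner (the paper uses RF/BART; function names and the toy data are my own):

```python
import numpy as np

def lstsq_fit(X, y):
    """OLS with an intercept; returns a prediction function."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def x_learner(X, t, y, g=0.5):
    """X-learner for the CATE, with OLS as the stand-in base learner.

    g is the weighting function (typically an estimated propensity score);
    a scalar is adequate when treatment is randomised.
    """
    X0, y0 = X[t == 0], y[t == 0]
    X1, y1 = X[t == 1], y[t == 1]
    # Stage 1: per-arm response surfaces.
    mu0, mu1 = lstsq_fit(X0, y0), lstsq_fit(X1, y1)
    # Stage 2: impute individual effects, then regress them on X.
    tau1 = lstsq_fit(X1, y1 - mu0(X1))   # treated: observed minus imputed control
    tau0 = lstsq_fit(X0, mu1(X0) - y0)   # control: imputed treated minus observed
    # Stage 3: weighted combination of the two CATE estimates.
    return lambda Xn: g * tau0(Xn) + (1 - g) * tau1(Xn)

# Toy check: linear CATE tau(x) = 1 + x, randomised treatment.
rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 1))
t = rng.binomial(1, 0.5, size=n)
y = X[:, 0] + t * (1.0 + X[:, 0]) + rng.normal(size=n)
tau_hat = x_learner(X, t, y)
```

Stage 2 is what distinguishes the X-learner from the naive "T-learner" (fit two models, subtract): each arm's imputed effects borrow strength from the other arm's response surface, which is what helps under unbalanced treatment groups.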

See also Mishler and Kennedy (2021). Maybe related: Shalit, Johansson, and Sontag (2017), Shi, Blei, and Veitch (2019).

1 Tooling

1.1 EconML

  • py-why/EconML

    ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) \(t\) on an outcome variable \(y\), controlling for a set of features \(x\).

  • EconML - Microsoft Research
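
The "orthogonal/double ML" recipe that EconML implements (e.g. in its `LinearDML` estimator) boils down to cross-fitted partialling-out in the partially linear model. A self-contained numpy sketch, with OLS standing in for the arbitrary ML nuisance learners (the point is the orthogonalisation and cross-fitting, not the learner choice; the toy simulation is mine):

```python
import numpy as np

def dml_plm(X, t, y, n_folds=2, seed=0):
    """Cross-fitted double/debiased ML for the partially linear model

        y = theta * t + g(X) + eps,    t = m(X) + eta.

    Nuisance functions g and m are fit on held-out folds and partialled out;
    theta comes from a final residual-on-residual regression.
    """
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res, t_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        Xtr = np.column_stack([np.ones(tr.sum()), X[tr]])
        Xte = np.column_stack([np.ones(te.sum()), X[te]])
        by, *_ = np.linalg.lstsq(Xtr, y[tr], rcond=None)
        bt, *_ = np.linalg.lstsq(Xtr, t[tr], rcond=None)
        y_res[te] = y[te] - Xte @ by   # partial X out of the outcome
        t_res[te] = t[te] - Xte @ bt   # partial X out of the treatment
    # Final stage: residual-on-residual regression recovers theta.
    return float(t_res @ y_res / (t_res @ t_res))

# Toy check: theta = 0.7, with a confounder driving both t and y.
rng = np.random.default_rng(2)
n = 10_000
X = rng.normal(size=(n, 1))
t = 0.5 * X[:, 0] + rng.normal(size=n)
y = 0.7 * t + X[:, 0] + rng.normal(size=n)
theta_hat = dml_plm(X, t, y)
```

Cross-fitting (estimating nuisances on one fold, evaluating residuals on the other) is what lets flexible, possibly overfitting ML learners slot in without biasing the final estimate of theta; this is the Chernozhukov et al. construction.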

2 References

Arjovsky, Bottou, Gulrajani, et al. 2020. “Invariant Risk Minimization.”
Athey, and Imbens. 2019. “Machine Learning Methods That Economists Should Know About.” Annual Review of Economics.
Athey, Tibshirani, and Wager. 2019. “Generalized Random Forests.” Annals of Statistics.
Athey, and Wager. 2021. “Policy Learning With Observational Data.” Econometrica.
Chernozhukov, Chetverikov, Demirer, et al. 2017. “Double/Debiased/Neyman Machine Learning of Treatment Effects.” American Economic Review.
Chernozhukov, Chetverikov, Demirer, et al. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal.
Chernozhukov, Escanciano, Ichimura, et al. 2022. “Locally Robust Semiparametric Estimation.” Econometrica.
Dudík, Erhan, Langford, et al. 2014. “Doubly Robust Policy Evaluation and Optimization.” Statistical Science.
Foster, and Syrgkanis. 2023. “Orthogonal Statistical Learning.” The Annals of Statistics.
Funk, Westreich, Wiesen, et al. 2011. “Doubly Robust Estimation of Causal Effects.” American Journal of Epidemiology.
Guo, Cheng, Li, et al. 2020. “A Survey of Learning Causality with Data: Problems and Methods.” ACM Computing Surveys.
Hartford, Lewis, Leyton-Brown, et al. 2017. “Deep IV: A Flexible Approach for Counterfactual Prediction.” In Proceedings of the 34th International Conference on Machine Learning.
Hines, Dukes, Diaz-Ordaz, et al. 2022. “Demystifying Statistical Learning Based on Efficient Influence Functions.” The American Statistician.
Jordan, Wang, and Zhou. 2022. “Empirical Gateaux Derivatives for Causal Inference.”
Kennedy. 2023. “Semiparametric Doubly Robust Targeted Double Machine Learning: A Review.”
Kennedy, Ma, McHugh, et al. 2017. “Non-Parametric Methods for Doubly Robust Estimation of Continuous Treatment Effects.” Journal of the Royal Statistical Society Series B: Statistical Methodology.
Künzel, Sekhon, Bickel, et al. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences.
Louizos, Shalit, Mooij, et al. 2017. “Causal Effect Inference with Deep Latent-Variable Models.” In Advances in Neural Information Processing Systems 30.
Melnychuk, Frauen, and Feuerriegel. 2022. “Causal Transformer for Estimating Counterfactual Outcomes.” In Proceedings of the 39th International Conference on Machine Learning.
Mishler, and Kennedy. 2021. “FADE: FAir Double Ensemble Learning for Observable and Counterfactual Outcomes.” arXiv:2109.00173 [cs, stat].
Nekipelov, Semenova, and Syrgkanis. 2021. “Regularized Orthogonal Machine Learning for Nonlinear Semiparametric Models.”
Nie, and Wager. 2021. “Quasi-Oracle Estimation of Heterogeneous Treatment Effects.” Biometrika.
Oprescu, Syrgkanis, and Wu. 2019. “Orthogonal Random Forest for Causal Inference.” In Proceedings of the 36th International Conference on Machine Learning.
Prosperi, Guo, Sperrin, et al. 2020. “Causal Inference and Counterfactual Prediction in Machine Learning for Actionable Healthcare.” Nature Machine Intelligence.
Schuler, and Rose. 2017. “Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies.” American Journal of Epidemiology.
Shalit, Johansson, and Sontag. 2017. “Estimating Individual Treatment Effect: Generalization Bounds and Algorithms.” arXiv:1606.03976 [cs, stat].
Shi, Blei, and Veitch. 2019. “Adapting Neural Networks for the Estimation of Treatment Effects.” In Proceedings of the 33rd International Conference on Neural Information Processing Systems.
Syrgkanis, Lei, Oprescu, et al. 2019. “Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments.”
van der Laan, Polley, and Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology.
van der Laan, and Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics.
van der Laan, and Rubin. 2006. “Targeted Maximum Likelihood Learning.” The International Journal of Biostatistics.