Causal inference in highly parameterized ML

Applying causal graph structure in the challenging environment of a no-holds-barred nonparametric machine learning algorithm such as a neural net or its ilk.


  • Léon Bottou, From Causal Graphs to Causal Invariance

    For many problems, it’s difficult to even attempt drawing a causal graph. While structural causal models provide a complete framework for causal inference, it is often hard to encode known physical laws (such as Newton’s gravitation, or the ideal gas law) as causal graphs. In familiar machine learning territory, how does one model the causal relationships between individual pixels and a target prediction? This is one of the motivating questions behind the paper Invariant Risk Minimization (IRM). In place of structured graphs, the authors elevate invariance to the defining feature of causality.

  • Nisha Muktewar and Chris Wallace, Causality for Machine Learning, which is the book Bottou recommends on this theme.

  • Why machine learning struggles with causality
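Bottou's invariance idea can be made concrete via the IRMv1 penalty from the Arjovsky et al. paper: for each training environment, penalize the squared gradient of that environment's risk with respect to a scalar "dummy" classifier fixed at 1, so that the learned representation is simultaneously optimal across environments. Here is a minimal numpy sketch for squared loss and a scalar representation (function names are mine, not the paper's):

```python
import numpy as np

def irm_penalty(phi_x, y):
    """IRMv1 penalty for squared loss: squared gradient of the
    per-environment risk w.r.t. a scalar dummy classifier w, at w = 1.
    R_e(w) = mean((w * phi_x - y)^2), so dR/dw at w=1 is
    mean(2 * phi_x * (phi_x - y))."""
    grad = np.mean(2.0 * phi_x * (phi_x - y))
    return grad ** 2

def irm_objective(environments, lam=1.0):
    """Sum over environments of risk plus lam * invariance penalty.
    `environments` is a list of (phi_x, y) pairs, where phi_x is the
    (here scalar) learned representation applied to the inputs."""
    total = 0.0
    for phi_x, y in environments:
        risk = np.mean((phi_x - y) ** 2)
        total += risk + lam * irm_penalty(phi_x, y)
    return total
```

In the full method one minimizes this objective over the parameters of the representation; a representation whose optimal readout differs between environments picks up a nonzero penalty, which is exactly the invariance-as-causality criterion.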

Künzel et al. (2019) (HT Mike McKenna) looks interesting:

… We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms—such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks—to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light on underlying mechanisms.
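To make the X-learner's stages concrete: fit outcome models on control and treated groups separately, impute individual treatment effects by crossing them over, regress on the imputed effects, and blend the two resulting CATE estimates with a propensity weight. A minimal numpy sketch, with a trivial least-squares learner standing in for the RF/BART base learners the paper uses (all names here are illustrative, not from the paper's code):

```python
import numpy as np

class LinearRegressor:
    """Minimal least-squares base learner (stand-in for RF/BART)."""
    def fit(self, X, y):
        Xb = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self
    def predict(self, X):
        Xb = np.column_stack([np.ones(len(X)), X])
        return Xb @ self.coef_

def x_learner(X, y, t, make_base=LinearRegressor, g=0.5):
    """X-learner sketch for the CATE. t is a 0/1 treatment indicator;
    g is the propensity weight blending the two CATE estimates
    (a constant here; the paper estimates it from data)."""
    X0, y0 = X[t == 0], y[t == 0]
    X1, y1 = X[t == 1], y[t == 1]
    # Stage 1: outcome models for control and treated groups.
    mu0 = make_base().fit(X0, y0)
    mu1 = make_base().fit(X1, y1)
    # Stage 2: impute individual effects and regress on them.
    d1 = y1 - mu0.predict(X1)   # treated: observed minus imputed counterfactual
    d0 = mu1.predict(X0) - y0   # control: imputed counterfactual minus observed
    tau1 = make_base().fit(X1, d1)
    tau0 = make_base().fit(X0, d0)
    # Stage 3: propensity-weighted combination.
    return lambda Xnew: g * tau0.predict(Xnew) + (1 - g) * tau1.predict(Xnew)
```

The crossover in stage 2 is what buys the efficiency the abstract mentions: when one group is much larger, its well-estimated outcome model is used to impute effects for the smaller group.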


Arjovsky, Martin, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2020. “Invariant Risk Minimization.” March 27, 2020.
Besserve, Michel, Arash Mehrjou, Rémy Sun, and Bernhard Schölkopf. 2019. “Counterfactuals Uncover the Modular Structure of Deep Generative Models.”
Friedrich, Sarah, Gerd Antes, Sigrid Behr, Harald Binder, Werner Brannath, Florian Dumpert, Katja Ickstadt, et al. 2020. “Is There a Role for Statistics in Artificial Intelligence?” September 13, 2020.
Goyal, Anirudh, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. 2020. “Recurrent Independent Mechanisms.” November 17, 2020.
Johnson, Matthew J, David K Duvenaud, Alex Wiltschko, Ryan P Adams, and Sandeep R Datta. 2016. “Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2946–54. Curran Associates, Inc.
Kocaoglu, Murat, Christopher Snyder, Alexandros G. Dimakis, and Sriram Vishwanath. 2017. “CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training.” September 14, 2017.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116 (10): 4156–65.
Lattimore, Finnian Rachel. 2017. “Learning How to Act: Making Good Decisions with Machine Learning.”
Leeb, Felix, Guilia Lanzillotta, Yashas Annadani, Michel Besserve, Stefan Bauer, and Bernhard Schölkopf. 2021. “Structure by Architecture: Disentangled Representations Without Regularization.” July 4, 2021.
Locatello, Francesco, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. 2019. “Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations.” June 18, 2019.
Louizos, Christos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. “Causal Effect Inference with Deep Latent-Variable Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 6446–56. Curran Associates, Inc.
Lu, Chaochao, Yuhuai Wu, José Miguel Hernández-Lobato, and Bernhard Schölkopf. 2021. “Nonlinear Invariant Risk Minimization: A Causal Approach.” June 9, 2021.
Ng, Ignavier, Zhuangyan Fang, Shengyu Zhu, Zhitang Chen, and Jun Wang. 2020. “Masked Gradient-Based Causal Structure Learning.” February 17, 2020.
Ng, Ignavier, Shengyu Zhu, Zhitang Chen, and Zhuangyan Fang. 2019. “A Graph Autoencoder Approach to Causal Structure Learning.” In Advances In Neural Information Processing Systems.
Rakesh, Vineeth, Ruocheng Guo, Raha Moraffah, Nitin Agarwal, and Huan Liu. 2018. “Linked Causal Variational Autoencoder for Inferring Paired Spillover Effects.” In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 1679–82. CIKM ’18. New York, NY, USA: Association for Computing Machinery.
Rotnitzky, Andrea, and Ezequiel Smucler. 2020. “Efficient Adjustment Sets for Population Average Causal Treatment Effect Estimation in Graphical Models.” Journal of Machine Learning Research 21 (188): 1–86.
Schölkopf, Bernhard, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. 2021. “Toward Causal Representation Learning.” Proceedings of the IEEE 109 (5): 612–34.
Yang, Mengyue, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, and Jun Wang. 2020. “CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models.” July 1, 2020.
Zhang, Kun, Mingming Gong, Petar Stojanov, Biwei Huang, Qingsong Liu, and Clark Glymour. 2020. “Domain Adaptation as a Problem of Inference on Graphical Models.” In Advances in Neural Information Processing Systems. Vol. 33.
