Causal inference in highly parameterized ML

This notebook is about applying a causal graph structure in the challenging environment of a no-holds-barred nonparametric machine learning algorithm, such as a neural net or its ilk. I am interested in this because it seems necessary, and kind of obvious, for handling things like dataset shift, and yet it is often ignored. What is that about?

I do not know at the moment. This is a link salad for now.

Invariance approaches

Léon Bottou, From Causal Graphs to Causal Invariance:

For many problems, it’s difficult to even attempt drawing a causal graph. While structural causal models provide a complete framework for causal inference, it is often hard to encode known physical laws (such as Newton’s gravitation, or the ideal gas law) as causal graphs. In familiar machine learning territory, how does one model the causal relationships between individual pixels and a target prediction? This is one of the motivating questions behind the paper Invariant Risk Minimization (IRM). In place of structured graphs, the authors elevate invariance to the defining feature of causality.

He commends the Cloudera Fast Forward tutorial Causality for Machine Learning, which is a nice bit of applied work.
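To make the invariance idea a bit more concrete, here is a minimal sketch of the IRMv1-style penalty from Arjovsky et al. (2020) as I understand it: for each environment, take the derivative of the risk with respect to a dummy scalar multiplier on the predictor's output, evaluated at 1, and square it. Everything else here is my own assumption for illustration: the toy data (`make_env`, where `x1` causes `y` and `y` causes `x2`), the linear predictor, and the hand-derived gradient. It is not the authors' code.

```python
import random


def make_env(sigma, n, rng):
    """Hypothetical toy environment: x1 causes y, and y causes x2.

    Only the noise scale sigma changes between environments, so the
    relation y ~ x1 is invariant while y ~ x2 is not.
    """
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, sigma)
        y = x1 + rng.gauss(0, sigma)
        x2 = y + rng.gauss(0, 1)
        rows.append((x1, x2, y))
    return rows


def risk_and_penalty(rows, coef):
    """Squared-error risk and IRMv1-style penalty for f(x) = coef . x.

    The penalty is the squared derivative of the risk with respect to a
    dummy scalar multiplier w on the prediction, evaluated at w = 1.
    """
    n = len(rows)
    risk = 0.0
    grad = 0.0
    for x1, x2, y in rows:
        pred = coef[0] * x1 + coef[1] * x2
        risk += (pred - y) ** 2 / n
        grad += 2 * (pred - y) * pred / n  # d/dw (w * pred - y)^2 at w = 1
    return risk, grad ** 2


def irm_objective(envs, coef, lam=1.0):
    """Sum over environments of risk plus lam times the invariance penalty."""
    total = 0.0
    for rows in envs:
        risk, penalty = risk_and_penalty(rows, coef)
        total += risk + lam * penalty
    return total


rng = random.Random(0)
envs = [make_env(0.2, 5000, rng), make_env(2.0, 5000, rng)]

# Predictor using only the causal feature x1 vs. only the spurious feature x2.
penalty_causal = sum(risk_and_penalty(e, (1.0, 0.0))[1] for e in envs)
penalty_spurious = sum(risk_and_penalty(e, (0.0, 0.5))[1] for e in envs)
print("penalty, causal predictor:  ", penalty_causal)
print("penalty, spurious predictor:", penalty_spurious)
```

On this toy data the predictor that uses only `x1` should incur a much smaller penalty than the one that leans on `x2`, because no single coefficient on `x2` is simultaneously optimal in both environments. In practice one would minimize `irm_objective` over a neural feature map with automatic differentiation rather than inspect two hand-picked coefficient vectors.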

Causality for feedback and continuous fields

There is a fun body of work by what is in my mind the Central European causality-ML think tank. There is some high connectivity between various interesting people: Bernhard Schölkopf, Jonas Peters, Joris Mooij, Stephan Bongers and Dominik Janzing etc. I would love to understand everything that is going on with their outputs, particularly as regards causality in feedback and control systems. Perhaps I should start with the book (Peters, Janzing, and Schölkopf 2017) (Free PDF), or the chatty casual introduction (Schölkopf 2022).

For a good explanation of what they are about by example, see Bernhard Schölkopf: Causality and Exoplanets.

I am particularly curious about their work in causality in continuous fields, e.g. Bongers et al. (2020); Bongers and Mooij (2018); Bongers et al. (2016); Rubenstein et al. (2018).

Double learning

Künzel et al. (2019) (HT Mike McKenna) looks interesting: it is a generic intervention estimator for ML methods. (AFAICT this extends the double regression/instrumental variables approach.)

… We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms—such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks—to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light on underlying mechanisms.
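Here is a minimal sketch of the X-learner recipe as I read it from the abstract and paper: fit separate outcome models for control and treated units, impute individual treatment effects by crossing the models over, regress those imputed effects on the covariates, and blend the two resulting CATE estimates with a weight function. The assumptions are all mine: a one-dimensional covariate, a hand-rolled least-squares line (`fit_line`) as the base learner, a constant weight equal to the treated fraction (reasonable only if treatment is randomized), and simulated data with true CATE `1 + x`.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def x_learner(xs, ts, ys, fit=fit_line):
    """X-learner with an arbitrary base learner `fit(xs, ys) -> predictor`."""
    x0 = [x for x, t in zip(xs, ts) if t == 0]
    y0 = [y for y, t in zip(ys, ts) if t == 0]
    x1 = [x for x, t in zip(xs, ts) if t == 1]
    y1 = [y for y, t in zip(ys, ts) if t == 1]

    # Stage 1: response functions for control and treated groups.
    mu0 = fit(x0, y0)
    mu1 = fit(x1, y1)

    # Stage 2: imputed treatment effects, crossing the models over.
    d1 = [y - mu0(x) for x, y in zip(x1, y1)]  # treated: observed - predicted control
    d0 = [mu1(x) - y for x, y in zip(x0, y0)]  # control: predicted treated - observed
    tau1 = fit(x1, d1)
    tau0 = fit(x0, d0)

    # Stage 3: blend. Assumption: constant weight g = treated fraction,
    # standing in for a propensity score under randomized treatment.
    g = len(x1) / len(xs)
    return lambda x: g * tau0(x) + (1 - g) * tau1(x)


# Simulated, unbalanced experiment: about 20% treated, true CATE is 1 + x.
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(4000)]
ts = [1 if rng.random() < 0.2 else 0 for _ in xs]
ys = [2 * x + t * (1 + x) + rng.gauss(0, 0.5) for x, t in zip(xs, ts)]

cate = x_learner(xs, ts, ys)
print("estimated CATE at x = 0.5:", cate(0.5), "(true value 1.5)")
```

Swapping `fit_line` for a random forest or neural net wrapper with the same `fit(xs, ys)` signature is the point of the metaalgorithm framing; the three stages do not change.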

See also Mishler and Kennedy (2021). Maybe related: Shalit, Johansson, and Sontag (2017), Shi, Blei, and Veitch (2019).


Detecting causal associations in time series datasets is a key challenge for novel insights into complex dynamical systems such as the Earth system or the human brain. Interactions in such systems present a number of major challenges for causal discovery techniques and it is largely unknown which methods perform best for which challenge.

The CauseMe platform provides ground truth benchmark datasets featuring different real data challenges to assess and compare the performance of causal discovery methods. The available benchmark datasets are either generated from synthetic models mimicking real challenges, or are real world data sets where the causal structure is known with high confidence. The datasets vary in dimensionality, complexity and sophistication.


Nisha Muktewar and Chris Wallace, Causality for Machine Learning is the book Bottou recommends on this theme.

For coders, Ben Dickson writes on Why machine learning struggles with causality.

Cheng Soon Ong recommends Finn Lattimore to me as an important perspective.

biomedia-mira/deepscm: Repository for Deep Structural Causal Models for Tractable Counterfactual Inference (Pawlowski, Coelho de Castro, and Glocker 2020).


References

Arjovsky, Martin, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2020. “Invariant Risk Minimization.” arXiv.
Athey, Susan, and Stefan Wager. 2019. “Estimating Treatment Effects with Causal Forests: An Application.” arXiv:1902.07409 [Stat], February.
Bareinboim, Elias, Juan D. Correa, Duligur Ibeling, and Thomas Icard. 2022. “On Pearl’s Hierarchy and the Foundations of Causal Inference.” In Probabilistic and Causal Inference: The Works of Judea Pearl, 1st ed., 36:507–56. New York, NY, USA: Association for Computing Machinery.
Besserve, Michel, Arash Mehrjou, Rémy Sun, and Bernhard Schölkopf. 2019. “Counterfactuals Uncover the Modular Structure of Deep Generative Models.” In arXiv:1812.03253 [Cs, Stat].
Bishop, J. Mark. 2021. “Artificial Intelligence Is Stupid and Causal Reasoning Will Not Fix It.” Frontiers in Psychology 11.
Bongers, Stephan, Patrick Forré, Jonas Peters, Bernhard Schölkopf, and Joris M. Mooij. 2020. “Foundations of Structural Causal Models with Cycles and Latent Variables.” arXiv:1611.06221 [Cs, Stat], October.
Bongers, Stephan, and Joris M. Mooij. 2018. “From Random Differential Equations to Structural Causal Models: The Stochastic Case.” arXiv:1803.08784 [Cs, Stat], March.
Bongers, Stephan, Jonas Peters, Bernhard Schölkopf, and Joris M. Mooij. 2016. “Structural Causal Models: Cycles, Marginalizations, Exogenous Reparametrizations and Reductions.” arXiv:1611.06221 [Cs, Stat], November.
Christiansen, Rune, Niklas Pfister, Martin Emil Jakobsen, Nicola Gnecco, and Jonas Peters. 2020. “A Causal Framework for Distribution Generalization,” June.
Fernández-Loría, Carlos, and Foster Provost. 2021. “Causal Decision Making and Causal Effect Estimation Are Not the Same… and Why It Matters.” arXiv:2104.04103 [Cs, Stat], September.
Friedrich, Sarah, Gerd Antes, Sigrid Behr, Harald Binder, Werner Brannath, Florian Dumpert, Katja Ickstadt, et al. 2020. “Is There a Role for Statistics in Artificial Intelligence?” arXiv:2009.09070 [Cs], September.
Gendron, Gaël, Michael Witbrock, and Gillian Dobbie. 2023. “A Survey of Methods, Challenges and Perspectives in Causality.” arXiv.
Goyal, Anirudh, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. 2020. “Recurrent Independent Mechanisms.” arXiv:1909.10893 [Cs, Stat], November.
Hartford, Jason, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy. 2017. “Deep IV: A Flexible Approach for Counterfactual Prediction.” In Proceedings of the 34th International Conference on Machine Learning, 1414–23. PMLR.
Huang, Yu, Zuntao Fu, and Christian L. E. Franzke. 2020. “Detecting Causality from Time Series in a Machine Learning Framework.” Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (6): 063116.
Johnson, Matthew J, David K Duvenaud, Alex Wiltschko, Ryan P Adams, and Sandeep R Datta. 2016. “Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2946–54. Curran Associates, Inc.
Jordan, Michael I., Yixin Wang, and Angela Zhou. 2022. “Empirical Gateaux Derivatives for Causal Inference.” arXiv.
Kaddour, Jean, Aengus Lynch, Qi Liu, Matt J. Kusner, and Ricardo Silva. 2022. “Causal Machine Learning: A Survey and Open Problems.” arXiv.
Karimi, Amir-Hossein, Gilles Barthe, Bernhard Schölkopf, and Isabel Valera. 2021. “A Survey of Algorithmic Recourse: Definitions, Formulations, Solutions, and Prospects.” arXiv.
Karimi, Amir-Hossein, Krikamol Muandet, Simon Kornblith, Bernhard Schölkopf, and Been Kim. 2022. “On the Relationship Between Explanation and Prediction: A Causal View.” arXiv.
Kirk, Robert, Amy Zhang, Edward Grefenstette, and Tim Rocktäschel. 2023. “A Survey of Zero-Shot Generalisation in Deep Reinforcement Learning.” Journal of Artificial Intelligence Research 76 (January): 201–64.
Kocaoglu, Murat, Christopher Snyder, Alexandros G. Dimakis, and Sriram Vishwanath. 2017. “CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training.” arXiv:1709.02023 [Cs, Math, Stat], September.
Kosoy, Eliza, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, and Alison Gopnik. 2022. “Towards Understanding How Machines Can Learn Causal Overhypotheses.” arXiv.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116 (10): 4156–65.
Lattimore, Finnian Rachel. 2017. “Learning How to Act: Making Good Decisions with Machine Learning.”
Leeb, Felix, Giulia Lanzillotta, Yashas Annadani, Michel Besserve, Stefan Bauer, and Bernhard Schölkopf. 2021. “Structure by Architecture: Disentangled Representations Without Regularization.” arXiv:2006.07796 [Cs, Stat], July.
Li, Lu, Yongjiu Dai, Wei Shangguan, Zhongwang Wei, Nan Wei, and Qingliang Li. 2022. “Causality-Structured Deep Learning for Soil Moisture Predictions.” Journal of Hydrometeorology 23 (8): 1315–31.
Locatello, Francesco, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. 2019. “Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations.” arXiv:1811.12359 [Cs, Stat], June.
Locatello, Francesco, Ben Poole, Gunnar Raetsch, Bernhard Schölkopf, Olivier Bachem, and Michael Tschannen. 2020. “Weakly-Supervised Disentanglement Without Compromises.” In Proceedings of the 37th International Conference on Machine Learning, 119:6348–59. PMLR.
Louizos, Christos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. “Causal Effect Inference with Deep Latent-Variable Models.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 6446–56. Curran Associates, Inc.
Lu, Chaochao, Yuhuai Wu, José Miguel Hernández-Lobato, and Bernhard Schölkopf. 2021. “Nonlinear Invariant Risk Minimization: A Causal Approach.” arXiv:2102.12353 [Cs, Stat], June.
Mehta, Raghav, Vítor Albiero, Li Chen, Ivan Evtimov, Tamar Glaser, Zhiheng Li, and Tal Hassner. 2022. “You Only Need a Good Embeddings Extractor to Fix Spurious Correlations.” arXiv.
Melnychuk, Valentyn, Dennis Frauen, and Stefan Feuerriegel. 2022. “Causal Transformer for Estimating Counterfactual Outcomes.” arXiv.
Mishler, Alan, and Edward Kennedy. 2021. “FADE: FAir Double Ensemble Learning for Observable and Counterfactual Outcomes.” arXiv:2109.00173 [Cs, Stat], August.
Mooij, Joris M., Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. 2016. “Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks.” Journal of Machine Learning Research 17 (32): 1–102.
Ng, Ignavier, Zhuangyan Fang, Shengyu Zhu, Zhitang Chen, and Jun Wang. 2020. “Masked Gradient-Based Causal Structure Learning.” arXiv:1910.08527 [Cs, Stat], February.
Ng, Ignavier, Shengyu Zhu, Zhitang Chen, and Zhuangyan Fang. 2019. “A Graph Autoencoder Approach to Causal Structure Learning.” In Advances In Neural Information Processing Systems.
Ortega, Pedro A., Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, et al. 2021. “Shaking the Foundations: Delusions in Sequence Models for Interaction and Control.” arXiv:2110.10819 [Cs], October.
Pawlowski, Nick, Daniel Coelho de Castro, and Ben Glocker. 2020. “Deep Structural Causal Models for Tractable Counterfactual Inference.” In Advances in Neural Information Processing Systems, 33:857–69. Curran Associates, Inc.
Peters, Jonas, Peter Bühlmann, and Nicolai Meinshausen. 2016. “Causal Inference by Using Invariant Prediction: Identification and Confidence Intervals.” Journal of the Royal Statistical Society Series B: Statistical Methodology 78 (5): 947–1012.
Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. Adaptive Computation and Machine Learning Series. Cambridge, Massachusetts: The MIT Press.
Poulos, Jason, and Shuxi Zeng. 2021. “RNN-Based Counterfactual Prediction, with an Application to Homestead Policy and Public Schooling.” Journal of the Royal Statistical Society Series C: Applied Statistics 70 (4): 1124–39.
Rakesh, Vineeth, Ruocheng Guo, Raha Moraffah, Nitin Agarwal, and Huan Liu. 2018. “Linked Causal Variational Autoencoder for Inferring Paired Spillover Effects.” In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 1679–82. CIKM ’18. New York, NY, USA: Association for Computing Machinery.
Richardson, Thomas S., and James M. Robins. 2013. “Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality.” Citeseer.
Roscher, Ribana, Bastian Bohn, Marco F. Duarte, and Jochen Garcke. 2020. “Explainable Machine Learning for Scientific Insights and Discoveries.” IEEE Access 8: 42200–42216.
Rotnitzky, Andrea, and Ezequiel Smucler. 2020. “Efficient Adjustment Sets for Population Average Causal Treatment Effect Estimation in Graphical Models.” Journal of Machine Learning Research 21 (188): 1–86.
Rubenstein, Paul K., Stephan Bongers, Bernhard Schölkopf, and Joris M. Mooij. 2018. “From Deterministic ODEs to Dynamic Structural Causal Models.” In Uncertainty in Artificial Intelligence.
Runge, Jakob, Sebastian Bathiany, Erik Bollt, Gustau Camps-Valls, Dim Coumou, Ethan Deyle, Clark Glymour, et al. 2019. “Inferring Causation from Time Series in Earth System Sciences.” Nature Communications 10 (1): 2553.
Schölkopf, Bernhard. 2022. “Causality for Machine Learning.” In Probabilistic and Causal Inference: The Works of Judea Pearl, 1st ed., 36:765–804. New York, NY, USA: Association for Computing Machinery.
Schölkopf, Bernhard, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. 2021. “Toward Causal Representation Learning.” Proceedings of the IEEE 109 (5): 612–34.
Shalit, Uri, Fredrik D. Johansson, and David Sontag. 2017. “Estimating Individual Treatment Effect: Generalization Bounds and Algorithms.” arXiv:1606.03976 [Cs, Stat], May.
Shi, Claudia, David M. Blei, and Victor Veitch. 2019. “Adapting Neural Networks for the Estimation of Treatment Effects.” In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2507–17. Red Hook, NY, USA: Curran Associates Inc.
Tigas, Panagiotis, Yashas Annadani, Andrew Jesson, Bernhard Schölkopf, Yarin Gal, and Stefan Bauer. 2022. “Interventions, Where and How? Experimental Design for Causal Models at Scale.” Advances in Neural Information Processing Systems 35 (December): 24130–43.
Veitch, Victor, and Anisha Zaveri. 2020. “Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding,” March.
Vowels, Matthew J., Necati Cihan Camgoz, and Richard Bowden. 2022. “D’ya Like DAGs? A Survey on Structure Learning and Causal Discovery.” ACM Computing Surveys 55 (4): 82:1–36.
Wang, Lijing, Aniruddha Adiga, Jiangzhuo Chen, Adam Sadilek, Srinivasan Venkatramanan, and Madhav Marathe. 2022. “CausalGNN: Causal-Based Graph Neural Networks for Spatio-Temporal Epidemic Forecasting.” Proceedings of the AAAI Conference on Artificial Intelligence 36 (11): 12191–99.
Wang, Sifan, Shyam Sankaran, and Paris Perdikaris. 2022. “Respecting Causality Is All You Need for Training Physics-Informed Neural Networks.” arXiv.
Wang, Xingqiao, Xiaowei Xu, Weida Tong, Ruth Roberts, and Zhichao Liu. 2021. “InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance.” Frontiers in Artificial Intelligence 4.
Wang, Yixin, and Michael I. Jordan. 2021. “Desiderata for Representation Learning: A Causal Perspective.” arXiv:2109.03795 [Cs, Stat], September.
Wang, Yuhao, Liam Solus, Karren Dai Yang, and Caroline Uhler. 2017. “Permutation-Based Causal Inference Algorithms with Interventions,” May.
Willig, Moritz, Matej Zečević, Devendra Singh Dhami, and Kristian Kersting. 2022. “Can Foundation Models Talk Causality?” arXiv.
Yang, Mengyue, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, and Jun Wang. 2020. “CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models.” arXiv:2004.08697 [Cs, Stat], July.
Yoon, Jinsung. n.d. “E-RNN: Entangled Recurrent Neural Networks for Causal Prediction.”
Zhang, Kun, Mingming Gong, Petar Stojanov, Biwei Huang, Qingsong Liu, and Clark Glymour. 2020. “Domain Adaptation as a Problem of Inference on Graphical Models.” In Advances in Neural Information Processing Systems. Vol. 33.
Zhang, Rui, Masaaki Imaizumi, Bernhard Schölkopf, and Krikamol Muandet. 2021. “Maximum Moment Restriction for Instrumental Variable Regression.” arXiv:2010.07684 [Cs], February.
