# Causal inference in highly parameterized ML

September 18, 2020 — October 24, 2023

algebra
graphical models
hidden variables
hierarchical models
how do science
machine learning
networks
neural nets
probability
statistics

How to apply causal graph structure in the challenging setting of a no-holds-barred nonparametric machine learning algorithm, such as a neural net or its ilk. I am interested in this because it seems necessary, and kind of obvious, for handling things like dataset shift, yet it is often ignored. What is that about?

I do not know at the moment. This is a link salad for now.

See also the brain salad graphical models and supervised models.

## 1 Invariance approaches

Léon Bottou, From Causal Graphs to Causal Invariance:

For many problems, it’s difficult to even attempt drawing a causal graph. While structural causal models provide a complete framework for causal inference, it is often hard to encode known physical laws (such as Newton’s gravitation, or the ideal gas law) as causal graphs. In familiar machine learning territory, how does one model the causal relationships between individual pixels and a target prediction? This is one of the motivating questions behind the paper Invariant Risk Minimization (IRM). In place of structured graphs, the authors elevate invariance to the defining feature of causality.

He commends the Cloudera Fast Forward tutorial Causality for Machine Learning, which is a nice bit of applied work.
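To make the invariance idea concrete for myself, here is a minimal sketch of the IRMv1-style objective as I understand it from Arjovsky et al.: per-environment risk plus a penalty on the gradient of that risk with respect to a dummy scale `w` fixed at 1. It assumes squared loss, for which the gradient has a closed form. The function names and the toy environments are my own illustration, not code from the paper.

```python
def irm_penalty(preds, ys):
    """Squared gradient of R(w) = mean((w * pred - y)^2) with respect to w at w = 1."""
    grad = sum(2.0 * (p - y) * p for p, y in zip(preds, ys)) / len(ys)
    return grad ** 2


def irm_objective(envs, predict, lam):
    """Sum over environments of (risk + lam * invariance penalty)."""
    total = 0.0
    for xs, ys in envs:
        preds = [predict(x) for x in xs]
        risk = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
        total += risk + lam * irm_penalty(preds, ys)
    return total


# Toy data (hypothetical): each input is (causal_feature, spurious_feature).
# The spurious feature tracks y in the first environment and flips sign in the second.
envs = [
    ([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)], [1.0, 2.0, 3.0]),
    ([(1.0, -1.0), (2.0, -2.0), (3.0, -3.0)], [1.0, 2.0, 3.0]),
]


def use_causal(x):
    return x[0]


def use_spurious(x):
    return x[1]


causal_score = irm_objective(envs, use_causal, lam=1.0)
spurious_score = irm_objective(envs, use_spurious, lam=1.0)
```

The predictor that relies on the causal feature incurs no risk and no penalty in either environment, while the one that relies on the spurious feature is punished in the environment where the correlation flips. No graph was drawn anywhere, which is the point of the approach.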

## 3 Double learning

Künzel et al. (2019) (HT Mike McKenna) looks interesting: it is a generic intervention estimator for ML methods. (AFAICT this extends the double regression/instrumental variables approach.)

… We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms—such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks—to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light on underlying mechanisms.
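A rough sketch of the X-learner recipe as I read it in Künzel et al. (2019): fit a response model per arm, impute individual treatment effects by crossing each arm's outcomes with the other arm's model, regress those imputed effects on the covariates, then blend the two CATE estimates. The base learner here is a one-dimensional least-squares line so the example stays dependency-free; `fit_line`, `x_learner`, and the constant blending weight `g` are my own illustrative choices (the paper suggests an estimated propensity score as the weight), not the authors' implementation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line y = a + b * x; a stand-in for any regression base learner."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def x_learner(xs, ws, ys):
    """Return a CATE estimate tau(x) from covariates xs, treatment flags ws, outcomes ys."""
    treated = [(x, y) for x, w, y in zip(xs, ws, ys) if w == 1]
    control = [(x, y) for x, w, y in zip(xs, ws, ys) if w == 0]
    x1, y1 = zip(*treated)
    x0, y0 = zip(*control)

    # Stage 1: response functions for each arm.
    mu1 = fit_line(x1, y1)
    mu0 = fit_line(x0, y0)

    # Stage 2: imputed effects, then regress them on the covariate.
    d1 = [y - mu0(x) for x, y in treated]  # observed treated minus predicted control
    d0 = [mu1(x) - y for x, y in control]  # predicted treated minus observed control
    tau1 = fit_line(x1, d1)
    tau0 = fit_line(x0, d0)

    # Stage 3: blend; g is a constant treated fraction here (an assumption).
    g = len(treated) / len(xs)
    return lambda x: g * tau0(x) + (1.0 - g) * tau1(x)


# Noise-free toy data with a true CATE of 1 + 0.5 * x (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ws = [1, 0, 0, 0, 1, 0, 0, 1]
ys = [1.0 + 2.0 * x + w * (1.0 + 0.5 * x) for x, w in zip(xs, ws)]
cate = x_learner(xs, ws, ys)
```

On this toy data `cate(x)` recovers `1 + 0.5 * x` up to floating-point error, because every model is correctly specified and there is no noise. The interesting regime in the paper is the unbalanced one, where one arm is much larger than the other.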

See also Mishler and Kennedy (2021). Maybe related: Shalit, Johansson, and Sontag (2017), Shi, Blei, and Veitch (2019).

## 4 Benchmarking

Detecting causal associations in time series datasets is a key challenge for novel insights into complex dynamical systems such as the Earth system or the human brain. Interactions in such systems present a number of major challenges for causal discovery techniques and it is largely unknown which methods perform best for which challenge.

The CauseMe platform provides ground truth benchmark datasets featuring different real data challenges to assess and compare the performance of causal discovery methods. The available benchmark datasets are either generated from synthetic models mimicking real challenges, or are real world data sets where the causal structure is known with high confidence. The datasets vary in dimensionality, complexity and sophistication.
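For orientation, this is the kind of scoring that having a ground-truth graph makes possible: compare the directed edges a discovery method proposes against the known edges. This is a generic sketch, not the CauseMe platform's API or its official metrics; `edge_scores` and the toy graphs are hypothetical.

```python
def edge_scores(true_edges, predicted_edges):
    """Precision, recall and F1 of predicted directed edges against the ground truth."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    hits = len(true_edges & predicted_edges)
    precision = hits / len(predicted_edges) if predicted_edges else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Edges are (cause, effect) pairs; a reversed edge counts as a miss.
truth = {("X", "Y"), ("Y", "Z")}
guess = {("X", "Y"), ("Z", "Y"), ("X", "Z")}
scores = edge_scores(truth, guess)
```

Treating edges as ordered pairs means a method that finds the right adjacency but the wrong direction gets no credit, which is a deliberately strict choice for causal rather than merely statistical structure.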

## 5 Tooling

### 5.3 CausalNex

CausalNex is a Python library that uses Bayesian Networks to combine machine learning and domain expertise for causal reasoning. You can use CausalNex to uncover structural relationships in your data, learn complex distributions, and observe the effect of potential interventions.

### 5.4 cause2e

MLResearchAtOSRAM/cause2e: The cause2e package provides tools for performing an end-to-end causal analysis of your data.

The main contribution of cause2e is the integration of two established causal packages that have currently been separated and cumbersome to combine:

• Causal discovery methods from the py-causal package, which is a Python wrapper around parts of the Java TETRAD software. It provides many algorithms for learning the causal graph from data and domain knowledge.
• Causal reasoning methods from the DoWhy package, which is the current standard for the steps of a causal analysis starting from a known causal graph and data

TETRAD (source, tutorial) is a tool for discovering, visualising, and calculating giant empirical DAGs, including general graphical inference and causality. It’s written by eminent causal inference people.

Tetrad is a program which creates, simulates data from, estimates, tests, predicts with, and searches for causal and statistical models. The aim of the program is to provide sophisticated methods in a friendly interface requiring very little statistical sophistication of the user and no programming knowledge. It is not intended to replace flexible statistical programming systems such as Matlab, Splus or R. Tetrad is freeware that performs many of the functions in commercial programs such as Netica, Hugin, LISREL, EQS and other programs, and many discovery functions these commercial programs do not perform. …

The Tetrad programs describe causal models in three distinct parts or stages: a picture, representing a directed graph specifying hypothetical causal relations among the variables; a specification of the family of probability distributions and kinds of parameters associated with the graphical model; and a specification of the numerical values of those parameters.
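The three-part description maps neatly onto code. The sketch below is not Tetrad and does not use its API; it is a toy of my own showing a graph, a parametric family (linear with additive Gaussian noise, assumed here for simplicity), and concrete parameter values, plus a sampler that simulates from the resulting model. All names are hypothetical, and the graph is assumed to be acyclic.

```python
import random

# Part 1: the picture, a directed graph stored as node -> list of parents.
graph = {"X": [], "Z": ["X"], "Y": ["X", "Z"]}

# Part 2: the family. Each node is a linear function of its parents plus
# Gaussian noise (an assumption made for this sketch).
# Part 3: numerical values for that family's parameters.
params = {
    "X": {"coef": {}, "noise_sd": 1.0},
    "Z": {"coef": {"X": 0.8}, "noise_sd": 0.5},
    "Y": {"coef": {"X": 1.5, "Z": -2.0}, "noise_sd": 0.1},
}


def topological_order(graph):
    """Order nodes so that every parent precedes its children (graph must be a DAG)."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        for parent in graph[node]:
            visit(parent)
        seen.add(node)
        order.append(node)

    for node in graph:
        visit(node)
    return order


def sample(graph, params, rng, do=None):
    """Draw one joint sample; nodes listed in `do` are clamped to the given values."""
    do = do or {}
    values = {}
    for node in topological_order(graph):
        if node in do:
            values[node] = do[node]
            continue
        linear = sum(params[node]["coef"][p] * values[p] for p in graph[node])
        values[node] = linear + rng.gauss(0.0, params[node]["noise_sd"])
    return values


rng = random.Random(0)
observational = [sample(graph, params, rng) for _ in range(5)]
interventional = [sample(graph, params, rng, do={"X": 1.0}) for _ in range(5)]
```

Keeping the three parts as separate objects means the same graph can be paired with a different family, or the same family with different parameter values, without touching the sampler.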

py-causal is a Python wrapper around TETRAD, and R-causal is its counterpart for R.

## 7 References

Arjovsky, Bottou, Gulrajani, et al. 2020.
Athey, and Wager. 2019. arXiv:1902.07409 [Stat].
Bareinboim, Correa, Ibeling, et al. 2022. In Probabilistic and Causal Inference: The Works of Judea Pearl.
Besserve, Mehrjou, Sun, et al. 2019. In arXiv:1812.03253 [Cs, Stat].
Bishop. 2021. Frontiers in Psychology.
Black, Koepke, Kim, et al. 2023. SSRN Scholarly Paper.
Bongers, Forré, Peters, et al. 2020. arXiv:1611.06221 [Cs, Stat].
Bongers, and Mooij. 2018. arXiv:1803.08784 [Cs, Stat].
Bongers, Peters, Schölkopf, et al. 2016. arXiv:1611.06221 [Cs, Stat].
Christiansen, Pfister, Jakobsen, et al. 2020.
Fernández-Loría, and Provost. 2021. arXiv:2104.04103 [Cs, Stat].
Friedrich, Antes, Behr, et al. 2020. arXiv:2009.09070 [Cs].
Gendron, Witbrock, and Dobbie. 2023.
Goyal, Lamb, Hoffmann, et al. 2020. arXiv:1909.10893 [Cs, Stat].
Hartford, Lewis, Leyton-Brown, et al. 2017. In Proceedings of the 34th International Conference on Machine Learning.
Huang, Fu, and Franzke. 2020. Chaos: An Interdisciplinary Journal of Nonlinear Science.
Johnson, Duvenaud, Wiltschko, et al. 2016. In Advances in Neural Information Processing Systems 29.
Jordan, Wang, and Zhou. 2022.
Kaddour, Lynch, Liu, et al. 2022.
Karimi, Barthe, Schölkopf, et al. 2021.
Karimi, Muandet, Kornblith, et al. 2022.
Kirk, Zhang, Grefenstette, et al. 2023. Journal of Artificial Intelligence Research.
Kocaoglu, Snyder, Dimakis, et al. 2017. arXiv:1709.02023 [Cs, Math, Stat].
Kosoy, Chan, Liu, et al. 2022.
Künzel, Sekhon, Bickel, et al. 2019. Proceedings of the National Academy of Sciences.
Lagemann, Lagemann, Taschler, et al. 2023. Nature Machine Intelligence.
Lattimore. 2017.
Leeb, Lanzillotta, Annadani, et al. 2021. arXiv:2006.07796 [Cs, Stat].
Li, Dai, Shangguan, et al. 2022. Journal of Hydrometeorology.
Locatello, Bauer, Lucic, et al. 2019. arXiv:1811.12359 [Cs, Stat].
Locatello, Poole, Raetsch, et al. 2020. In Proceedings of the 37th International Conference on Machine Learning.
Louizos, Shalit, Mooij, et al. 2017. In Advances in Neural Information Processing Systems 30.
Lu, Wu, Hernández-Lobato, et al. 2021. arXiv:2102.12353 [Cs, Stat].
Mehta, Albiero, Chen, et al. 2022.
Melnychuk, Frauen, and Feuerriegel. 2022.
Mishler, and Kennedy. 2021. arXiv:2109.00173 [Cs, Stat].
Mooij, Peters, Janzing, et al. 2016. Journal of Machine Learning Research.
Ng, Fang, Zhu, et al. 2020. arXiv:1910.08527 [Cs, Stat].
Ng, Zhu, Chen, et al. 2019. In Advances In Neural Information Processing Systems.
Ortega, Kunesch, Delétang, et al. 2021. arXiv:2110.10819 [Cs].
Pawlowski, Coelho de Castro, and Glocker. 2020. In Advances in Neural Information Processing Systems.
Peters, Bühlmann, and Meinshausen. 2016. Journal of the Royal Statistical Society Series B: Statistical Methodology.
Peters, Janzing, and Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. Adaptive Computation and Machine Learning Series.
Poulos, and Zeng. 2021. Journal of the Royal Statistical Society Series C: Applied Statistics.
Rakesh, Guo, Moraffah, et al. 2018. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. CIKM ’18.
Richardson, and Robins. 2013.
Roscher, Bohn, Duarte, et al. 2020. IEEE Access.
Rotnitzky, and Smucler. 2020. Journal of Machine Learning Research.
Rubenstein, Bongers, Schölkopf, et al. 2018. In Uncertainty in Artificial Intelligence.
Runge, Bathiany, Bollt, et al. 2019. Nature Communications.
Schölkopf. 2022. In Probabilistic and Causal Inference: The Works of Judea Pearl.
Schölkopf, Locatello, Bauer, et al. 2021. Proceedings of the IEEE.
Shalit, Johansson, and Sontag. 2017. arXiv:1606.03976 [Cs, Stat].
Shi, Blei, and Veitch. 2019. In Proceedings of the 33rd International Conference on Neural Information Processing Systems.
Simchoni, and Rosset. 2023.
Tigas, Annadani, Jesson, et al. 2022. Advances in Neural Information Processing Systems.
Veitch, and Zaveri. 2020.
Vowels, Camgoz, and Bowden. 2022. ACM Computing Surveys.
Wang, Lijing, Adiga, Chen, et al. 2022. Proceedings of the AAAI Conference on Artificial Intelligence.
Wang, Yixin, and Jordan. 2021. arXiv:2109.03795 [Cs, Stat].
Wang, Sifan, Sankaran, and Perdikaris. 2022.
Wang, Yuhao, Solus, Yang, et al. 2017.
Wang, Xingqiao, Xu, Tong, et al. 2021. Frontiers in Artificial Intelligence.
Willig, Zečević, Dhami, et al. 2022.
Yang, Liu, Chen, et al. 2020. arXiv:2004.08697 [Cs, Stat].
Yoon. n.d. “E-RNN: Entangled Recurrent Neural Networks for Causal Prediction.”
Zhang, Kun, Gong, Stojanov, et al. 2020. In Advances in Neural Information Processing Systems.
Zhang, Rui, Imaizumi, Schölkopf, et al. 2021. arXiv:2010.07684 [Cs].
Zhang, Jiaqi, Jennings, Zhang, et al. 2023.
Zhou, Xie, Hao, et al. 2023.