External validity

When does what I learn on one data set apply to another?

2020-10-16 — 2023-02-15

algebra

graphical models

how do science

machine learning

networks

probability

statistics


For me, it seems natural to consider learning well-factored causal graphical models containing the necessary interaction effects as the platonic ideal here, and everything else is just an approximation to that.

Although maybe I should be thinking about feedback effects also — if everyone uses my algorithms, does this change the environment in which my algorithm is operating? For example, traffic routing algorithms clearly do.

The reason this is a hot topic in neural nets, I suspect, is that it is convenient for massive, low-human-effort neural networks to ignore graphical structure and still get predictively good results from regressions on observational data, and this leads us into strife when the situation changes. In the IRM section, there is a spicy take by Ermin Orhan that reframes this: the problem is you, if you don’t have so much data that you can integrate out all such difficulties.
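To make the "strife when the situation changes" concrete, here is a minimal sketch of a toy structural model. Everything in it is an assumption of mine for illustration: the names `simulate`, `x_cause`, `x_effect`, and `spurious_noise`, and the linear Gaussian mechanisms. A cause drives the target, the target drives a downstream feature, and only the noise on that downstream feature differs between environments.

```python
import numpy as np


def simulate(n, spurious_noise, rng):
    """Toy SCM (hypothetical): x_cause -> y -> x_effect.

    Only the noise scale on x_effect differs between environments;
    the mechanism generating y from x_cause never changes.
    """
    x_cause = rng.normal(size=n)
    y = 2.0 * x_cause + rng.normal(size=n)
    x_effect = y + spurious_noise * rng.normal(size=n)
    return np.column_stack([x_cause, x_effect]), y


def fit_ols(X, y):
    """Ordinary least squares without intercept (all variables have mean zero)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mse(X, y, coef):
    """Mean squared prediction error of a linear predictor."""
    return float(np.mean((y - X @ coef) ** 2))


rng = np.random.default_rng(0)
# Training environment: the downstream feature is a nearly noiseless copy of y.
X_train, y_train = simulate(20_000, spurious_noise=0.1, rng=rng)
# Shifted environment: same mechanisms, but the downstream feature is much noisier.
X_test, y_test = simulate(20_000, spurious_noise=3.0, rng=rng)

coef_all = fit_ols(X_train, y_train)             # uses both features
coef_causal = fit_ols(X_train[:, :1], y_train)   # uses the causal parent only

mse_all_train = mse(X_train, y_train, coef_all)
mse_all_test = mse(X_test, y_test, coef_all)
mse_causal_train = mse(X_train[:, :1], y_train, coef_causal)
mse_causal_test = mse(X_test[:, :1], y_test, coef_causal)

print(f"all features: train {mse_all_train:.3f}, shifted {mse_all_test:.3f}")
print(f"causal only:  train {mse_causal_train:.3f}, shifted {mse_causal_test:.3f}")
```

In this toy setup the regression on both features looks far better in training and far worse after the shift, while the regression on the causal parent alone has roughly the same error in both environments. It is a sketch of the failure mode, not a claim about any particular neural network.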

Recovering causal consistency in a black-box model is even more tedious than in a classical one. Also, it fits the social conventions of neural network research to reinvent methods to fix such problems without reference to earlier conventions, for better and worse.

One thing that the machine learning setup gives us is an additional emphasis. External validity, the statistical framing, asks whether the model we have learned is still useful on new data. The transfer learning setup invites us to consider whether we can transfer some of the computational effort from learning on one data set to learning on a new one and, if so, how much. Maybe that is a useful insight?

Schölkopf et al. (2012) and Schölkopf (2022) argue that this also connects to semi-supervised learning and fairness.

Possibly following the same underlying idea, we could argue that interaction effects are what we want to learn.

1 Standard graphical models

We can just try some basic graphical model technology and see how far we get. If the right independences are enforced, presumably we are doing something not too far from learning a transferable model? Or, if we work out that the necessary parameters are not identifiable, then we discover that we cannot, in fact, learn a transferable model, right? (But maybe we can learn a somewhat transferable model?) I guess the key weakness is that graphical models will miss some types of transferability, specifically independences that hold only for particular values of the nodes (context-specific independences), so this might be less powerful.
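As a worked instance of "enforcing the right independences", here is a sketch of one well-known special case from the transportability literature listed in the references (Pearl and Bareinboim 2014). The notation is my assumption rather than anything defined above: $P$ is the source population where we ran the experiment, $P^{*}$ is the target population where we only observe, and $Z$ is a set of covariates that accounts for all the differences between the two populations that matter for the effect of $X$ on $Y$.

```latex
% Assumption: given Z, the effect of X on Y is the same in the source (P)
% and target (P^*) populations, i.e. Z is S-admissible.
P^{*}(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid \mathrm{do}(x), z)\, P^{*}(z)
```

That is, we take the $z$-specific effects from the source and reweight them by the target distribution of $Z$. If no such $Z$ is measured, this particular route to a transferable estimate is closed, which is the non-identifiability outcome mentioned above.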

2 External validity in policy

I have lots of ideas about policy for the world, and I think that some of the ideas are good because of some mix of scientific research and personal experience.¹ So let us suppose that I am broadly sympathetic to some policy instrument (state ownership of power utilities? diversity quotas in hiring? etc.) because I have seen it work in the past. The question is, how universally should I favour that policy? How do I find out which circumstances make these policy instruments achieve my desired outcomes?

A recent example from a workplace I was in: a diversity quota requiring a certain percentage of the workforce to be, say, women, would presumably be pointless in a society with perfect gender equality, and ineffectual in a society that has failed to train any women at all with the required skills. Most societies will not be at either of those extremes, but what is the range of gender inequity where hiring quotas would be a useful policy intervention? What other predictors will change their effectiveness? This policy is not a good idea in and of itself but rather in a particular context. In my observation, debates commonly bury that essential context.

Rather than reaching for universal policy prescriptions, it is worth asking how context-specific a policy is and constantly checking whether it applies here.
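Here is a minimal sketch of why the average effect of a policy measured in one context need not carry over to another. The whole setup is hypothetical and mine: a scalar `context` in (0, 1), a randomized `policy`, an effect that grows linearly with context via `EFFECT_SLOPE`, and Beta-distributed contexts that differ between the source and target populations.

```python
import numpy as np

rng = np.random.default_rng(1)
EFFECT_SLOPE = 2.0  # hypothetical: effect of the policy is EFFECT_SLOPE * context


def run_trial(context):
    """Randomize the policy and simulate outcomes under a toy linear model."""
    n = context.size
    policy = rng.integers(0, 2, size=n)  # fair coin flip per unit
    outcome = (1.0 + 0.5 * context
               + EFFECT_SLOPE * context * policy
               + rng.normal(size=n))
    return policy, outcome


def ols(columns, y):
    """Least-squares fit of y on the given columns plus an intercept."""
    X = np.column_stack([np.ones_like(y)] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Source population: mostly low context. Target: mostly high context.
context_source = rng.beta(2, 5, size=50_000)
context_target = rng.beta(5, 2, size=50_000)

policy, outcome = run_trial(context_source)

# Additive model: a single policy effect, assumed identical everywhere.
b_additive = ols([policy, context_source], outcome)
ate_additive = b_additive[1]

# Interaction model: the policy effect may vary with context.
b_inter = ols([policy, context_source, policy * context_source], outcome)
ate_target_interaction = b_inter[1] + b_inter[3] * context_target.mean()

true_source = EFFECT_SLOPE * context_source.mean()
true_target = EFFECT_SLOPE * context_target.mean()

print(f"true average effect: source {true_source:.2f}, target {true_target:.2f}")
print(f"additive estimate (context-blind): {ate_additive:.2f}")
print(f"interaction estimate for target:   {ate_target_interaction:.2f}")
```

The additive model recovers the source-population average effect well, which is exactly why it is misleading when applied to the target. The interaction model transports here only because the assumed linear form of the moderation is correct and the two context distributions overlap; in real policy questions neither is guaranteed.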

3 Learning under covariate shift

See Covariate shift.

4 Incoming

There’s more to data than distributions

5 References

Arjovsky, Bottou, Gulrajani, et al. 2020. “Invariant Risk Minimization.”

Bareinboim. 2014. “Generalizability in Causal Inference: Theory and Algorithms.”

Bareinboim, and Pearl. 2013. “A General Algorithm for Deciding Transportability of Experimental Results.” Journal of Causal Inference.

———. 2014. “Transportability from Multiple Environments with Limited Experiments: Completeness Results.” In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1. NIPS’14.

———. 2016. “Causal Inference and the Data-Fusion Problem.” Proceedings of the National Academy of Sciences.

Bongers, Forré, Peters, et al. 2021. “Foundations of Structural Causal Models with Cycles and Latent Variables.” The Annals of Statistics.

Bühlmann. 2020. “Invariance, Causality and Robustness.” Statistical Science.

Christiansen, Pfister, Jakobsen, et al. 2020. “A Causal Framework for Distribution Generalization.”

D’Amour, Heller, Moldovan, et al. 2020. “Underspecification Presents Challenges for Credibility in Modern Machine Learning.” arXiv:2011.03395 [Cs, Stat].

Deaton, and Cartwright. 2016. “Understanding and Misunderstanding Randomized Controlled Trials.” Working Paper 22595.

Degtiar, and Rose. 2023. “A Review of Generalizability and Transportability.” Annual Review of Statistics and Its Application.

Fernández-Loría, and Provost. 2021. “Causal Decision Making and Causal Effect Estimation Are Not the Same… and Why It Matters.” arXiv:2104.04103 [Cs, Stat].

Gigerenzer. n.d. “We Need to Think More about How We Conduct Research.” Behavioral and Brain Sciences.

Hoffimann, Zortea, de Carvalho, et al. 2021. “Geostatistical Learning: Challenges and Opportunities.” Frontiers in Applied Mathematics and Statistics.

Kilbertus, Rojas Carulla, Parascandolo, et al. 2017. “Avoiding Discrimination Through Causal Reasoning.” In Advances in Neural Information Processing Systems 30.

Koh, Sagawa, Marklund, et al. 2021. “WILDS: A Benchmark of in-the-Wild Distribution Shifts.” arXiv:2012.07421 [Cs].

Kulinski, and Inouye. 2022. “Towards Explaining Distribution Shifts.”

Künzel, Sekhon, Bickel, et al. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences.

Meinshausen. 2018. “Causality from a Distributional Robustness Point of View.” In 2018 IEEE Data Science Workshop (DSW).

Olteanu, Castillo, Diaz, et al. 2019. “Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries.” Frontiers in Big Data.

Pearl, and Bareinboim. 2014. “External Validity: From Do-Calculus to Transportability Across Populations.” Statistical Science.

Peters, Janzing, and Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. Adaptive Computation and Machine Learning Series.

Quiñonero-Candela. 2009. Dataset Shift in Machine Learning.

Ramchandran, and Mukherjee. 2021. “On Ensembling Vs Merging: Least Squares and Random Forests Under Covariate Shift.” arXiv:2106.02589 [Math, Stat].

Rothenhäusler, Meinshausen, Bühlmann, et al. 2020. “Anchor Regression: Heterogeneous Data Meets Causality.” arXiv:1801.06229 [Stat].

Rubenstein, Bongers, Schölkopf, et al. 2018. “From Deterministic ODEs to Dynamic Structural Causal Models.” In Uncertainty in Artificial Intelligence.

Runge, Bathiany, Bollt, et al. 2019. “Inferring Causation from Time Series in Earth System Sciences.” Nature Communications.

Schölkopf. 2022. “Causality for Machine Learning.” In Probabilistic and Causal Inference: The Works of Judea Pearl.

Schölkopf, Hogg, Wang, et al. 2015. “Removing Systematic Errors for Exoplanet Search via Latent Causes.” arXiv:1505.03036 [Astro-Ph, Stat].

Schölkopf, Janzing, Peters, et al. 2012. “On Causal and Anticausal Learning.” In ICML 2012.

Schram. 2005. “Artificiality: The Tension Between Internal and External Validity in Economic Experiments.” Journal of Economic Methodology.

Shi, Blei, and Veitch. 2019. “Adapting Neural Networks for the Estimation of Treatment Effects.” In Proceedings of the 33rd International Conference on Neural Information Processing Systems.

Subbaswamy, Schulam, and Saria. 2019. “Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport.” In The 22nd International Conference on Artificial Intelligence and Statistics.

Tibshirani, Foygel Barber, Candes, et al. 2019. “Conformal Prediction Under Covariate Shift.” In Advances in Neural Information Processing Systems.

Veitch, and Zaveri. 2020. “Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding.”

Verma, Dickerson, and Hines. 2020. “Counterfactual Explanations for Machine Learning: A Review.”

Wang, and Jordan. 2021. “Desiderata for Representation Learning: A Causal Perspective.” arXiv:2109.03795 [Cs, Stat].

Yamazaki, Kawanabe, et al. 2007. “Asymptotic Bayesian Generalization Error When Training and Test Distributions Are Different.” In Proceedings of the 24th International Conference on Machine Learning.

Yamazaki, and Watanabe. 2008. “Experimental Bayesian Generalization Error of Non-Regular Models Under Covariate Shift.” In Neural Information Processing.

Footnotes

Realistically, I copied some of these ideas from my acquaintances, but maybe even those ideas have the same sort of empirical basis. Let us optimistically assume so for now 🤞.