Independence, conditional, statistical

2016-04-21 — 2020-09-12

Wherein the Graphoid Axioms Are Laid Out and Modern Nonparametric Tests Such as Chatterjee’s Ξ and Kernel Conditional Independence Methods Are Described, and Connections to Graphical Models and Model Selection Are Sketched.

algebra

functional analysis

graphical models

metrics

model selection

networks

probability

statistics

Conditional independence between random variables is a special relationship, and it is the one that inference in directed graphical models exploits.

There is a connection with model selection, in the sense that accepting enough true hypotheses leaves you with a residual that is independent of the predictors.

Figure 2: David Butler on independence venn diagrams

1 As an algebra

The “graphoid axioms” (Dawid 1979, 1980)

\[\begin{aligned} \text{Symmetry: } & (X \perp\!\!\!\perp Y \mid Z) \implies (Y \perp\!\!\!\perp X \mid Z) \\ \text{Decomposition: } & (X \perp\!\!\!\perp YW \mid Z) \implies (X \perp\!\!\!\perp Y \mid Z) \\ \text{Weak union: } & (X \perp\!\!\!\perp YW \mid Z) \implies (X \perp\!\!\!\perp Y \mid ZW) \\ \text{Contraction: } & (X \perp\!\!\!\perp Y \mid Z) \,\wedge\, (X \perp\!\!\!\perp W \mid ZY)\implies (X \perp\!\!\!\perp YW \mid Z) \\ \text{Intersection: } & (X \perp\!\!\!\perp W \mid ZY) \,\wedge\, (X \perp\!\!\!\perp Y \mid ZW)\implies (X \perp\!\!\!\perp YW \mid Z) \\ \end{aligned}\] tell us what operations conditional independence supports. (NB: the Intersection axiom holds only when there are no probability-zero events, i.e. when the joint distribution is strictly positive.) If you map out all the conditional independence relationships among some random variables, you are doing graphical models.
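As a sanity check, the Decomposition and Weak union axioms can be verified numerically on a small discrete distribution. The following is a minimal sketch, not taken from the cited papers. The helper `cond_indep` and the toy joint distribution over four binary variables are illustrative assumptions, built so that \(X \perp\!\!\!\perp YW \mid Z\) holds by construction.

```python
import numpy as np

def cond_indep(p, a, b, c, tol=1e-10):
    """Check A ⊥ B | C for a joint pmf array p; a, b, c are tuples of axis indices."""
    used = set(a) | set(b) | set(c)
    unused = tuple(i for i in range(p.ndim) if i not in used)
    p_abc = p.sum(axis=unused, keepdims=True)   # marginalise out unused variables
    p_ac = p_abc.sum(axis=b, keepdims=True)
    p_bc = p_abc.sum(axis=a, keepdims=True)
    p_c = p_ac.sum(axis=a, keepdims=True)
    # A ⊥ B | C  iff  p(a,b,c) p(c) == p(a,c) p(b,c) in every cell
    return bool(np.allclose(p_abc * p_c, p_ac * p_bc, atol=tol))

# Axis order of the joint pmf (hypothetical toy example, all variables binary).
X, Y, W, Z = 0, 1, 2, 3
rng = np.random.default_rng(0)
p_z = rng.dirichlet(np.ones(2))                                    # p(z)
p_x_given_z = rng.dirichlet(np.ones(2), size=2)                    # rows: p(x | z)
p_yw_given_z = rng.dirichlet(np.ones(4), size=2).reshape(2, 2, 2)  # p(y, w | z)
# p(x, y, w, z) = p(z) p(x | z) p(y, w | z), so X ⊥ YW | Z by construction.
joint = np.einsum("z,zx,zyw->xywz", p_z, p_x_given_z, p_yw_given_z)

premise = cond_indep(joint, (X,), (Y, W), (Z,))
decomposition = cond_indep(joint, (X,), (Y,), (Z,))   # X ⊥ Y | Z
weak_union = cond_indep(joint, (X,), (Y,), (Z, W))    # X ⊥ Y | ZW
print(premise, decomposition, weak_union)
```

The check works on exact probability tables, so it illustrates the algebra only; it says nothing about testing independence from samples.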

2 Tests

In parametric models, if you don’t merely want to know whether two things are dependent, but how dependent they are, you may want to calculate a probability metric between their joint distribution and the product of their marginal distributions. In the case of empirical observations and nonparametric independence, this is presumably a metric between the joint empirical distribution and the product of the empirical marginals. If the distribution of the empirical statistic

Figure 3: Researcher inferring suspect d-separation a node in an intervention

2.1 Traditional tests

There are special cases where this is easy. For binary (or, more generally, categorical) data we have χ² tests. For jointly Gaussian variables, independence is the same as zero correlation, and conditional independence the same as zero partial correlation, so the problem is simply one of covariance estimation. More generally, likelihood ratio tests can easily give us what is effectively a test of this in estimation problems in exponential families. (Compare and contrast Basu’s lemma.)
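For the Gaussian case, conditional independence can be tested via the partial correlation. Here is a minimal sketch, assuming joint Gaussianity and using the standard Fisher z-transform with a normal approximation. The function name `partial_corr_test` and the simulated data are illustrative assumptions.

```python
import math
import numpy as np

def partial_corr_test(x, y, z):
    """Fisher-z test of X ⊥ Y | Z under joint Gaussianity; z has shape (n, k)."""
    n, k = z.shape
    design = np.column_stack([np.ones(n), z])
    # Residuals of x and y after least-squares regression on [1, Z].
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(rx @ ry / math.sqrt((rx @ rx) * (ry @ ry)))  # partial correlation
    stat = math.sqrt(n - k - 3) * math.atanh(r)            # approx. N(0, 1) under H0
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))        # two-sided normal tail
    return r, p_value

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=(n, 1))
x = z[:, 0] + rng.normal(size=n)
y = z[:, 0] + rng.normal(size=n)   # related to x only through z
y_dep = y + 0.5 * x                # also depends on x directly

marginal_r = float(np.corrcoef(x, y)[0, 1])
r_ci, p_ci = partial_corr_test(x, y, z)
r_dep, p_dep = partial_corr_test(x, y_dep, z)
print(marginal_r, r_ci, p_ci, r_dep, p_dep)
```

In this simulation `x` and `y` are marginally correlated, but their partial correlation given `z` should be near zero. Outside the Gaussian setting, zero partial correlation does not imply conditional independence.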

3 Chatterjee ξ

Modernized Spearman ρ. Looks like a contender as a universal replacement for a measure of (strength of) dependence (Azadkia and Chatterjee 2019; Chatterjee 2020). The basic coefficient needs only sorting and ranking, so it should scale as \(n \log n\) in data size rather than \(n^2\). The method is remarkably simple (see the source code).

🚧TODO🚧: Deb, Ghosal, and Sen (2020) claim to have extended and generalised this and unified it with Dette, Siburg, and Stoimenov (2013).
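To show how simple it is, here is a minimal sketch of the coefficient in the no-ties case: sort the pairs by \(X\), take the ranks \(r_i\) of \(Y\) in that order, and compute \(\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)\). The function name `chatterjee_xi` is an illustrative choice, and tie handling is omitted.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's xi coefficient, assuming no ties in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    order = np.argsort(x, kind="stable")          # sort pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1  # ranks of y, in x-sorted order
    return 1.0 - 3.0 * float(np.abs(np.diff(ranks)).sum()) / (n**2 - 1)

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(-1.0, 1.0, size=n)
xi_monotone = chatterjee_xi(x, 2.0 * x + 1.0)            # y a monotone function of x
xi_parabola = chatterjee_xi(x, x**2)                     # y a non-monotone function of x
xi_indep = chatterjee_xi(x, rng.uniform(size=n))         # y independent of x
print(xi_monotone, xi_parabola, xi_indep)
```

The cost is dominated by the sorts. Note that the coefficient is not symmetric: `chatterjee_xi(x, y)` measures how well `y` is explained as a function of `x`, not the reverse.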

3.1 Copula tests

If we know the copula and the variables are monotonically related, we already know the dependence structure. See, e.g., Dette, Siburg, and Stoimenov (2013). Surely there are others?

3.2 Information criteria

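Information criteria effectively do this: choosing between a model that includes a dependence and one that omits it amounts to an implicit test of independence.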

3.3 Kernel distribution embedding tests

I’m interested in the nonparametric independence tests of Gretton et al. (2008), which use kernel tricks, although I don’t quite get how you conditionalize them. Zhang, Kun, Peters, Janzing, et al. (2012) propose a kernel-based conditional independence test (KCIT).

RCIT (Strobl, Zhang, and Visweswaran 2017) implements an approximate kernel distribution embedding conditional independence test via kernel approximation:
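For the unconditional case, the statistic is the Hilbert-Schmidt Independence Criterion (HSIC), whose biased estimate is \(\operatorname{tr}(KHLH)/n^2\) for Gram matrices \(K, L\) and centring matrix \(H\). Below is a minimal sketch of it. It is not the authors' implementation: the Gaussian kernel with a median-distance bandwidth is one common convention assumed here, and the null distribution is calibrated by permutation, a simple substitute for the approximations used in the paper.

```python
import numpy as np

def gaussian_gram(v):
    """Gaussian-kernel Gram matrix; bandwidth set to the median pairwise distance."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq = ((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1)
    bandwidth_sq = np.median(sq[sq > 0])   # assumed heuristic, not from the paper
    return np.exp(-sq / (2.0 * bandwidth_sq))

def hsic_permutation_test(x, y, n_perm=200, seed=0):
    """Biased HSIC statistic tr(K H L H) / n^2 with a permutation p-value."""
    rng = np.random.default_rng(seed)
    n = len(x)
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ gaussian_gram(x) @ H          # centre K once
    L = gaussian_gram(y)

    def stat(M):
        # tr(K H M H) = sum(Kc * M) because M is symmetric
        return float(np.sum(Kc * M)) / n**2

    observed = stat(L)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(n)          # shuffling y breaks any dependence on x
        null[i] = stat(L[np.ix_(perm, perm)])
    p_value = (1 + int(np.sum(null >= observed))) / (1 + n_perm)
    return observed, p_value

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dep = x**2 + 0.1 * rng.normal(size=200)   # uncorrelated with x, yet dependent
y_ind = rng.normal(size=200)
hsic_dep, p_dep = hsic_permutation_test(x, y_dep)
hsic_ind, p_ind = hsic_permutation_test(x, y_ind)
print(hsic_dep, p_dep, hsic_ind, p_ind)
```

The quadratic example is chosen because a correlation test would miss it. This sketch builds full \(n \times n\) Gram matrices, so it is only practical for small samples.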

Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubicly with sample size. We therefore devise two relaxations called the Randomized Conditional Independence Test (RCIT) and the Randomized conditional Correlation Test (RCoT) which both approximate KCIT by utilising random Fourier features. In practice, both of the proposed tests scale linearly with sample size and return accurate p-values much faster than KCIT in the large sample size context. CCD algorithms run with RCIT or RCoT also return graphs at least as accurate as the same algorithms run with KCIT but with large reductions in run time.
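The random Fourier feature idea mentioned in that abstract can be sketched in isolation. This is only the kernel-approximation building block, not RCIT or RCoT themselves. The function name `random_fourier_features`, the fixed bandwidth, and the feature count are illustrative assumptions.

```python
import numpy as np

def random_fourier_features(v, n_features, bandwidth, rng):
    """Map v (n, d) to features whose inner products approximate a Gaussian kernel."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = v.shape[1]
    # Frequencies drawn from the Gaussian kernel's spectral density.
    W = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(v @ W + b)

rng = np.random.default_rng(3)
pts = rng.normal(size=(50, 2))
bandwidth = 1.5
feats = random_fourier_features(pts, 5000, bandwidth, rng)

approx_gram = feats @ feats.T
sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
exact_gram = np.exp(-sq / (2.0 * bandwidth**2))
max_err = float(np.abs(approx_gram - exact_gram).max())
print(max_err)
```

Since the feature matrix has one row per observation and a fixed number of columns, statistics computed from it avoid ever forming the \(n \times n\) Gram matrix, which is plausibly where the linear scaling claimed in the abstract comes from.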

ITE toolbox

4 Stein Discrepancy

Kernelized Stein Discrepancy is also IIRC a different kernelized test.

5 References

Azadkia, and Chatterjee. 2019. “A Simple Measure of Conditional Dependence.” arXiv:1910.12327 [Cs, Math, Stat].

Baba, Shibata, and Sibuya. 2004. “Partial Correlation and Conditional Correlation as Measures of Conditional Independence.” Australian & New Zealand Journal of Statistics.

Cassidy, Rae, and Solo. 2015. “Brain Activity: Connectivity, Sparsity, and Mutual Information.” IEEE Transactions on Medical Imaging.

Chatterjee. 2020. “A New Coefficient of Correlation.” arXiv:1909.10140 [Math, Stat].

Christiano, Neyman, and Xu. 2022. “Formalizing the Presumption of Independence.”

Daniušis, Juneja, Kuzma, et al. 2022. “Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions.”

Dawid. 1979. “Conditional Independence in Statistical Theory.” Journal of the Royal Statistical Society. Series B (Methodological).

———. 1980. “Conditional Independence for Statistical Operations.” The Annals of Statistics.

de Campos. 2006. “A Scoring Function for Learning Bayesian Networks Based on Mutual Information and Conditional Independence Tests.” Journal of Machine Learning Research.

Deb, Ghosal, and Sen. 2020. “Measuring Association on Topological Spaces Using Kernels and Geometric Graphs.” arXiv:2010.01768 [Math, Stat].

Dette, Siburg, and Stoimenov. 2013. “A Copula-Based Non-Parametric Measure of Regression Dependence.” Scandinavian Journal of Statistics.

Embrechts, Lindskog, and McNeil. 2003. “Modelling Dependence with Copulas and Applications to Risk Management.” Handbook of Heavy Tailed Distributions in Finance.

Geenens, and de Micheaux. 2018. “The Hellinger Correlation.” arXiv:1810.10276 [Math, Stat].

Gretton, Fukumizu, Teo, et al. 2008. “A Kernel Statistical Test of Independence.” In Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference.

Jebara, Kondor, and Howard. 2004. “Probability Product Kernels.” Journal of Machine Learning Research.

Jiang, Aragam, and Veitch. 2023. “Uncovering Meanings of Embeddings via Partial Orthogonality.”

Kac. 1959. Statistical Independence in Probability, Analysis and Number Theory. The Carus Mathematical Monographs 12.

Lederer. 2016. “Graphical Models for Discrete and Continuous Data.” arXiv:1609.05551 [Math, Stat].

Ma, Lewis, and Kleijn. 2020. “The HSIC Bottleneck: Deep Learning Without Back-Propagation.” Proceedings of the AAAI Conference on Artificial Intelligence.

Muandet, Fukumizu, Sriperumbudur, et al. 2017. “Kernel Mean Embedding of Distributions: A Review and Beyond.” Foundations and Trends® in Machine Learning.

Pfister, Bühlmann, Schölkopf, et al. 2018. “Kernel-Based Tests for Joint Independence.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Sadeghi. 2020. “On Finite Exchangeability and Conditional Independence.” Electronic Journal of Statistics.

Sejdinovic, Sriperumbudur, Gretton, et al. 2012. “Equivalence of Distance-Based and RKHS-Based Statistics in Hypothesis Testing.” The Annals of Statistics.

Sheng, and Sriperumbudur. 2023. “On Distance and Kernel Measures of Conditional Dependence.” Journal of Machine Learning Research.

Song, Fukumizu, and Gretton. 2013. “Kernel Embeddings of Conditional Distributions: A Unified Kernel Framework for Nonparametric Inference in Graphical Models.” IEEE Signal Processing Magazine.

Song, Huang, Smola, et al. 2009. “Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems.” In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.

Spirtes, and Meek. 1995. “Learning Bayesian Networks with Discrete Variables from Data.” In Proceedings of the First International Conference on Knowledge Discovery and Data Mining.

Sriperumbudur, Fukumizu, Gretton, et al. 2012. “On the Empirical Estimation of Integral Probability Metrics.” Electronic Journal of Statistics.

Strobl, Zhang, and Visweswaran. 2017. “Approximate Kernel-Based Conditional Independence Tests for Fast Non-Parametric Causal Discovery.” arXiv:1702.03877 [Stat].

Studený. 2005. Probabilistic Conditional Independence Structures. Information Science and Statistics.

———. 2016. “Basic Facts Concerning Supermodular Functions.” arXiv:1612.06599 [Math, Stat].

Su, and White. 2007. “A Consistent Characteristic Function-Based Test for Conditional Independence.” Journal of Econometrics.

Székely, and Rizzo. 2009. “Brownian Distance Covariance.” The Annals of Applied Statistics.

Székely, Rizzo, and Bakirov. 2007. “Measuring and Testing Dependence by Correlation of Distances.” The Annals of Statistics.

Talagrand. 1996. “A New Look at Independence.” The Annals of Probability.

Thanei, Shah, and Shah. 2016. “The Xyz Algorithm for Fast Interaction Search in High-Dimensional Data.” Arxiv.

Xu, Zhao, Song, et al. 2019. “A Theory of Usable Information Under Computational Constraints.”

Yang, and Pan. 2015. “Independence Test for High Dimensional Data Based on Regularized Canonical Correlation Coefficients.” The Annals of Statistics.

Yao, Zhang, and Shao. 2016. “Testing Mutual Independence in High Dimension via Distance Covariance.” arXiv:1609.09380 [Stat].

Zhang, Qinyi, Filippi, Gretton, et al. 2016. “Large-Scale Kernel Methods for Independence Testing.” arXiv:1606.07892 [Stat].

Zhang, Kun, Peters, Janzing, et al. 2012. “Kernel-Based Conditional Independence Test and Application in Causal Discovery.” arXiv:1202.3775 [Cs, Stat].