# Independence, conditional, statistical

Conditional independence between random variables is a special relationship, as seen in inference for directed graphical models.

There is a connection with model selection, in the sense that accepting enough true hypotheses leaves you with a residual independent of the predictors. (🏗 clarify.)

## As an algebra

The “graphoid axioms”

$$\begin{aligned}
\text{Symmetry: } & (X \perp\!\!\!\perp Y \mid Z) \implies (Y \perp\!\!\!\perp X \mid Z) \\
\text{Decomposition: } & (X \perp\!\!\!\perp YW \mid Z) \implies (X \perp\!\!\!\perp Y \mid Z) \\
\text{Weak union: } & (X \perp\!\!\!\perp YW \mid Z) \implies (X \perp\!\!\!\perp Y \mid ZW) \\
\text{Contraction: } & (X \perp\!\!\!\perp Y \mid Z) \,\wedge\, (X \perp\!\!\!\perp W \mid ZY)\implies (X \perp\!\!\!\perp YW \mid Z) \\
\text{Intersection: } & (X \perp\!\!\!\perp W \mid ZY) \,\wedge\, (X \perp\!\!\!\perp Y \mid ZW)\implies (X \perp\!\!\!\perp YW \mid Z)
\end{aligned}$$

tell us which operations independence supports (nb the Intersection axiom requires that there are no probability-zero events). If you map all the independence relationships between some random variables, you are doing graphical models.
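As a sanity check, Decomposition and Weak union can be verified numerically on a small discrete example. The sketch below is illustrative only: the binary variables, the probability tables and the helper names (`marginal`, `is_cond_indep`) are all made up for this example. It builds a joint distribution in which $$X \perp\!\!\!\perp YW \mid Z$$ holds by construction and checks the two implied statements.

```python
from itertools import product

# Hypothetical tables for binary variables: p(z), p(x | z), p(y, w | z).
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_yw_given_z = {
    0: {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4},
    1: {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.2},
}

# Joint over (x, y, w, z), factorised so that X is independent of (Y, W) given Z.
joint = {
    (x, y, w, z): p_z[z] * p_x_given_z[z][x] * p_yw_given_z[z][(y, w)]
    for x, y, w, z in product((0, 1), repeat=4)
}
NAMES = {"x": 0, "y": 1, "w": 2, "z": 3}


def marginal(keep):
    """Sum the joint down to the variables named in `keep`."""
    idx = [NAMES[k] for k in keep]
    out = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out


def is_cond_indep(a, b, given, tol=1e-12):
    """Check p(a, b, c) p(c) == p(a, c) p(b, c) for every outcome."""
    p_abc = marginal(a + b + given)
    p_ac = marginal(a + given)
    p_bc = marginal(b + given)
    p_c = marginal(given)
    na, nb = len(a), len(b)
    for key, prob in p_abc.items():
        ka, kb, kc = key[:na], key[na:na + nb], key[na + nb:]
        if abs(prob * p_c[kc] - p_ac[ka + kc] * p_bc[kb + kc]) > tol:
            return False
    return True


premise = is_cond_indep(["x"], ["y", "w"], ["z"])      # X indep. of YW given Z
decomposition = is_cond_indep(["x"], ["y"], ["z"])     # X indep. of Y given Z
weak_union = is_cond_indep(["x"], ["y"], ["z", "w"])   # X indep. of Y given ZW
marginally_dependent = not is_cond_indep(["x"], ["y"], [])
print(premise, decomposition, weak_union, marginally_dependent)
```

In this toy example $$X$$ and $$Y$$ are dependent marginally and only become independent once we condition on $$Z$$, which is the situation the axioms are designed to reason about.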

## Tests

In a parametric model we can say that… If you don’t merely want to know whether two things are dependent, but how dependent they are, you may want to calculate a probability metric between their joint and product distributions. In the case of empirical observations and nonparametric independence this is presumably between the joint and product empirical distributions. If the distribution of the empirical statistic…
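For discrete data, one concrete instance of "a probability metric between the joint and product empirical distributions" is the total variation distance. The following is a minimal sketch under that assumption; the function name `tv_dependence` and the toy data are invented for illustration, and other metrics would serve equally well.

```python
from collections import Counter


def tv_dependence(xs, ys):
    """Total variation distance between the empirical joint of (x, y)
    and the product of the empirical marginals (discrete data only)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    total = 0.0
    for a in px:
        for b in py:
            p_joint = joint.get((a, b), 0) / n
            p_prod = (px[a] / n) * (py[b] / n)
            total += abs(p_joint - p_prod)
    return 0.5 * total


# Toy data (made up): y copies x exactly, versus y balanced across x.
x = [0, 0, 1, 1] * 25
y_dependent = list(x)
y_balanced = [0, 1, 0, 1] * 25

tv_dep = tv_dependence(x, y_dependent)
tv_bal = tv_dependence(x, y_balanced)
print(tv_dep, tv_bal)
```

On real samples this plug-in estimate is non-negative even under independence, so turning it into a test still requires the null distribution of the statistic, for example by permutation.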

Researcher inferring suspect d-separation a node in an intervention

There are special cases where this is easy, e.g. in binary data we have χ² tests; for jointly Gaussian variables independence is the same as zero correlation, so the problem is simply one of covariance estimates. Generally, likelihood tests can easily give us what is effectively a test of this in estimation problems in exponential families. (c&c Basu’s lemma.)
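The binary case can be made concrete with Pearson's χ² test on a 2×2 contingency table. This is a minimal standard-library sketch, not a substitute for a statistics package: the function name `chi2_independence_2x2` and the example counts are assumptions, no continuity correction is applied, and all expected counts are assumed to be non-zero.

```python
import math


def chi2_independence_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value), using the chi-squared distribution with
    1 degree of freedom and no continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts: a strongly associated table and a perfectly balanced one.
stat_assoc, p_assoc = chi2_independence_2x2([[30, 10], [10, 30]])
stat_flat, p_flat = chi2_independence_2x2([[20, 20], [20, 20]])
print(stat_assoc, p_assoc)
print(stat_flat, p_flat)
```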

## Chatterjee ξ

Modernized Spearman ρ. Looks like a contender as a universal replacement for a measure of (strength of) dependence. Computing it needs only sorting and ranking, which suggests $$n \log n$$ scaling in data size rather than $$n^2$$, although I have not checked this carefully. The method is remarkably simple (see the source code).
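To show how simple the method is, here is a minimal sketch of the coefficient in the no-ties case, $$\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)$$, where $$r_i$$ is the rank of the $$y$$ value paired with the $$i$$-th smallest $$x$$. The function name and test data are my own, and ties (which need a modified formula) are not handled.

```python
import random


def chatterjee_xi(xs, ys):
    """Chatterjee's xi coefficient, assuming no ties in xs or ys."""
    n = len(xs)
    # Order the y values by increasing x.
    y_sorted_by_x = [y for _, y in sorted(zip(xs, ys))]
    # Rank of each y among all y values (1 = smallest).
    order = sorted(range(n), key=lambda i: y_sorted_by_x[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


x = list(range(99))
xi_monotone = chatterjee_xi(x, [v ** 3 for v in x])
# Non-monotone but noiseless relationship; the offset 49.3 avoids ties.
xi_parabola = chatterjee_xi(x, [(v - 49.3) ** 2 for v in x])

rng = random.Random(0)
noise_x = [rng.random() for _ in range(500)]
noise_y = [rng.random() for _ in range(500)]
xi_independent = chatterjee_xi(noise_x, noise_y)
print(xi_monotone, xi_parabola, xi_independent)
```

The parabola is the interesting case: a rank correlation such as Spearman's ρ would be close to zero there, while ξ stays close to 1 because $$y$$ is a noiseless function of $$x$$.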

TODO: Deb, Ghosal, and Sen (2020) claim to have extended and generalised this and unified it with Dette, Siburg, and Stoimenov (2013).

### Copula tests

If we know the copula and variables are monotonically related we know the dependence structure already. Um, Dette, Siburg, and Stoimenov (2013). Surely there are others?

### Information criteria

Information criteria effectively do this. (🏗 clarify.)

### Kernel distribution embedding tests

I’m interested in the nonparametric conditional independence tests of Gretton et al. (2008), using kernel tricks, although I don’t quite get how you conditionalise them.
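For orientation, the unconditional version of the kernel statistic is easy to write down. The sketch below computes the biased HSIC estimate $$\operatorname{trace}(KHLH)/n^2$$ for scalar data and calibrates it with a permutation null. Several things here are my assumptions rather than the method of Gretton et al. (2008): the Gaussian kernel with fixed bandwidth 1, the permutation null (swapped in for an asymptotic approximation), and all function names. It does not address the conditional case.

```python
import math
import random


def centred_gram(values, bandwidth=1.0):
    """Gaussian-kernel Gram matrix of scalar data, double-centred (H K H)."""
    n = len(values)
    gram = [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    # The Gram matrix is symmetric, so column means equal row means.
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_from_centred(kc, lc, perm):
    """Biased HSIC estimate trace(K H L H) / n^2, with y permuted by `perm`."""
    n = len(kc)
    total = 0.0
    for i in range(n):
        row_k, row_l = kc[i], lc[perm[i]]
        for j in range(n):
            total += row_k[j] * row_l[perm[j]]
    return total / (n * n)


def hsic_permutation_test(xs, ys, n_perm=99, seed=0):
    """Return (hsic, p_value) using a permutation null for independence."""
    rng = random.Random(seed)
    kc, lc = centred_gram(xs), centred_gram(ys)
    identity = list(range(len(xs)))
    observed = hsic_from_centred(kc, lc, identity)
    exceed = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if hsic_from_centred(kc, lc, perm) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


# Made-up data: y is a nonlinear function of x, so correlation alone may miss it.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
y = [v * v for v in x]
print(hsic_permutation_test(x, y))
```

The cost is quadratic in the sample size per evaluation, which is part of the motivation for the approximations discussed next.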

RCIT (Strobl, Zhang, and Visweswaran (2017)) implements an approximate kernel distribution embedding conditional independence test via kernel approximation:

> Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubicly with sample size. We therefore devise two relaxations called the Randomized Conditional Independence Test (RCIT) and the Randomized conditional Correlation Test (RCoT) which both approximate KCIT by utilizing random Fourier features. In practice, both of the proposed tests scale linearly with sample size and return accurate p-values much faster than KCIT in the large sample size context. CCD algorithms run with RCIT or RCoT also return graphs at least as accurate as the same algorithms run with KCIT but with large reductions in run time.

See also the ITE toolbox.
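The random Fourier feature idea that RCIT and RCoT rely on can be sketched in a few lines: replace the $$n \times n$$ Gram matrix with an explicit, low-dimensional random feature map whose inner products approximate the kernel. This is a generic illustration for scalar inputs and a Gaussian kernel, not the RCIT implementation; `make_rff`, the bandwidth and the feature count are all assumptions.

```python
import math
import random


def make_rff(n_features, bandwidth=1.0, seed=0):
    """Return a feature map z(x) for scalar x whose inner products
    approximate the Gaussian kernel exp(-(x - y)^2 / (2 bandwidth^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density, N(0, 1/bandwidth^2).
    freqs = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return feature_map


z = make_rff(n_features=2000)
x, y = 0.3, 1.1
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = math.exp(-((x - y) ** 2) / 2.0)
print(approx, exact)
```

With a fixed number of features, any statistic built from these features costs time linear in the sample size, which is where the claimed speed-up over KCIT comes from.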

## Stein Discrepancy

Kernelized Stein Discrepancy is also, IIRC, a different kernelized test.

## References

Azadkia, Mona, and Sourav Chatterjee. 2019. arXiv:1910.12327 [Cs, Math, Stat], December.
Baba, Kunihiro, Ritei Shibata, and Masaaki Sibuya. 2004. Australian & New Zealand Journal of Statistics 46 (4): 657–64.
Campos, Luis M. de. 2006. Journal of Machine Learning Research 7: 2149–87.
Cassidy, Ben, Caroline Rae, and Victor Solo. 2015. IEEE Transactions on Medical Imaging 34 (4): 846–60.
Chatterjee, Sourav. 2020. arXiv:1909.10140 [Math, Stat], January.
Daniušis, Povilas, Shubham Juneja, Lukas Kuzma, and Virginijus Marcinkevičius. 2022. arXiv.
Dawid, A. Philip. 1979. Journal of the Royal Statistical Society. Series B (Methodological) 41 (1): 1–31.
———. 1980. The Annals of Statistics 8 (3): 598–617.
Deb, Nabarun, Promit Ghosal, and Bodhisattva Sen. 2020. arXiv:2010.01768 [Math, Stat], October.
Dette, Holger, Karl F. Siburg, and Pavel A. Stoimenov. 2013. Scandinavian Journal of Statistics 40 (1): 21–41.
Embrechts, Paul, Filip Lindskog, and Alexander J McNeil. 2003. Handbook of Heavy Tailed Distributions in Finance 8 (329-384): 1.
Geenens, Gery, and Pierre Lafaye de Micheaux. 2018. arXiv:1810.10276 [Math, Stat], October.
Gretton, Arthur, Kenji Fukumizu, Choon Hui Teo, Le Song, Bernhard Schölkopf, and Alexander J Smola. 2008. In Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference. Cambridge, MA: MIT Press.
Jebara, Tony, Risi Kondor, and Andrew Howard. 2004. Journal of Machine Learning Research 5 (December): 819–44.
Kac, Mark. 1959. Statistical Independence in Probability, Analysis and Number Theory. Nachdr. The Carus Mathematical Monographs 12. Washington, DC: Math. Assoc. of America.
Lederer, Johannes. 2016. arXiv:1609.05551 [Math, Stat], September.
Muandet, Krikamol, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. 2017. Foundations and Trends® in Machine Learning 10 (1-2): 1–141.
Pfister, Niklas, Peter Bühlmann, Bernhard Schölkopf, and Jonas Peters. 2018. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80 (1): 5–31.
Sadeghi, Kayvan. 2020. Electronic Journal of Statistics 14 (2): 2773–97.
Sejdinovic, Dino, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. 2012. The Annals of Statistics 41 (5): 2263–91.
Song, Le, Kenji Fukumizu, and Arthur Gretton. 2013. IEEE Signal Processing Magazine 30 (4): 98–111.
Song, Le, Jonathan Huang, Alex Smola, and Kenji Fukumizu. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning, 961–68. ICML ’09. New York, NY, USA: ACM.
Spirtes, Peter, and Christopher Meek. 1995. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining.
Sriperumbudur, Bharath K., Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, and Gert R. G. Lanckriet. 2012. Electronic Journal of Statistics 6: 1550–99.
Strobl, Eric V., Kun Zhang, and Shyam Visweswaran. 2017. arXiv:1702.03877 [Stat], February.
Studený, Milan. 2005. Probabilistic Conditional Independence Structures. Information Science and Statistics. London: Springer.
———. 2016. arXiv:1612.06599 [Math, Stat], December.
Su, Liangjun, and Halbert White. 2007. Journal of Econometrics 141 (2): 807–34.
Székely, Gábor J., and Maria L. Rizzo. 2009. The Annals of Applied Statistics 3 (4): 1236–65.
Székely, Gábor J., Maria L. Rizzo, and Nail K. Bakirov. 2007. The Annals of Statistics 35 (6): 2769–94.
Talagrand, Michel. 1996. The Annals of Probability 24 (1): 1–34.
Thanei, Gian-Andrea, Nicolai Meinshausen, and Rajen D. Shah. 2016. Arxiv 20 (9): 846–51.
Yang, Yanrong, and Guangming Pan. 2015. The Annals of Statistics 43 (2).
Yao, Shun, Xianyang Zhang, and Xiaofeng Shao. 2016. arXiv:1609.09380 [Stat], September.
Zhang, Kun, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2012. arXiv:1202.3775 [Cs, Stat], February.
Zhang, Qinyi, Sarah Filippi, Arthur Gretton, and Dino Sejdinovic. 2016. arXiv:1606.07892 [Stat], June.
