Independence testing connects with model selection, in the sense that accepting enough true hypotheses leaves you with a residual that is independent of the predictors. (🏗 clarify.)
If you want to know not merely whether two things are dependent but how far they are from independence, you may want to estimate a probability metric from data, e.g. one between the joint distribution and the product of the marginals.
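As one concrete instance of estimating "how dependent" from data, here is a minimal sketch of the sample distance correlation of Székely, Rizzo, and Bakirov (2007), restricted to scalar samples. The function names are hypothetical, the implementation is the naive O(n²) V-statistic, and choosing distance correlation at all is an assumption of this sketch; other probability metrics would do equally well.

```python
import math


def centred_distances(values):
    """Pairwise |a - b| distance matrix, double-centred (rows, columns, grand mean)."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length scalar samples.

    Returns a value in [0, 1]; 0.0 is returned if either sample is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    a = centred_distances(x)
    b = centred_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)   # squared distance covariance
    dvar_x = mean_product(a, a)  # squared distance variance of x
    dvar_y = mean_product(b, b)  # squared distance variance of y
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    # max() guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(dcov2, 0.0) / denom)
```

For example, with `x = [-2, -1, 0, 1, 2]` and `y` the squares of `x`, the Pearson correlation is zero but the distance correlation is clearly positive, which is the point of using a measure that is sensitive to non-linear dependence.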
There are special cases where this is easy: for binary data we have χ² tests; for jointly Gaussian variables, independence is equivalent to zero correlation, so the problem reduces to one of covariance estimation. More generally, likelihood-ratio tests give us what is effectively a test of independence in estimation problems in exponential families. (c&c Basu’s lemma.)
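The two easy cases can be sketched with nothing but the standard library. This is an illustrative sketch, not a substitute for a statistics package: the function names are hypothetical, the χ² test has no continuity correction, and the Gaussian test uses the Fisher z approximation, which assumes the pair is bivariate Gaussian.

```python
import math


def chi2_independence_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). With one degree of freedom the chi-squared
    survival function equals erfc(sqrt(stat / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))


def fisher_z_test(x, y):
    """Two-sided test of zero correlation via the Fisher z transform.

    Returns (pearson_r, approximate_p_value).
    """
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two samples of equal length, at least 4")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0  # atanh diverges at +-1; perfect correlation
    z = math.atanh(r) * math.sqrt(n - 3)
    return r, math.erfc(abs(z) / math.sqrt(2.0))
```

Calling `chi2_independence_2x2([[30, 10], [10, 30]])` gives a statistic of 20 and a very small p-value, while a perfectly balanced table such as `[[20, 20], [20, 20]]` gives a statistic of 0 and a p-value of 1.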
If we know the copula and variables are monotonically related we know the dependence structure already.
Information criteria effectively do this. (🏗 clarify.)
Kernel distribution embedding tests
Strobl, Zhang, and Visweswaran (2017) describe a fast approximation: Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubically with sample size. We therefore devise two relaxations called the Randomized Conditional Independence Test (RCIT) and the Randomized conditional Correlation Test (RCoT) which both approximate KCIT by utilizing random Fourier features. In practice, both of the proposed tests scale linearly with sample size and return accurate p-values much faster than KCIT in the large sample size context. CCD algorithms run with RCIT or RCoT also return graphs at least as accurate as the same algorithms run with KCIT but with large reductions in run time.
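To give a feel for why random Fourier features make such tests cheap, here is a rough sketch of an unconditional independence test for scalar samples. It is not RCIT or RCoT: there is no conditioning set, the null distribution comes from permutations (an assumption of this sketch, chosen for simplicity), and all names, the feature count, and the bandwidth are hypothetical. The statistic is the squared Frobenius norm of the empirical cross-covariance between the two feature maps, which costs time linear in the sample size for a fixed number of features.

```python
import math
import random


def random_fourier_features(values, freqs, phases):
    """Map scalars to cosine features approximating a Gaussian kernel."""
    scale = math.sqrt(2.0 / len(freqs))
    return [
        [scale * math.cos(w * v + b) for w, b in zip(freqs, phases)]
        for v in values
    ]


def centre_columns(features):
    """Subtract the column mean from every feature column."""
    n = len(features)
    dim = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in features]


def cross_covariance_norm(fx, fy):
    """Squared Frobenius norm of the cross-covariance of centred features."""
    n = len(fx)
    total = 0.0
    for j in range(len(fx[0])):
        for k in range(len(fy[0])):
            c = sum(fx[i][j] * fy[i][k] for i in range(n)) / n
            total += c * c
    return total


def rff_independence_test(x, y, n_features=10, bandwidth=1.0, n_perm=99, seed=0):
    """Return (statistic, permutation p-value) for H0: x independent of y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = random.Random(seed)

    def draw_map():
        freqs = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(n_features)]
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
        return freqs, phases

    fx = centre_columns(random_fourier_features(x, *draw_map()))
    fy = centre_columns(random_fourier_features(y, *draw_map()))
    observed = cross_covariance_norm(fx, fy)

    exceed = 0
    for _ in range(n_perm):
        shuffled = fy[:]
        rng.shuffle(shuffled)  # break any pairing between x and y
        if cross_covariance_norm(fx, shuffled) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Each evaluation of the statistic touches every sample once per feature pair, so doubling the sample size roughly doubles the cost, in contrast with the cubic scaling of a test built on full Gram matrices.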
Baba, Kunihiro, Ritei Shibata, and Masaaki Sibuya. 2004. “Partial Correlation and Conditional Correlation as Measures of Conditional Independence.” Australian & New Zealand Journal of Statistics 46 (4): 657–64. https://doi.org/10.1111/j.1467-842X.2004.00360.x.
Campos, Luis M. de. 2006. “A Scoring Function for Learning Bayesian Networks Based on Mutual Information and Conditional Independence Tests.” Journal of Machine Learning Research 7: 2149–87. http://jmlr.csail.mit.edu/papers/volume7/decampos06a/decampos06a.pdf.
Cassidy, Ben, Caroline Rae, and Victor Solo. 2015. “Brain Activity: Connectivity, Sparsity, and Mutual Information.” IEEE Transactions on Medical Imaging 34 (4): 846–60. https://doi.org/10.1109/TMI.2014.2358681.
Embrechts, Paul, Filip Lindskog, and Alexander J McNeil. 2003. “Modelling Dependence with Copulas and Applications to Risk Management.” Handbook of Heavy Tailed Distributions in Finance 8 (329-384): 1. https://people.math.ethz.ch/~embrecht/ftp/copchapter.pdf.
Geenens, Gery, and Pierre Lafaye de Micheaux. 2018. “The Hellinger Correlation,” October. http://arxiv.org/abs/1810.10276.
Gretton, Arthur, Kenji Fukumizu, Choon Hui Teo, Le Song, Bernhard Schölkopf, and Alexander J Smola. 2008. “A Kernel Statistical Test of Independence.” In Advances in Neural Information Processing Systems 20: Proceedings of the 2007 Conference. Cambridge, MA: MIT Press. http://eprints.pascal-network.org/archive/00004335/.
Jebara, Tony, Risi Kondor, and Andrew Howard. 2004. “Probability Product Kernels.” Journal of Machine Learning Research 5 (December): 819–44. http://dl.acm.org/citation.cfm?id=1005332.1016786.
Kac, Mark. 1959. Statistical Independence in Probability, Analysis and Number Theory. Nachdr. The Carus Mathematical Monographs 12. Washington, DC: Math. Assoc. of America.
Lederer, Johannes. 2016. “Graphical Models for Discrete and Continuous Data,” September. http://arxiv.org/abs/1609.05551.
Muandet, Krikamol, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. 2017. “Kernel Mean Embedding of Distributions: A Review and Beyond.” Foundations and Trends® in Machine Learning 10 (1-2): 1–141. https://doi.org/10.1561/2200000060.
Sejdinovic, Dino, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. 2012. “Equivalence of Distance-Based and RKHS-Based Statistics in Hypothesis Testing.” The Annals of Statistics 41 (5): 2263–91. https://doi.org/10.1214/13-AOS1140.
Song, Le, Jonathan Huang, Alex Smola, and Kenji Fukumizu. 2009. “Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems.” In Proceedings of the 26th Annual International Conference on Machine Learning, 961–68. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553497.
Spirtes, Peter, and Christopher Meek. 1995. “Learning Bayesian Networks with Discrete Variables from Data.” In Proceedings of the First International Conference on Knowledge Discovery and Data Mining. https://www.researchgate.net/profile/Peter_Spirtes/publication/221653127_Learning_Bayesian_Networks_with_Discrete_Variables_from_Data/links/0deec52a209f36538a000000.pdf.
Sriperumbudur, Bharath K., Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, and Gert R. G. Lanckriet. 2012. “On the Empirical Estimation of Integral Probability Metrics.” Electronic Journal of Statistics 6: 1550–99. https://doi.org/10.1214/12-EJS722.
Strobl, Eric V., Kun Zhang, and Shyam Visweswaran. 2017. “Approximate Kernel-Based Conditional Independence Tests for Fast Non-Parametric Causal Discovery,” February. http://arxiv.org/abs/1702.03877.
Studený, Milan. 2005. Probabilistic Conditional Independence Structures. Information Science and Statistics. London: Springer.
———. 2016. “Basic Facts Concerning Supermodular Functions,” December. http://arxiv.org/abs/1612.06599.
Su, Liangjun, and Halbert White. 2007. “A Consistent Characteristic Function-Based Test for Conditional Independence.” Journal of Econometrics 141 (2): 807–34. https://doi.org/10.1016/j.jeconom.2006.11.006.
Székely, Gábor J., and Maria L. Rizzo. 2009. “Brownian Distance Covariance.” The Annals of Applied Statistics 3 (4): 1236–65. https://doi.org/10.1214/09-AOAS312.
Székely, Gábor J., Maria L. Rizzo, and Nail K. Bakirov. 2007. “Measuring and Testing Dependence by Correlation of Distances.” The Annals of Statistics 35 (6): 2769–94. https://doi.org/10.1214/009053607000000505.
Talagrand, Michel. 1996. “A New Look at Independence.” The Annals of Probability 24 (1): 1–34. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aop/1042644705.
Thanei, Gian-Andrea, Nicolai Meinshausen, and Rajen D. Shah. 2016. “The Xyz Algorithm for Fast Interaction Search in High-Dimensional Data.” Arxiv 20 (9): 846–51. https://arxiv.org/abs/1610.05108.
Yao, Shun, Xianyang Zhang, and Xiaofeng Shao. 2016. “Testing Mutual Independence in High Dimension via Distance Covariance,” September. http://arxiv.org/abs/1609.09380.
Zhang, Kun, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2012. “Kernel-Based Conditional Independence Test and Application in Causal Discovery,” February. http://arxiv.org/abs/1202.3775.
Zhang, Qinyi, Sarah Filippi, Arthur Gretton, and Dino Sejdinovic. 2016. “Large-Scale Kernel Methods for Independence Testing,” June. http://arxiv.org/abs/1606.07892.