Clustering



Getting a bunch of data points and approximating them (in some sense) by their membership (possibly fuzzy) in some groups, or regions of feature space.

For certain definitions this can be the same thing as non-negative and/or low rank matrix factorisations if you use mixture models, and is only really different in emphasis from dimensionality reduction. If you start with a list of features then think about β€œdistances” between observations you have just implicitly intuced a weighted graph from your hitherto non-graphy data and are now looking at a networks problem.

If you care about clustering as such, spectral clustering feels like a nice entry point, maybe via Chris Ding’s tutorial on spectral clustering.

  • CONCOR induces a cute similarity measure.

  • MCL: Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.

There are many useful tricks in here, e.g. Belkin and Niyogi (2003) shows how to use a graph Laplacian (possibly a contrived or arbitrary one) to construct β€œnatural” Euclidean coordinates for your data, such that nodes that have much traffic between them in the Laplacian representation have a small Euclidean distance (The β€œUrban Traffic Planner Fantasy Transformation”) Quickly gives you a similarity measure on non-Euclidean data. Questions: Under which metrics is it equivalent to multidimensional scaling? Is it worthwhile going the other way and constructing density estimates from induced flow graphs?

Clustering as matrix factorisation

If I know me, I might be looking at this page trying remember which papers situate k-means-type clustering in matrix factorisation literature.

The single-serve paper doing that is Bauckhage (2015), but there are broader versions (Singh and Gordon 2008; TΓΌrkmen 2015), some computer science connections in Mixon, Villar, and Ward (2016), and an older one in Zass and Shashua (2005).

Further things I might discuss here are the graph-flow/Laplacian notions of clustering and the density/centroids approach. I will discuss that under mixture models

References

Auvolat, Alex, and Pascal Vincent. 2015. β€œClustering Is Efficient for Approximate Maximum Inner Product Search.” arXiv:1507.05910 [Cs, Stat], July.
Bach, Francis R., and Michael I. Jordan. 2006. β€œLearning Spectral Clustering, with Application to Speech Separation.” Journal of Machine Learning Research 7 (Oct): 1963–2001.
Batson, Joshua, Daniel A. Spielman, and Nikhil Srivastava. 2008. β€œTwice-Ramanujan Sparsifiers.” arXiv:0808.0163 [Cs], August.
Bauckhage, Christian. 2015. β€œK-Means Clustering Is Matrix Factorization.” arXiv:1512.07548 [Stat], December.
Belkin, Mikhail, and Partha Niyogi. 2003. β€œLaplacian Eigenmaps for Dimensionality Reduction and Data Representation.” Neural Computation 15 (6): 1373–96.
Clauset, Aaron. 2005. β€œFinding Local Community Structure in Networks.”
Clauset, Aaron, Mark E J Newman, and Cristopher Moore. 2004. β€œFinding Community Structure in Very Large Networks.” Physical Review E 70 (6): 066111.
Ding, C., X. He, and H. Simon. 2005. β€œOn the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering.” In Proceedings of the 2005 SIAM International Conference on Data Mining, 606–10. Proceedings. Society for Industrial and Applied Mathematics.
Donoho, David L., and Carrie Grimes. 2003. β€œHessian Eigenmaps: Locally Linear Embedding Techniques for High-Dimensional Data.” Proceedings of the National Academy of Sciences 100 (10): 5591–96.
Dueck, Delbert, Quaid D. Morris, and Brendan J. Frey. 2005. β€œMulti-Way Clustering of Microarray Data Using Probabilistic Sparse Matrix Factorization.” Bioinformatics 21 (suppl 1): i144–51.
Elhamifar, E., and R. Vidal. 2013. β€œSparse Subspace Clustering: Algorithm, Theory, and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11): 2765–81.
Fung, Wai Shing, Ramesh Hariharan, Nicholas J.A. Harvey, and Debmalya Panigrahi. 2011. β€œA General Framework for Graph Sparsification.” In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, 71–80. STOC ’11. New York, NY, USA: ACM.
Hallac, David, Jure Leskovec, and Stephen Boyd. 2015. β€œNetwork Lasso: Clustering and Optimization in Large Graphs.” arXiv:1507.00280 [Cs, Math, Stat], July.
He, Xiaofei, and Partha Niyogi. 2003. β€œLocality Preserving Projections.” In Proceedings of the 16th International Conference on Neural Information Processing Systems, 16:153–60. NIPS’03. Cambridge, MA, USA: MIT Press.
Huang, G., M. Kaess, and J. J. Leonard. 2013. β€œConsistent Sparsification for Graph Optimization.” In 2013 European Conference on Mobile Robots (ECMR), 150–57.
Keogh, Eamonn, and Jessica Lin. 2004. β€œClustering of Time-Series Subsequences Is Meaningless: Implications for Previous and Future Research.” Knowledge and Information Systems 8 (2): 154–77.
Luxburg, Ulrike von. 2007. β€œA Tutorial on Spectral Clustering.”
Masuda, Naoki, Mason A. Porter, and Renaud Lambiotte. 2016. β€œRandom Walks and Diffusion on Networks.” arXiv:1612.03281 [Cond-Mat, Physics:physics], December.
Mixon, Dustin G., Soledad Villar, and Rachel Ward. 2016. β€œClustering Subgaussian Mixtures by Semidefinite Programming.” arXiv:1602.06612 [Cs, Math, Stat], February.
Mohler, George. 2013. β€œModeling and Estimation of Multi-Source Clustering in Crime and Security Data.” The Annals of Applied Statistics 7 (3): 1525–39.
Newman, Mark E J. 2004. β€œDetecting Community Structure in Networks.” The European Physical Journal B - Condensed Matter and Complex Systems 38 (2): 321–30.
Peng, J., and Y. Wei. 2007. β€œApproximating K‐means‐type Clustering via Semidefinite Programming.” SIAM Journal on Optimization 18 (1): 186–205.
Pourkamali-Anaraki, Farhad, and Stephen Becker. 2016a. β€œA Randomized Approach to Efficient Kernel Clustering.” arXiv:1608.07597 [Stat], August.
β€”β€”β€”. 2016b. β€œRandomized Clustered Nystrom for Large-Scale Kernel Machines.” arXiv:1612.06470 [Cs, Stat], December.
Schaeffer, S E. 2007. β€œGraph Clustering.” Computer Science Review 1 (1): 27–64.
SchΓΆlkopf, Bernhard, Phil Knirsch, Alex Smola, and Chris Burges. 1998. β€œFast Approximation of Support Vector Kernel Expansions, and an Interpretation of Clustering as Approximation in Feature Spaces.” In Mustererkennung 1998, edited by Paul Levi, Michael Schanz, Rolf-JΓΌrgen Ahlers, and Franz May, 125–32. Informatik Aktuell. Springer Berlin Heidelberg.
Shamir, Ohad, and Naftali Tishby. 2011. β€œSpectral Clustering on a Budget.” Journal of Machine Learning Research.
Singh, Ajit P., and Geoffrey J. Gordon. 2008. β€œA Unified View of Matrix Factorization Models.” In Machine Learning and Knowledge Discovery in Databases, 358–73. Springer.
Slonim, Noam, Gurinder S Atwal, GaΕ‘per Tkačik, and William Bialek. 2005. β€œInformation-Based Clustering.” Proceedings of the National Academy of Sciences of the United States of America 102: 18297–302.
Spielman, Daniel A., and Shang-Hua Teng. 2004. β€œNearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems.” In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, 81–90. STOC ’04. New York, NY, USA: ACM.
β€”β€”β€”. 2008. β€œA Local Clustering Algorithm for Massive Graphs and Its Application to Nearly-Linear Time Graph Partitioning.” arXiv:0809.3232 [Cs], September.
Spielman, D., and N. Srivastava. 2011. β€œGraph Sparsification by Effective Resistances.” SIAM Journal on Computing 40 (6): 1913–26.
Steyvers, Mark, and Joshua B. Tenenbaum. 2005. β€œThe Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth.” Cognitive Science 29 (1): 41–78.
Tschannen, Michael, and Helmut BΓΆlcskei. 2016. β€œNoisy Subspace Clustering via Matching Pursuits.” arXiv:1612.03450 [Cs, Math, Stat], December.
TΓΌrkmen, Ali Caner. 2015. β€œA Review of Nonnegative Matrix Factorization Methods for Clustering.” arXiv:1507.03194 [Cs, Stat], July.
Yan, Donghui, Ling Huang, and Michael I. Jordan. 2009. β€œFast Approximate Spectral Clustering.” In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 907–16. KDD ’09. New York, NY, USA: ACM.
Zass, Ron, and Amnon Shashua. 2005. β€œA Unifying Approach to Hard and Probabilistic Clustering.” In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1 - Volume 01, 294–301. ICCV ’05. Washington, DC, USA: IEEE Computer Society.
Zhang, Zhongyuan, Chris Ding, Tao Li, and Xiangsun Zhang. 2007. β€œBinary Matrix Factorization with Applications.” In Seventh IEEE International Conference on Data Mining, 2007. ICDM 2007, 391–400. IEEE.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.