Clustering

May 22, 2015 — June 7, 2016

Bayes
clustering
convolution
density
feature construction
networks
nonparametric
probability
sparser than thou
statistics
Figure 1

Getting a bunch of data points and approximating them (in some sense) by their membership (possibly fuzzy) in some groups or regions of feature space. Quantization, in other words.

For certain definitions, this can be the same thing as non-negative and/or low-rank matrix factorisations if you use mixture models, and is only really different in emphasis from dimensionality reduction. If you start with a list of features and then think about “distances” between observations, you have just implicitly intuited a weighted graph from your hitherto non-graphy data and are now looking at a networks problem.

If you care about clustering as such, spectral clustering feels like a nice entry point, maybe via Chris Ding’s tutorial on spectral clustering.

There are many useful tricks in here, e.g. Belkin and Niyogi () shows how to use a graph Laplacian (possibly a contrived or arbitrary one) to construct “natural” Euclidean coordinates for your data, such that nodes that have much traffic between them in the Laplacian representation have a small Euclidean distance (The “Urban Traffic Planner Fantasy Transformation”) Quickly gives you a similarity measure on non-Euclidean data. Questions: Under which metrics is it equivalent to multidimensional scaling? Is it worthwhile going the other way and constructing density estimates from induced flow graphs?

1 Clustering as matrix factorization

If I know me, I might be looking at this page trying to remember which papers situate k-means-type clustering in matrix factorization literature.

The single-serve paper doing that is Bauckhage (), but there are broader versions (; ), some computer science connections in Mixon, Villar, and Ward (), and an older one in Zass and Shashua ().

Further things I might discuss here are the graph-flow/Laplacian notions of clustering and the density/centroids approach. I will discuss that under mixture models

2 References

Auvolat, and Vincent. 2015. Clustering Is Efficient for Approximate Maximum Inner Product Search.” arXiv:1507.05910 [Cs, Stat].
Bach, and Jordan. 2006. Learning Spectral Clustering, with Application to Speech Separation.” Journal of Machine Learning Research.
Batson, Spielman, and Srivastava. 2008. Twice-Ramanujan Sparsifiers.” arXiv:0808.0163 [Cs].
Bauckhage. 2015. K-Means Clustering Is Matrix Factorization.” arXiv:1512.07548 [Stat].
Belkin, and Niyogi. 2003. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation.” Neural Computation.
Clauset. 2005. “Finding Local Community Structure in Networks.”
Clauset, Newman, and Moore. 2004. Finding Community Structure in Very Large Networks.” Physical Review E.
Ding, He, and Simon. 2005. On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering.” In Proceedings of the 2005 SIAM International Conference on Data Mining. Proceedings.
Donoho, and Grimes. 2003. Hessian Eigenmaps: Locally Linear Embedding Techniques for High-Dimensional Data.” Proceedings of the National Academy of Sciences.
Dueck, Morris, and Frey. 2005. Multi-Way Clustering of Microarray Data Using Probabilistic Sparse Matrix Factorization.” Bioinformatics.
Elhamifar, and Vidal. 2013. Sparse Subspace Clustering: Algorithm, Theory, and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Fung, Hariharan, Harvey, et al. 2011. A General Framework for Graph Sparsification.” In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing. STOC ’11.
Hallac, Leskovec, and Boyd. 2015. Network Lasso: Clustering and Optimization in Large Graphs.” arXiv:1507.00280 [Cs, Math, Stat].
He, and Niyogi. 2003. Locality Preserving Projections.” In Proceedings of the 16th International Conference on Neural Information Processing Systems. NIPS’03.
Huang, Kaess, and Leonard. 2013. Consistent Sparsification for Graph Optimization.” In 2013 European Conference on Mobile Robots (ECMR).
Keogh, and Lin. 2004. Clustering of Time-Series Subsequences Is Meaningless: Implications for Previous and Future Research.” Knowledge and Information Systems.
Luxburg. 2007. “A Tutorial on Spectral Clustering.”
Masuda, Porter, and Lambiotte. 2016. Random Walks and Diffusion on Networks.” arXiv:1612.03281 [Cond-Mat, Physics:physics].
Mixon, Villar, and Ward. 2016. Clustering Subgaussian Mixtures by Semidefinite Programming.” arXiv:1602.06612 [Cs, Math, Stat].
Mohler. 2013. Modeling and Estimation of Multi-Source Clustering in Crime and Security Data.” The Annals of Applied Statistics.
Newman. 2004. Detecting Community Structure in Networks.” The European Physical Journal B - Condensed Matter and Complex Systems.
Peng, and Wei. 2007. Approximating K‐means‐type Clustering via Semidefinite Programming.” SIAM Journal on Optimization.
Pourkamali-Anaraki, and Becker. 2016a. A Randomized Approach to Efficient Kernel Clustering.” arXiv:1608.07597 [Stat].
———. 2016b. Randomized Clustered Nystrom for Large-Scale Kernel Machines.” arXiv:1612.06470 [Cs, Stat].
Schaeffer. 2007. Graph Clustering.” Computer Science Review.
Schölkopf, Knirsch, Smola, et al. 1998. Fast Approximation of Support Vector Kernel Expansions, and an Interpretation of Clustering as Approximation in Feature Spaces.” In Mustererkennung 1998. Informatik Aktuell.
Shamir, and Tishby. 2011. Spectral Clustering on a Budget.” Journal of Machine Learning Research.
Singh, and Gordon. 2008. A Unified View of Matrix Factorization Models.” In Machine Learning and Knowledge Discovery in Databases.
Slonim, Atwal, Tkačik, et al. 2005. Information-Based Clustering.” Proceedings of the National Academy of Sciences of the United States of America.
Spielman, D., and Srivastava. 2011. Graph Sparsification by Effective Resistances.” SIAM Journal on Computing.
Spielman, Daniel A., and Teng. 2004. Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems.” In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing. STOC ’04.
———. 2008. A Local Clustering Algorithm for Massive Graphs and Its Application to Nearly-Linear Time Graph Partitioning.” arXiv:0809.3232 [Cs].
Steyvers, and Tenenbaum. 2005. The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth.” Cognitive Science.
Tschannen, and Bölcskei. 2016. Noisy Subspace Clustering via Matching Pursuits.” arXiv:1612.03450 [Cs, Math, Stat].
Türkmen. 2015. A Review of Nonnegative Matrix Factorization Methods for Clustering.” arXiv:1507.03194 [Cs, Stat].
Yan, Huang, and Jordan. 2009. Fast Approximate Spectral Clustering.” In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’09.
Zass, and Shashua. 2005. A Unifying Approach to Hard and Probabilistic Clustering.” In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1 - Volume 01. ICCV ’05.
Zhang, Ding, Li, et al. 2007. Binary Matrix Factorization with Applications.” In Seventh IEEE International Conference on Data Mining, 2007. ICDM 2007.