Dimensionality reduction

Wherein I teach myself, amongst other things, how a sparse PCA works, and decide where to file multidimensional scaling

Also with the notion of similarity as seen in kernel tricks. What you might do to learn an index. Inducing a differential metric. Matrix factorisations and random features, high-dimensional statistics. Ultimately, this is always (at least implicitly) learning a manifold.

PCA and cousins

The classic. Kernel PCA, linear algebra and probabilistic formulations. Has a nice probabilistic interpretation “for free” via the Karhunen–Loève theorem.

Matrix factorisations are a mild generalisation here, from rank 1 operators to higher rank operators. 🏗

There are various extensions such as additive component analysis:

We propose Additive Component Analysis (ACA), a novel nonlinear extension of PCA. Inspired by multivariate nonparametric regression with additive models, ACA fits a smooth manifold to data by learning an explicit mapping from a low-dimensional latent space to the input space, which trivially enables applications like denoising.


Uniform Manifold approximation and projection for dimension reduction (McInnes, Healy, and Melville 2018). Apparently super hot right now. (HT James Nichols). Nikolay Oskolkov’s introduction is neat. John Baez discusses the category theoretic underpinning.

Random projection

See random embeddings

Stochastic neighbour embedding

Probabilistically preserving closeness. The height of this technique is the famous t-SNE, although as far as I understand it has been superseded by UMAP.

Autoencoder and word2vec

The “nonlinear PCA” interpretation of word2vec, I just heard from Junbin Gao.

\[L(x, x') = \|x-x\|^2=\|x-\sigma(U*\sigma*W^Tx+b)) + b')\|^2\]


For indexing my database

See Learnable indexes.

Locality Preserving projections

Try to preserve the nearness of points if they are connected on some (weight) graph.

\[\sum_{i,j}(y_i-y_j)^2 w_{i,j}\]

So we seen an optimal projection vector.

(requirement for sparse similarity matrix?)

Multidimensional scaling



Cook, R. Dennis. 2018. “Principal Components, Sufficient Dimension Reduction, and Envelopes.” Annual Review of Statistics and Its Application 5 (1): 533–59. https://doi.org/10.1146/annurev-statistics-031017-100257.

Dwibedi, Debidatta, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, and Andrew Zisserman. 2019. “Temporal Cycle-Consistency Learning,” April. https://arxiv.org/abs/1904.07846v1.

Globerson, Amir, and Sam T. Roweis. 2006. “Metric Learning by Collapsing Classes.” In Advances in Neural Information Processing Systems, 451–58. NIPS’05. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2947-metric-learning-by-collapsing-classes.pdf.

Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. 2014. “Unsupervised Learning of Spatiotemporally Coherent Metrics,” December. http://arxiv.org/abs/1412.6056.

Hadsell, R., S. Chopra, and Y. LeCun. 2006. “Dimensionality Reduction by Learning an Invariant Mapping.” In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:1735–42. https://doi.org/10.1109/CVPR.2006.100.

Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786): 504–7. https://doi.org/10.1126/science.1127647.

Hinton, Geoffrey, and Sam Roweis. 2002. “Stochastic Neighbor Embedding.” In Proceedings of the 15th International Conference on Neural Information Processing Systems, 857–64. NIPS’02. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2276-stochastic-neighbor-embedding.pdf.

Lawrence, Neil. 2005. “Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models.” Journal of Machine Learning Research 6 (Nov): 1783–1816. http://www.jmlr.org/papers/v6/lawrence05a.html.

Lopez-Paz, David, Suvrit Sra, Alex Smola, Zoubin Ghahramani, and Bernhard Schölkopf. 2014. “Randomized Nonlinear Component Analysis,” February. http://arxiv.org/abs/1402.0119.

Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using T-SNE.” Journal of Machine Learning Research 9 (Nov): 2579–2605. http://www.jmlr.org/papers/v9/vandermaaten08a.html.

McInnes, Leland, John Healy, and James Melville. 2018. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” December. http://arxiv.org/abs/1802.03426.

Murdock, Calvin, and Fernando De la Torre. 2017. “Additive Component Analysis.” In Conference on Computer Vision and Pattern Recognition (CVPR). http://www.calvinmurdock.com/content/uploads/publications/cvpr2017aca.pdf.

Oymak, Samet, and Joel A. Tropp. 2015. “Universality Laws for Randomized Dimension Reduction, with Applications,” November. http://arxiv.org/abs/1511.09433.

Peluffo-Ordónez, Diego H., John A. Lee, and Michel Verleysen. 2014. “Short Review of Dimensionality Reduction Methods Based on Stochastic Neighbour Embedding.” In Advances in Self-Organizing Maps and Learning Vector Quantization, 65–74. Springer. http://link.springer.com/chapter/10.1007/978-3-319-07695-9_6.

Salakhutdinov, Ruslan, and Geoff Hinton. 2007. “Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure.” In PMLR, 412–19. http://proceedings.mlr.press/v2/salakhutdinov07a.html.

Smola, Alex J., Robert C. Williamson, Sebastian Mika, and Bernhard Schölkopf. 1999. “Regularized Principal Manifolds.” In Computational Learning Theory, edited by Paul Fischer and Hans Ulrich Simon, 214–29. Lecture Notes in Computer Science 1572. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/3-540-49097-3_17.

Sohn, Kihyuk, and Honglak Lee. 2012. “Learning Invariant Representations with Local Transformations.” In Proceedings of the 29th International Conference on Machine Learning (ICML-12), 1311–8. http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2012Sohn_659.pdf.

Sorzano, C. O. S., J. Vargas, and A. Pascual Montano. 2014. “A Survey of Dimensionality Reduction Techniques,” March. http://arxiv.org/abs/1403.2877.

Wang, Boyue, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, and Baocai Yin. 2017. “Locality Preserving Projections for Grassmann Manifold.” In PRoceedings of IJCAI, 2017. http://arxiv.org/abs/1704.08458.

Wasserman, Larry. 2018. “Topological Data Analysis.” Annual Review of Statistics and Its Application 5 (1): 501–32. https://doi.org/10.1146/annurev-statistics-031017-100045.

Weinberger, Kilian, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. “Feature Hashing for Large Scale Multitask Learning.” In Proceedings of the 26th Annual International Conference on Machine Learning, 1113–20. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553516.