# Dimensionality reduction

## Wherein I teach myself, amongst other things, feature selection, how a sparse PCA works, and decide where to file multidimensional scaling

🏗🏗🏗🏗🏗

I will restructure learning on manifolds and dimensionality reduction into a more useful distinction.

You have lots of predictors in your regression model! Too many predictors! You want fewer predictors! Maybe then the model would be faster to fit, or at least more compact. Can you throw some out, or summarise them in some sense? This connects to the notion of similarity as seen in kernel tricks, to learning an index, and to inducing a differential metric. Related: matrix factorisations, random features, random projections, and high-dimensional statistics. Ultimately, this is always (at least implicitly) learning a manifold. A good dimension reduction can produce a nearly sufficient statistic for indirect inference.

## Bayes

Throwing out data in a classical Bayes context is a subtle matter, but it can be done. See Bayesian model selection.

## Learning a summary statistic

See learning summary statistics. As seen in approximate Bayes. Note this is not at all the same thing as discarding predictors; rather it is about learning a useful statistic to make inferences over some more intractable ones.

## Feature selection

Deciding whether to include or discard predictors. This problem is old and has been part of regression practice for a long time. Classical model selection is one approach; regularised sparse model selection (e.g. the lasso) is its surprisingly effective recent evolution. But the field continues to move: FOCI (Azadkia and Chatterjee 2019) is an application of an interesting new independence test that is very much en vogue, despite being in an area we all thought was thoroughly mined out.
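As a concrete sketch of regularised sparse selection: a minimal lasso fit by proximal gradient descent (ISTA) in plain numpy. The synthetic data, the penalty level `lam`, and the solver are all illustrative choices, not anyone's canonical implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # only 3 of 20 predictors matter
y = X @ beta_true + 0.1 * rng.normal(size=n)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimise (1/2n)||y - X b||^2 + lam ||b||_1 by proximal gradient."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        # soft-thresholding: the proximal operator of the L1 penalty
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

beta_hat = lasso_ista(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(beta_hat) > 1e-3)  # surviving predictors
```

The L1 penalty zeroes out the irrelevant coefficients exactly, so `selected` recovers (roughly) the three true predictors; note the surviving coefficients are shrunk towards zero by about `lam`.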

## PCA and cousins

The classic. Kernel PCA, linear algebra and probabilistic formulations. Has a nice probabilistic interpretation “for free” via the Karhunen-Loève theorem.
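A minimal sketch of the classic linear PCA via the SVD (the synthetic correlated data and the choice of two components are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated features

Xc = X - X.mean(axis=0)                   # centre the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                       # top-2 principal directions
scores = Xc @ components.T                # low-dimensional representation
explained = s[:2] ** 2 / np.sum(s ** 2)   # fraction of variance per component
```

The scores are exactly the left singular vectors scaled by the singular values, which is why the SVD route is numerically preferable to eigendecomposing the covariance matrix.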

Matrix factorisations are a generalisation here, from rank-1 to higher-rank operators. 🏗

There are various extensions, such as additive component analysis (Murdock and De la Torre 2017):

> We propose Additive Component Analysis (ACA), a novel nonlinear extension of PCA. Inspired by multivariate nonparametric regression with additive models, ACA fits a smooth manifold to data by learning an explicit mapping from a low-dimensional latent space to the input space, which trivially enables applications like denoising.

## Learning a distance metric

A related notion is to learn a simpler way of quantifying, in some sense, how similar two data points are. This usually produces an embedding in some low-dimensional ambient space as a by-product.

### UMAP

Uniform Manifold Approximation and Projection for dimension reduction (McInnes, Healy, and Melville 2018). Apparently super hot right now (HT James Nichols). Nikolay Oskolkov’s introduction is neat. John Baez discusses the category-theoretic underpinning.

### Locality Preserving projections

Try to preserve the nearness of points that are connected on some (weighted) graph, by choosing low-dimensional coordinates $y_i$ that minimise

$\sum_{i,j}(y_i-y_j)^2 w_{i,j}$

where $y_i = a^\top x_i$; so we seek an optimal projection vector $a$.

(requirement for sparse similarity matrix?)
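A sketch of how that objective is typically solved: with graph Laplacian $L = D - W$ and the normalisation $a^\top X D X^\top a = 1$, minimising the weighted sum of squared differences becomes a generalised eigenproblem $XLX^\top a = \lambda XDX^\top a$, taking the smallest eigenvalue. Assuming a binary k-nearest-neighbour weight graph (my choice) in plain numpy:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 100))             # columns are data points in 5 dimensions

# pairwise squared distances, then a binary k-NN weight graph
D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
k = 8
W = np.zeros_like(D2)
idx = np.argsort(D2, axis=1)[:, 1:k + 1]  # nearest neighbours, excluding self
for i, js in enumerate(idx):
    W[i, js] = 1.0
W = np.maximum(W, W.T)                    # symmetrise

D = np.diag(W.sum(axis=1))                # degree matrix
L = D - W                                 # graph Laplacian

# sum_ij (y_i - y_j)^2 w_ij = 2 a^T X L X^T a, constrained by a^T X D X^T a = 1
A = X @ L @ X.T
B = X @ D @ X.T + 1e-9 * np.eye(X.shape[0])  # small ridge for numerical safety

# reduce the generalised problem to a symmetric one via Cholesky whitening
C = np.linalg.cholesky(B)
Ci = np.linalg.inv(C)
vals, vecs = np.linalg.eigh(Ci @ A @ Ci.T)
a = Ci.T @ vecs[:, 0]                     # projection vector: smallest eigenvalue
y = a @ X                                 # 1-D embedding of all points
```

The sparsity of $W$ matters in practice: with a k-NN graph the Laplacian is sparse, so large problems can use sparse generalised eigensolvers rather than the dense routine sketched here.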

### Diffusion maps

This manifold-learning technique seemed fashionable for a while.

Mikhail Belkin connects this to the graph Laplacian literature.
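A minimal diffusion-maps sketch in numpy, following the Coifman–Lafon construction: Gaussian affinities, row-normalise to a Markov matrix, and embed with the leading non-trivial eigenvectors scaled by powers of their eigenvalues. The kernel bandwidth, diffusion time, and toy circle data are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
# noisy circle: a 1-D manifold embedded in 2-D
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(300, 2))

eps = 0.1
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / eps)                     # Gaussian affinity matrix
P = K / K.sum(axis=1, keepdims=True)      # row-stochastic Markov matrix

# eigendecompose via the symmetric conjugate of P for numerical stability
d = K.sum(axis=1)
S = K / np.sqrt(np.outer(d, d))           # D^{-1/2} K D^{-1/2}, similar to P
vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
psi = vecs / np.sqrt(d)[:, None]          # right eigenvectors of P

t = 1                                     # diffusion time
embedding = (vals[1:3] ** t) * psi[:, 1:3]  # skip the trivial constant eigenvector
```

Euclidean distance in the embedding approximates the diffusion distance at time $t$ on the graph, which is what makes this a metric-learning method and not just a plotting trick.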

### As manifold learning

Same thing, with some different emphases and history, over at manifold learning.

TBD.

### Stochastic neighbour embedding and other visualisation-oriented methods

These methods are designed to make high-dimensional data sets look comprehensible in low-dimensional representation.

Probabilistically preserving closeness. The best-known of these is the famous t-SNE (Maaten and Hinton 2008), although as far as I understand it has largely been superseded by UMAP.

Instead of reducing and visualising higher dimensional data with t-SNE or PCA, here are three relatively recent non-linear dimension reduction techniques that are designed for visualising high dimensional data in 2D or 3D:

Trimap and LargeVis are learned mappings that I would expect to be more representative of the original data than what t-SNE provides. UMAP assumes connectedness of the manifold so it’s probably less suitable for data that contains distinct clusters but otherwise still a great option.

## Autoencoder and word2vec

The “nonlinear PCA” interpretation of word2vec, which I just heard about from Junbin Gao.

$L(x, x') = \|x-x'\|^2 = \|x-\sigma'(U\sigma(Wx+b) + b')\|^2$
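A minimal numpy sketch of this single-hidden-layer autoencoder reconstruction loss. The dimensions, the tanh nonlinearity, and taking the decoder output layer as linear (σ′ = identity) are all my illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 8, 3                               # input dimension, bottleneck dimension
x = rng.normal(size=d)

W = rng.normal(size=(k, d)) * 0.1         # encoder weights
b = np.zeros(k)                           # encoder bias
U = rng.normal(size=(d, k)) * 0.1         # decoder weights
b2 = np.zeros(d)                          # decoder bias

sigma = np.tanh                           # elementwise nonlinearity

h = sigma(W @ x + b)                      # encode: d dims -> k dims
x_hat = U @ h + b2                        # decode with a linear output layer
loss = np.sum((x - x_hat) ** 2)           # squared reconstruction error
```

With the nonlinearity removed entirely, minimising this loss over `W` and `U` recovers the PCA subspace, which is the sense in which an autoencoder is a "nonlinear PCA".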

TBC.

## References

Azadkia, Mona, and Sourav Chatterjee. 2019. “A Simple Measure of Conditional Dependence.” arXiv:1910.12327 [Cs, Math, Stat], December.
Bach, Francis R, and Michael I Jordan. 2002. “Kernel Independent Component Analysis.” Journal of Machine Learning Research 3 (July): 48.
Castro, Pablo de, and Tommaso Dorigo. 2019. Computer Physics Communications 244 (November): 170–79.
Charpentier, Arthur, Stéphane Mussard, and Téa Ouraga. 2021. European Journal of Operational Research, February.
Coifman, R. R., S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. 2005a. Proceedings of the National Academy of Sciences 102 (21): 7426–31.
———. 2005b. Proceedings of the National Academy of Sciences 102 (21): 7432–37.
Coifman, Ronald R., and Stéphane Lafon. 2006. “Diffusion Maps.” Applied and Computational Harmonic Analysis, Special Issue: Diffusion Maps and Wavelets, 21 (1): 5–30.
Cook, R. Dennis. 2018. Annual Review of Statistics and Its Application 5 (1): 533–59.
Dwibedi, Debidatta, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, and Andrew Zisserman. 2019. April.
Globerson, Amir, and Sam T. Roweis. 2006. In Advances in Neural Information Processing Systems, 451–58. NIPS’05. Cambridge, MA, USA: MIT Press.
Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. 2014. arXiv:1412.6056 [Cs], December.
Hadsell, R., S. Chopra, and Y. LeCun. 2006. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:1735–42.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786): 504–7.
Hinton, Geoffrey, and Sam Roweis. 2002. “Stochastic Neighbor Embedding.” In Proceedings of the 15th International Conference on Neural Information Processing Systems, 857–64. NIPS’02. Cambridge, MA, USA: MIT Press.
Hyvärinen, A, and E Oja. 2000. “Independent Component Analysis: Algorithms and Applications.” Neural Networks 13 (4–5): 411–30.
Kim, Cheolmin, and Diego Klabjan. 2020. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (8): 1842–55.
Lawrence, Neil. 2005. Journal of Machine Learning Research 6 (Nov): 1783–1816.
Lopez-Paz, David, Suvrit Sra, Alex Smola, Zoubin Ghahramani, and Bernhard Schölkopf. 2014. arXiv:1402.0119 [Cs, Stat], February.
Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using t-SNE.” Journal of Machine Learning Research 9 (Nov): 2579–2605.
McInnes, Leland, John Healy, and James Melville. 2018. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.” arXiv:1802.03426 [Cs, Stat], December.
Murdock, Calvin, and Fernando De la Torre. 2017. “Additive Component Analysis.” In Conference on Computer Vision and Pattern Recognition (CVPR).
Oymak, Samet, and Joel A. Tropp. 2015. arXiv:1511.09433 [Cs, Math, Stat], November.
Peluffo-Ordónez, Diego H., John A. Lee, and Michel Verleysen. 2014. In Advances in Self-Organizing Maps and Learning Vector Quantization, 65–74. Springer.
Rohe, Karl, and Muzhe Zeng. 2020. arXiv:2004.05387 [Math, Stat], April.
Salakhutdinov, Ruslan, and Geoff Hinton. 2007. In PMLR, 412–19.
Smola, Alex J., Robert C. Williamson, Sebastian Mika, and Bernhard Schölkopf. 1999. In Computational Learning Theory, edited by Paul Fischer and Hans Ulrich Simon, 214–29. Lecture Notes in Computer Science 1572. Springer Berlin Heidelberg.
Sohn, Kihyuk, and Honglak Lee. 2012. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), 1311–18.
Sorzano, C. O. S., J. Vargas, and A. Pascual Montano. 2014. arXiv:1403.2877 [Cs, q-Bio, Stat], March.
Wang, Boyue, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, and Baocai Yin. 2017. In Proceedings of IJCAI, 2017.
Wasserman, Larry. 2018. Annual Review of Statistics and Its Application 5 (1): 501–32.
Weinberger, Kilian, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning, 1113–20. ICML ’09. New York, NY, USA: ACM.
