Also with the notion of similarity as seen in kernel tricks. What you might do to learn an index. Inducing a differential metric. Matrix factorisations and random features, high-dimensional statistics. Ultimately, this is always (at least implicitly) learning a manifold. A good dimension reduction can produce a nearly sufficient statistic for indirect inference.

## Learning a summary statistic

TBD. See learning summary statistics.

## PCA and cousins

The classic. Kernel PCA, linear algebra and probabilistic formulations. Has a nice probabilistic interpretation “for free” via the Karhunen–Loève theorem.

Matrix factorisations are a mild generalisation here, from rank 1 operators to higher rank operators. 🏗

There are various extensions such as additive component analysis:

We propose Additive Component Analysis (ACA), a novel nonlinear extension of PCA. Inspired by multivariate nonparametric regression with additive models, ACA fits a smooth manifold to data by learning an explicit mapping from a low-dimensional latent space to the input space, which trivially enables applications like denoising.

## UMAP

*Uniform Manifold approximation and projection for dimension reduction* (McInnes, Healy, and Melville 2018).
Apparently super hot right now. (HT James Nichols).
Nikolay Oskolkov’s introduction is neat.
John Baez discusses
the category theoretic underpinning.

## Random projection

## Stochastic neighbour embedding

Probabilistically preserving closeness. The height of this technique is the famous t-SNE, although as far as I understand it has been superseded by UMAP.

## Autoencoder and word2vec

The “nonlinear PCA” interpretation of word2vec, I just heard from Junbin Gao.

\[L(x, x') = \|x-x\|^2=\|x-\sigma(U*\sigma*W^Tx+b)) + b')\|^2\]

TBC.

## For indexing my database

See learnable indexes.

## Locality Preserving projections

Try to preserve the nearness of points if they are connected on some (weight) graph.

\[\sum_{i,j}(y_i-y_j)^2 w_{i,j}\]

So we seen an optimal projection vector.

(requirement for sparse similarity matrix?)

## Multidimensional scaling

TDB.

## Misc

Castro, Pablo de, and Tommaso Dorigo. 2019. “INFERNO: Inference-Aware Neural Optimisation.” *Computer Physics Communications* 244 (November): 170–79. https://doi.org/10.1016/j.cpc.2019.06.007.

Cook, R. Dennis. 2018. “Principal Components, Sufficient Dimension Reduction, and Envelopes.” *Annual Review of Statistics and Its Application* 5 (1): 533–59. https://doi.org/10.1146/annurev-statistics-031017-100257.

Dwibedi, Debidatta, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, and Andrew Zisserman. 2019. “Temporal Cycle-Consistency Learning,” April. https://arxiv.org/abs/1904.07846v1.

Globerson, Amir, and Sam T. Roweis. 2006. “Metric Learning by Collapsing Classes.” In *Advances in Neural Information Processing Systems*, 451–58. NIPS’05. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2947-metric-learning-by-collapsing-classes.pdf.

Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. 2014. “Unsupervised Learning of Spatiotemporally Coherent Metrics,” December. http://arxiv.org/abs/1412.6056.

Hadsell, R., S. Chopra, and Y. LeCun. 2006. “Dimensionality Reduction by Learning an Invariant Mapping.” In *2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition*, 2:1735–42. https://doi.org/10.1109/CVPR.2006.100.

Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” *Science* 313 (5786): 504–7. https://doi.org/10.1126/science.1127647.

Hinton, Geoffrey, and Sam Roweis. 2002. “Stochastic Neighbor Embedding.” In *Proceedings of the 15th International Conference on Neural Information Processing Systems*, 857–64. NIPS’02. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2276-stochastic-neighbor-embedding.pdf.

Lawrence, Neil. 2005. “Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models.” *Journal of Machine Learning Research* 6 (Nov): 1783–1816. http://www.jmlr.org/papers/v6/lawrence05a.html.

Lopez-Paz, David, Suvrit Sra, Alex Smola, Zoubin Ghahramani, and Bernhard Schölkopf. 2014. “Randomized Nonlinear Component Analysis,” February. http://arxiv.org/abs/1402.0119.

Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using T-SNE.” *Journal of Machine Learning Research* 9 (Nov): 2579–2605. http://www.jmlr.org/papers/v9/vandermaaten08a.html.

McInnes, Leland, John Healy, and James Melville. 2018. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” December. http://arxiv.org/abs/1802.03426.

Murdock, Calvin, and Fernando De la Torre. 2017. “Additive Component Analysis.” In *Conference on Computer Vision and Pattern Recognition (CVPR)*. http://www.calvinmurdock.com/content/uploads/publications/cvpr2017aca.pdf.

Oymak, Samet, and Joel A. Tropp. 2015. “Universality Laws for Randomized Dimension Reduction, with Applications,” November. http://arxiv.org/abs/1511.09433.

Peluffo-Ordónez, Diego H., John A. Lee, and Michel Verleysen. 2014. “Short Review of Dimensionality Reduction Methods Based on Stochastic Neighbour Embedding.” In *Advances in Self-Organizing Maps and Learning Vector Quantization*, 65–74. Springer. http://link.springer.com/chapter/10.1007/978-3-319-07695-9_6.

Rohe, Karl, and Muzhe Zeng. 2020. “Vintage Factor Analysis with Varimax Performs Statistical Inference,” April. http://arxiv.org/abs/2004.05387.

Salakhutdinov, Ruslan, and Geoff Hinton. 2007. “Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure.” In *PMLR*, 412–19. http://proceedings.mlr.press/v2/salakhutdinov07a.html.

Smola, Alex J., Robert C. Williamson, Sebastian Mika, and Bernhard Schölkopf. 1999. “Regularized Principal Manifolds.” In *Computational Learning Theory*, edited by Paul Fischer and Hans Ulrich Simon, 214–29. Lecture Notes in Computer Science 1572. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/3-540-49097-3_17.

Sohn, Kihyuk, and Honglak Lee. 2012. “Learning Invariant Representations with Local Transformations.” In *Proceedings of the 29th International Conference on Machine Learning (ICML-12)*, 1311–8. http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2012Sohn_659.pdf.

Sorzano, C. O. S., J. Vargas, and A. Pascual Montano. 2014. “A Survey of Dimensionality Reduction Techniques,” March. http://arxiv.org/abs/1403.2877.

Wang, Boyue, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, and Baocai Yin. 2017. “Locality Preserving Projections for Grassmann Manifold.” In *PRoceedings of IJCAI, 2017*. http://arxiv.org/abs/1704.08458.

Wasserman, Larry. 2018. “Topological Data Analysis.” *Annual Review of Statistics and Its Application* 5 (1): 501–32. https://doi.org/10.1146/annurev-statistics-031017-100045.

Weinberger, Kilian, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. “Feature Hashing for Large Scale Multitask Learning.” In *Proceedings of the 26th Annual International Conference on Machine Learning*, 1113–20. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553516.