Random embeddings and hashing

See also matrix factorisations. To do: discuss random projections and their role in motivating compressed sensing, etc.

Cover’s Theorem (Cover 1965):

It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems. The results, even those dealing with randomly positioned points, have been combinatorial in nature, and have been essentially independent of the configuration of the set of points in the space.
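
For orientation, the result from the same paper that usually goes by the name "Cover's function counting theorem" can be written as below. The symbols \(N\) (number of points) and \(C(N,d)\) (number of separable dichotomies) are my notation and do not appear in the quoted abstract. It assumes \(N\) points in general position in \(\mathbb{R}^d\) and separating hyperplanes through the origin.

\[
C(N, d) = 2 \sum_{k=0}^{d-1} \binom{N-1}{k}
\]

In particular \(C(N,d) = 2^N\) when \(N \le d\), so every dichotomy is separable, while \(C(2d, d) = 2^{2d-1}\), so exactly half of all \(2^{2d}\) dichotomies are separable at \(N = 2d\). This is one way to read the "\(2d\)" capacity mentioned in the abstract.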

I am especially interested in random embeddings for kernel approximation.
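
As a concrete instance of what I mean, here is a minimal sketch of random Fourier features for the Gaussian kernel \(k(x,y)=\exp(-\|x-y\|^2/(2\ell^2))\). The function names `rff_features` and `gaussian_kernel`, the lengthscale, and the feature count are all illustrative choices of mine, not taken from any of the cited papers.

```python
import numpy as np


def rff_features(X, n_features, lengthscale=1.0, seed=0):
    """Random Fourier feature map for the Gaussian kernel (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, I / lengthscale^2).
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    # Random phases make the cosine features unbiased for the kernel.
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def gaussian_kernel(X, lengthscale=1.0):
    """Exact Gaussian kernel matrix between all pairs of rows of X."""
    sq = np.sum(X**2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-D / (2.0 * lengthscale**2))


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))          # toy data, assumed for the demo
Z = rff_features(X, n_features=5000, lengthscale=2.0)
K_exact = gaussian_kernel(X, lengthscale=2.0)
K_approx = Z @ Z.T                     # inner products in the random feature space
max_err = np.max(np.abs(K_exact - K_approx))
print("largest entrywise error:", max_err)
```

The approximation error shrinks roughly like one over the square root of the number of features, so the kernel matrix is replaced by an explicit, finite-dimensional embedding at a cost linear in the number of data points.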

Over at compressed sensing we mention some other random projection results, such as the Johnson-Lindenstrauss lemma, and these ideas are closely related, in the probabilistic setting, to concentration inequalities.
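
To make the Johnson-Lindenstrauss idea tangible, here is a small sketch, assuming a dense Gaussian projection matrix (one of several valid constructions; sparse and binary-coin variants also exist). The helper names and the dimensions are hypothetical choices for illustration only.

```python
import numpy as np


def gaussian_random_projection(X, k, seed=0):
    """Project the rows of X (n x d) down to k dimensions (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries are N(0, 1/k), so squared norms are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all pairs of rows of X."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2000))        # 50 toy points in 2000 dimensions
Y = gaussian_random_projection(X, k=1000)

# Compare each pairwise squared distance before and after projection.
iu = np.triu_indices(X.shape[0], k=1)
ratios = pairwise_sq_dists(Y)[iu] / pairwise_sq_dists(X)[iu]
print("distortion range:", ratios.min(), ratios.max())
```

Every ratio concentrates near 1, and the spread narrows as `k` grows. This is the concentration-of-measure phenomenon behind the lemma: the required `k` depends on the number of points and the tolerated distortion, not on the ambient dimension.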

Landweber, Lazar, and Patel (2016) give an example of a converse result for continuous random embeddings. (🏗)

Achlioptas, Dimitris. 2003. “Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins.” Journal of Computer and System Sciences, Special Issue on PODS 2001, 66 (4): 671–87. https://doi.org/10.1016/S0022-0000(03)00025-4.

Ailon, Nir, and Bernard Chazelle. 2009. “The Fast Johnson–Lindenstrauss Transform and Approximate Nearest Neighbors.” SIAM Journal on Computing 39 (1): 302–22. https://doi.org/10.1137/060673096.

Alaoui, Ahmed El, and Michael W. Mahoney. 2014. “Fast Randomized Kernel Methods with Statistical Guarantees,” November. http://arxiv.org/abs/1411.0306.

Andoni, Alexandr, and Piotr Indyk. 2006. “Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions.” In 47th Annual IEEE Symposium on Foundations of Computer Science, 2006. FOCS ’06, 51:459–68. https://doi.org/10.1109/FOCS.2006.49.

Andoni, Alexandr, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. 2013. “Beyond Locality-Sensitive Hashing,” June. http://arxiv.org/abs/1306.1547.

Andoni, Alexandr, and Ilya Razenshteyn. 2015. “Optimal Data-Dependent Hashing for Approximate Near Neighbors,” January. http://arxiv.org/abs/1501.01062.

Auvolat, Alex, and Pascal Vincent. 2015. “Clustering Is Efficient for Approximate Maximum Inner Product Search,” July. http://arxiv.org/abs/1507.05910.

Bach, Francis. 2015. “On the Equivalence Between Kernel Quadrature Rules and Random Feature Expansions.” arXiv Preprint arXiv:1502.06800. http://arxiv.org/abs/1502.06800.

Baraniuk, Richard, Mark Davenport, Ronald DeVore, and Michael Wakin. 2008. “A Simple Proof of the Restricted Isometry Property for Random Matrices.” Constructive Approximation 28 (3): 253–63. https://doi.org/10.1007/s00365-007-9003-x.

Bingham, Ella, and Heikki Mannila. 2001. “Random Projection in Dimensionality Reduction: Applications to Image and Text Data.” In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 245–50. KDD ’01. New York, NY, USA: ACM. https://doi.org/10.1145/502512.502546.

Brault, Romain, Florence d’Alché-Buc, and Markus Heinonen. 2016. “Random Fourier Features for Operator-Valued Kernels.” In Proceedings of the 8th Asian Conference on Machine Learning, 110–25. http://arxiv.org/abs/1605.02536.

Candès, Emmanuel J., and Terence Tao. 2006. “Near-Optimal Signal Recovery from Random Projections: Universal Encoding Strategies?” IEEE Transactions on Information Theory 52 (12): 5406–25. https://doi.org/10.1109/TIT.2006.885507.

Casey, M., C. Rhodes, and M. Slaney. 2008. “Analysis of Minimum Distances in High-Dimensional Musical Spaces.” IEEE Transactions on Audio, Speech, and Language Processing 16 (5): 1015–28. https://doi.org/10.1109/TASL.2008.925883.

Choromanski, Krzysztof, Mark Rowland, and Adrian Weller. 2017. “The Unreasonable Effectiveness of Random Orthogonal Embeddings,” March. http://arxiv.org/abs/1703.00864.

Cover, T. M. 1965. “Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition.” IEEE Transactions on Electronic Computers EC-14 (3): 326–34. https://doi.org/10.1109/PGEC.1965.264137.

Dasgupta, Sanjoy. 2000. “Experiments with Random Projection.” In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 143–51. UAI’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. http://arxiv.org/abs/1301.3849.

Dasgupta, Sanjoy, and Anupam Gupta. 2003. “An Elementary Proof of a Theorem of Johnson and Lindenstrauss.” Random Structures & Algorithms 22 (1): 60–65. https://doi.org/10.1002/rsa.10073.

Datar, Mayur, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. “Locality-Sensitive Hashing Scheme Based on P-Stable Distributions.” In Proceedings of the Twentieth Annual Symposium on Computational Geometry, 253–62. SCG ’04. New York, NY, USA: ACM. https://doi.org/10.1145/997817.997857.

Dezfouli, Amir, and Edwin V. Bonilla. 2015. “Scalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In Advances in Neural Information Processing Systems 28, 1414–22. NIPS’15. Cambridge, MA, USA: MIT Press. http://dl.acm.org/citation.cfm?id=2969239.2969397.

Duarte, Marco F., and Richard G. Baraniuk. 2013. “Spectral Compressive Sensing.” Applied and Computational Harmonic Analysis 35 (1): 111–29. https://doi.org/10.1016/j.acha.2012.08.003.

Eftekhari, Armin, Han Lun Yap, Michael B. Wakin, and Christopher J. Rozell. 2016. “Stabilizing Embedology: Geometry-Preserving Delay-Coordinate Maps,” September. http://arxiv.org/abs/1609.06347.

Fodor, Imola. 2002. “A Survey of Dimension Reduction Techniques.” https://e-reports-ext.llnl.gov/pdf/240921.pdf.

Freund, Yoav, Sanjoy Dasgupta, Mayank Kabra, and Nakul Verma. 2007. “Learning the Structure of Manifolds Using Random Projections.” In Advances in Neural Information Processing Systems, 473–80. http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2007_133.pdf.

Geurts, Pierre, Damien Ernst, and Louis Wehenkel. 2006. “Extremely Randomized Trees.” Machine Learning 63 (1): 3–42. https://doi.org/10.1007/s10994-006-6226-1.

Gionis, Aristides, Piotr Indyk, and Rajeev Motwani. 1999. “Similarity Search in High Dimensions via Hashing.” In. http://www.cs.princeton.edu/courses/archive/spring13/cos598C/Gionis.pdf.

Giryes, R., G. Sapiro, and A. M. Bronstein. 2016. “Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?” IEEE Transactions on Signal Processing 64 (13): 3444–57. https://doi.org/10.1109/TSP.2016.2546221.

Gorban, Alexander N., Ivan Yu Tyukin, and Ilya Romanenko. 2016. “The Blessing of Dimensionality: Separation Theorems in the Thermodynamic Limit,” October. http://arxiv.org/abs/1610.00494.

Hall, Peter, and Ker-Chau Li. 1993. “On Almost Linearity of Low Dimensional Projections from High Dimensional Data.” The Annals of Statistics 21 (2): 867–89. http://www.jstor.org/stable/2242265.

Heusser, Andrew C., Kirsten Ziman, Lucy L. W. Owen, and Jeremy R. Manning. 2017. “HyperTools: A Python Toolbox for Visualizing and Manipulating High-Dimensional Data,” January. http://arxiv.org/abs/1701.08290.

Kane, Daniel M., and Jelani Nelson. 2014. “Sparser Johnson-Lindenstrauss Transforms.” Journal of the ACM 61 (1): 1–23. https://doi.org/10.1145/2559902.

Koppel, Alec, Garrett Warnell, Ethan Stump, and Alejandro Ribeiro. 2016. “Parsimonious Online Learning with Kernels via Sparse Projections in Function Space,” December. http://arxiv.org/abs/1612.04111.

Krummenacher, Gabriel, Brian McWilliams, Yannic Kilcher, Joachim M Buhmann, and Nicolai Meinshausen. 2016. “Scalable Adaptive Stochastic Optimization Using Random Projections.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1750–58. Curran Associates, Inc. http://papers.nips.cc/paper/6054-scalable-adaptive-stochastic-optimization-using-random-projections.pdf.

Landweber, Peter S., Emanuel A. Lazar, and Neel Patel. 2016. “On Fiber Diameters of Continuous Maps.” American Mathematical Monthly 123 (4): 392–97. https://doi.org/10.4169/amer.math.monthly.123.4.392.

Li, Ping, Trevor J. Hastie, and Kenneth W. Church. 2006. “Very Sparse Random Projections.” In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 287–96. KDD ’06. New York, NY, USA: ACM. https://doi.org/10.1145/1150402.1150436.

McWilliams, Brian, David Balduzzi, and Joachim M Buhmann. 2013. “Correlated Random Features for Fast Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 1050:440–48. Curran Associates, Inc. http://papers.nips.cc/paper/5000-correlated-random-features-for-fast-semi-supervised-learning.pdf.

Moosmann, Frank, Bill Triggs, and Frederic Jurie. 2006. “Fast Discriminative Visual Codebooks Using Randomized Clustering Forests.” In Advances in Neural Information Processing Systems, 985–92. http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_741.pdf.

Oveneke, Meshia Cédric, Mitchel Aliosha-Perez, Yong Zhao, Dongmei Jiang, and Hichem Sahli. 2016. “Efficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization.” In Advances in Neural Information Processing Systems 29. http://arxiv.org/abs/1611.09232.

Oymak, Samet, and Joel A. Tropp. 2015. “Universality Laws for Randomized Dimension Reduction, with Applications,” November. http://arxiv.org/abs/1511.09433.

Scardapane, Simone, and Dianhui Wang. 2017. “Randomness in Neural Networks: An Overview.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7 (2). https://doi.org/10.1002/widm.1200.

Tang, Minh, Avanti Athreya, Daniel L. Sussman, Vince Lyzinski, and Carey E. Priebe. 2014. “A Nonparametric Two-Sample Hypothesis Testing Problem for Random Dot Product Graphs,” September. http://arxiv.org/abs/1409.2344.

Weinberger, Kilian, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. “Feature Hashing for Large Scale Multitask Learning.” In Proceedings of the 26th Annual International Conference on Machine Learning, 1113–20. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553516.

Zhang, Dell, Jun Wang, Deng Cai, and Jinsong Lu. 2010. “Self-Taught Hashing for Fast Similarity Search.” In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 18–25. SIGIR ’10. New York, NY, USA: ACM. https://doi.org/10.1145/1835449.1835455.