Random embeddings and hashing



Separation of inputs by random projection

See also matrix factorisations, for some extra ideas on why random projections have a role in motivating compressed sensing, arndomised regressions etc.

Occasionally we might use non-linear projections to increase the dimensionality of our data in the hope of making a non-linear regression approximately linear, which dates back to (Cover 1965).

Cover’s Theorem (Cover 1965):

It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems. The results, even those dealing with randomly positioned points, have been combinatorial in nature, and have been essentially independent of the configuration of the set of points in the space.

I am especially interested in random embeddings for kernel approximation.

Over at compressed sensing we mention some other useful such as the Johnson-Lindenstrauss lemma, and these ideas are closely related, in the probabilistic setting, to concentration inequalities.

References

Achlioptas, Dimitris. 2003. β€œDatabase-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins.” Journal of Computer and System Sciences, Special Issue on PODS 2001, 66 (4): 671–87.
Ailon, Nir, and Bernard Chazelle. 2009. β€œThe Fast Johnson–Lindenstrauss Transform and Approximate Nearest Neighbors.” SIAM Journal on Computing 39 (1): 302–22.
Alaoui, Ahmed El, and Michael W. Mahoney. 2014. β€œFast Randomized Kernel Methods With Statistical Guarantees.” arXiv:1411.0306 [Cs, Stat], November.
Andoni, A., and P. Indyk. 2006. β€œNear-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions.” In 47th Annual IEEE Symposium on Foundations of Computer Science, 2006. FOCS ’06, 51:459–68.
Andoni, Alexandr, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. 2013. β€œBeyond Locality-Sensitive Hashing.” arXiv:1306.1547 [Cs], June.
Andoni, Alexandr, and Ilya Razenshteyn. 2015. β€œOptimal Data-Dependent Hashing for Approximate Near Neighbors.” arXiv:1501.01062 [Cs], January.
Auvolat, Alex, and Pascal Vincent. 2015. β€œClustering Is Efficient for Approximate Maximum Inner Product Search.” arXiv:1507.05910 [Cs, Stat], July.
Bach, Francis. 2015. β€œOn the Equivalence Between Kernel Quadrature Rules and Random Feature Expansions.” arXiv Preprint arXiv:1502.06800.
Baraniuk, Richard, Mark Davenport, Ronald DeVore, and Michael Wakin. 2008. β€œA Simple Proof of the Restricted Isometry Property for Random Matrices.” Constructive Approximation 28 (3): 253–63.
Bingham, Ella, and Heikki Mannila. 2001. β€œRandom Projection in Dimensionality Reduction: Applications to Image and Text Data.” In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 245–50. KDD ’01. New York, NY, USA: ACM.
Brault, Romain, Florence d’AlchΓ©-Buc, and Markus Heinonen. 2016. β€œRandom Fourier Features for Operator-Valued Kernels.” In Proceedings of The 8th Asian Conference on Machine Learning, 110–25.
CandΓ¨s, Emmanuel J., and Terence Tao. 2006. β€œNear-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Transactions on Information Theory 52 (12): 5406–25.
Casey, M., C. Rhodes, and M. Slaney. 2008. β€œAnalysis of Minimum Distances in High-Dimensional Musical Spaces.” IEEE Transactions on Audio, Speech, and Language Processing 16 (5): 1015–28.
Choromanski, Krzysztof, Mark Rowland, and Adrian Weller. 2017. β€œThe Unreasonable Effectiveness of Random Orthogonal Embeddings.” arXiv:1703.00864 [Stat], March.
Coleman, Benjamin, Richard G. Baraniuk, and Anshumali Shrivastava. 2020. β€œSub-Linear Memory Sketches for Near Neighbor Search on Streaming Data.” arXiv:1902.06687 [Cs, Eess, Stat], September.
Cover, T. M. 1965. β€œGeometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition.” IEEE Transactions on Electronic Computers EC-14 (3): 326–34.
Dasgupta, Sanjoy. 2000. β€œExperiments with Random Projection.” In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 143–51. UAI’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Dasgupta, Sanjoy, and Anupam Gupta. 2003. β€œAn Elementary Proof of a Theorem of Johnson and Lindenstrauss.” Random Structures & Algorithms 22 (1): 60–65.
Datar, Mayur, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. β€œLocality-Sensitive Hashing Scheme Based on P-Stable Distributions.” In Proceedings of the Twentieth Annual Symposium on Computational Geometry, 253–62. SCG ’04. New York, NY, USA: ACM.
Dezfouli, Amir, and Edwin V. Bonilla. 2015. β€œScalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In Advances in Neural Information Processing Systems 28, 1414–22. NIPS’15. Cambridge, MA, USA: MIT Press.
Duarte, Marco F., and Richard G. Baraniuk. 2013. β€œSpectral Compressive Sensing.” Applied and Computational Harmonic Analysis 35 (1): 111–29.
Eftekhari, Armin, Han Lun Yap, Michael B. Wakin, and Christopher J. Rozell. 2016. β€œStabilizing Embedology: Geometry-Preserving Delay-Coordinate Maps.” arXiv:1609.06347 [Nlin, Stat], September.
Fodor, Imola. 2002. β€œA Survey of Dimension Reduction Techniques.”
Freund, Yoav, Sanjoy Dasgupta, Mayank Kabra, and Nakul Verma. 2007. β€œLearning the Structure of Manifolds Using Random Projections.” In Advances in Neural Information Processing Systems, 473–80.
Geurts, Pierre, Damien Ernst, and Louis Wehenkel. 2006. β€œExtremely Randomized Trees.” Machine Learning 63 (1): 3–42.
Ghojogh, Benyamin, Ali Ghodsi, Fakhri Karray, and Mark Crowley. 2021. β€œJohnson-Lindenstrauss Lemma, Linear and Nonlinear Random Projections, Random Fourier Features, and Random Kitchen Sinks: Tutorial and Survey.” arXiv:2108.04172 [Cs, Math, Stat], August.
Gionis, Aristides, Piotr Indyky, and Rajeev Motwaniz. 1999. β€œSimilarity Search in High Dimensions via Hashing.” In.
Giryes, R., G. Sapiro, and A. M. Bronstein. 2016. β€œDeep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?” IEEE Transactions on Signal Processing 64 (13): 3444–57.
Gorban, Alexander N., Ivan Yu Tyukin, and Ilya Romanenko. 2016. β€œThe Blessing of Dimensionality: Separation Theorems in the Thermodynamic Limit.” arXiv:1610.00494 [Cs, Stat], October.
Gottwald, Georg A., and Sebastian Reich. 2020. β€œSupervised Learning from Noisy Observations: Combining Machine-Learning Techniques with Data Assimilation.” arXiv:2007.07383 [Physics, Stat], July.
Hall, Peter, and Ker-Chau Li. 1993. β€œOn Almost Linearity of Low Dimensional Projections from High Dimensional Data.” The Annals of Statistics 21 (2): 867–89.
Heusser, Andrew C., Kirsten Ziman, Lucy L. W. Owen, and Jeremy R. Manning. 2017. β€œHyperTools: A Python Toolbox for Visualizing and Manipulating High-Dimensional Data.” arXiv:1701.08290 [Stat], January.
Kammonen, Aku, Jonas Kiessling, Petr PlechÑč, Mattias Sandberg, and Anders Szepessy. 2020. β€œAdaptive Random Fourier Features with Metropolis Sampling.” arXiv:2007.10683 [Cs, Math], November.
Kane, Daniel M., and Jelani Nelson. 2014. β€œSparser Johnson-Lindenstrauss Transforms.” Journal of the ACM 61 (1): 1–23.
Kar, Purushottam, and Harish Karnick. 2012. β€œRandom Feature Maps for Dot Product Kernels.” In Artificial Intelligence and Statistics, 583–91. PMLR.
Koltchinskii, Vladimir, and Evarist GinΓ©. 2000. β€œRandom Matrix Approximation of Spectra of Integral Operators.” Bernoulli 6 (1): 113–67.
Koppel, Alec, Garrett Warnell, Ethan Stump, and Alejandro Ribeiro. 2016. β€œParsimonious Online Learning with Kernels via Sparse Projections in Function Space.” arXiv:1612.04111 [Cs, Stat], December.
Krummenacher, Gabriel, Brian McWilliams, Yannic Kilcher, Joachim M Buhmann, and Nicolai Meinshausen. 2016. β€œScalable Adaptive Stochastic Optimization Using Random Projections.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1750–58. Curran Associates, Inc.
Kulis, Brian, and Kristen Grauman. 2012. β€œKernelized Locality-Sensitive Hashing.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (6): 1092–1104.
Landweber, Peter S., Emanuel A. Lazar, and Neel Patel. 2016. β€œOn Fiber Diameters of Continuous Maps.” American Mathematical Monthly 123 (4): 392–97.
Li, Ping, Trevor J. Hastie, and Kenneth W. Church. 2006. β€œVery Sparse Random Projections.” In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 287–96. KDD ’06. New York, NY, USA: ACM.
Li, Zhu, Jean-FranΓ§ois Ton, Dino Oglic, and Dino Sejdinovic. 2019. β€œTowards a Unified Analysis of Random Fourier Features.” In, 10.
McWilliams, Brian, David Balduzzi, and Joachim M Buhmann. 2013. β€œCorrelated Random Features for Fast Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 1050:440–48. Curran Associates, Inc.
Moosmann, Frank, Bill Triggs, and Frederic Jurie. 2006. β€œFast Discriminative Visual Codebooks Using Randomized Clustering Forests.” In Advances in Neural Information Processing Systems, 985–92.
Oveneke, Meshia CΓ©dric, Mitchel Aliosha-Perez, Yong Zhao, Dongmei Jiang, and Hichem Sahli. 2016. β€œEfficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization.” In Advances in Neural Information Processing Systems 29.
Oymak, Samet, and Joel A. Tropp. 2015. β€œUniversality Laws for Randomized Dimension Reduction, with Applications.” arXiv:1511.09433 [Cs, Math, Stat], November.
Rahimi, Ali, and Benjamin Recht. 2007. β€œRandom Features for Large-Scale Kernel Machines.” In Advances in Neural Information Processing Systems, 1177–84. Curran Associates, Inc.
β€”β€”β€”. 2008. β€œUniform Approximation of Functions with Random Bases.” In 2008 46th Annual Allerton Conference on Communication, Control, and Computing, 555–61.
Scardapane, Simone, and Dianhui Wang. 2017. β€œRandomness in Neural Networks: An Overview.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7 (2).
Shi, Jiaxin, Shengyang Sun, and Jun Zhu. 2018. β€œA Spectral Approach to Gradient Estimation for Implicit Distributions.” In. arXiv.
Sinha, Aman, and John C Duchi. 2016. β€œLearning Kernels with Random Features.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1298–1306. Curran Associates, Inc.
Sterge, Nicholas, and Bharath Sriperumbudur. 2021. β€œStatistical Optimality and Computational Efficiency of NystrΓΆm Kernel PCA.” arXiv:2105.08875 [Cs, Math, Stat], May.
Tang, Minh, Avanti Athreya, Daniel L. Sussman, Vince Lyzinski, and Carey E. Priebe. 2014. β€œA Nonparametric Two-Sample Hypothesis Testing Problem for Random Dot Product Graphs.” arXiv:1409.2344 [Math, Stat], September.
Weinberger, Kilian, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. 2009. β€œFeature Hashing for Large Scale Multitask Learning.” In Proceedings of the 26th Annual International Conference on Machine Learning, 1113–20. ICML ’09. New York, NY, USA: ACM.
Zhang, Dell, Jun Wang, Deng Cai, and Jinsong Lu. 2010. β€œSelf-Taught Hashing for Fast Similarity Search.” In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 18–25. SIGIR ’10. New York, NY, USA: ACM.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.