Randomised regression



Tackling your regression, by using random embeddings of the predictors (and/or predictions?). Usually this means using low-dimensional projections, to reduce the dimensionality of a high dimensional regression. In this case it is not far from compressed sensing, except in how we handle noise. In this linear model case, this is of course random linear algebra, and may be a randomised matrix factorisation. You can do it the other way and prject something into a higher dimensional space, which is a popular trick for kernel approximation.

I am especially interested in seeing how this might be useful for dependent data, especially time series.

Brian McWilliams, Gabriel Krummenacher and Mario LučiΔ‡, Randomized Linear Regression: A brief overview and recent results. Gabriel implemented some of the algorithms mentioned, e.g.

  • Subsampled Randomized Fourier Transform.
  • SRHT: Uses the Subsampled Randomized Hadamard Transform (SRHT), equivalent to leverage based sampling.
  • aIWS, aRWS: Samples based on approximated statistical influence.
  • Uluru: SRHT with bias correction.

Martin Wainright, Statistics meets Optimization: Randomization and approximation for high-dimensional problems.

In the modern era of high-dimensional data, the interface between mathematical statistics and optimization has become an increasingly vibrant area of research. In this course, we provide some vignettes into this interface, including the following topics:

  • Dimensionality reduction via random projection. The naive idea of projecting high-dimensional data to a randomly chosen low-dimensional space is remarkably effective. We discuss the classical Johnson-Lindenstrauss lemma, as well as various modern variants that provide computationally-efficient embeddings with strong guarantees.

  • When is it possible to quickly obtain approximate solutions of large-scale convex programs? In practice, methods based on randomized projection can work very well, and arguments based on convex analysis and concentration of measure provide a rigorous underpinning to these observations.

  • Optimization problems with some form of nonconvexity arise frequently in statistical settings -- for instance, in problems with latent variables, combinatorial constraints, or rank constraints. Nonconvex programs are known to be intractable in a complexity-theoretic sense, but the random ensembles arising in statistics are not adversarially constructed. Under what conditions is it possible to make rigorous guarantees about the behavior of simple iterative algorithms for such problems? We develop some general theory for addressing these questions, exploiting tools from both optimization theory and empirical process theory.

References

Bahmani, Sohail, and Justin Romberg. 2017. β€œAnchored Regression: Solving Random Convex Equations via Convex Programming.” arXiv:1702.05327 [Cs, Math, Stat], February.
Choromanski, Krzysztof, Mark Rowland, and Adrian Weller. 2017. β€œThe Unreasonable Effectiveness of Random Orthogonal Embeddings.” arXiv:1703.00864 [Stat], March.
Cover, T. M. 1965. β€œGeometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition.” IEEE Transactions on Electronic Computers EC-14 (3): 326–34.
Dezfouli, Amir, and Edwin V. Bonilla. 2015. β€œScalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In Advances in Neural Information Processing Systems 28, 1414–22. NIPS’15. Cambridge, MA, USA: MIT Press.
Dhillon, Paramveer, Yichao Lu, Dean P. Foster, and Lyle Ungar. 2013. β€œNew Subsampling Algorithms for Fast Least Squares Regression.” In Advances in Neural Information Processing Systems, 360–68. Curran Associates, Inc.
Gilbert, Anna C., Yi Zhang, Kibok Lee, Yuting Zhang, and Honglak Lee. 2017. β€œTowards Understanding the Invertibility of Convolutional Neural Networks.” arXiv:1705.08664 [Cs, Stat], May.
Gottwald, Georg A., and Sebastian Reich. 2020. β€œSupervised Learning from Noisy Observations: Combining Machine-Learning Techniques with Data Assimilation.” arXiv:2007.07383 [Physics, Stat], July.
Gribonval, RΓ©mi, Gilles Blanchard, Nicolas Keriven, and Yann Traonmilin. 2017. β€œCompressive Statistical Learning with Random Feature Moments.” arXiv:1706.07180 [Cs, Math, Stat], June.
Gupta, Pawan, and Marianna Pensky. 2016. β€œSolution of Linear Ill-Posed Problems Using Random Dictionaries.” arXiv:1605.07913 [Math, Stat], May.
Heinze, Christina, Brian McWilliams, and Nicolai Meinshausen. 2016. β€œDUAL-LOCO: Distributing Statistical Estimation Using Random Projections.” In, 875–83.
Heinze, Christina, Brian McWilliams, Nicolai Meinshausen, and Gabriel Krummenacher. 2014. β€œLOCO: Distributing Ridge Regression with Random Projections.” arXiv:1406.3469 [Stat], June.
Krummenacher, Gabriel, Brian McWilliams, Yannic Kilcher, Joachim M Buhmann, and Nicolai Meinshausen. 2016. β€œScalable Adaptive Stochastic Optimization Using Random Projections.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1750–58. Curran Associates, Inc.
McWilliams, Brian, David Balduzzi, and Joachim M Buhmann. 2013. β€œCorrelated Random Features for Fast Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 1050:440–48. Curran Associates, Inc.
McWilliams, Brian, Gabriel Krummenacher, Mario Lucic, and Joachim M. Buhmann. 2014. β€œFast and Robust Least Squares Estimation in Corrupted Linear Models.” In Advances in Neural Information Processing Systems, 415–23.
Rahimi, Ali, and Benjamin Recht. 2007. β€œRandom Features for Large-Scale Kernel Machines.” In Advances in Neural Information Processing Systems, 1177–84. Curran Associates, Inc.
β€”β€”β€”. 2008. β€œUniform Approximation of Functions with Random Bases.” In 2008 46th Annual Allerton Conference on Communication, Control, and Computing, 555–61.
Rosenfeld, Amir, and John K. Tsotsos. 2018. β€œIntriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing.”
Scardapane, Simone, and Dianhui Wang. 2017. β€œRandomness in Neural Networks: An Overview.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7 (2).
Sinha, Aman, and John C Duchi. 2016. β€œLearning Kernels with Random Features.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1298–1306. Curran Associates, Inc.
Soni, Akshay, and Yashar Mehdad. 2017. β€œRIPML: A Restricted Isometry Property Based Approach to Multilabel Learning.” arXiv:1702.05181 [Cs, Stat], February.
Thanei, Gian-Andrea, Christina Heinze, and Nicolai Meinshausen. 2017. β€œRandom Projections For Large-Scale Regression.” arXiv:1701.05325 [Math, Stat], January.
Wang, HaiYing, Rong Zhu, and Ping Ma. 2017. β€œOptimal Subsampling for Large Sample Logistic Regression.” arXiv:1702.01166 [Stat], February.
Zhang, Xiao, Lingxiao Wang, and Quanquan Gu. 2017. β€œStochastic Variance-Reduced Gradient Descent for Low-Rank Matrix Recovery from Linear Measurements.” arXiv:1701.00481 [Stat], January.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.