New Subsampling Algorithms for Fast Least Squares Regression
Date of this Version
We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data (n>>p). We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation. All three run in the order of size of input i.e. O(np) and our best method, Uluru, gives an error bound of O(√p/n) which is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design (with Randomized Hadamard preconditioning) as well as sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d., sub-Gaussian then one can directly subsample without the expensive Randomized Hadamard preconditioning without loss of accuracy.
Dhillon, P., Lu, Y., Foster, D. P., & Ungar, L. H. (2013). New Subsampling Algorithms for Fast Least Squares Regression. NIPS, Retrieved from https://repository.upenn.edu/statistics_papers/107
Date Posted: 27 November 2017