New Subsampling Algorithms for Fast Least Squares Regression

Penn collection
Statistics Papers
Discipline
Subject
Computer Sciences
Statistics and Probability
Author
Dhillon, Paramveer
Lu, Yichao
Foster, Dean P
Ungar, Lyle H
Abstract

We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data (n ≫ p). We propose three methods that solve the big data problem by subsampling the covariance matrix using either single-stage or two-stage estimation. All three run in time on the order of the input size, i.e., O(np), and our best method, Uluru, gives an error bound of O(√(p/n)) that is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design setting (with Randomized Hadamard preconditioning) as well as in the sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d. and sub-Gaussian, then one can subsample directly, skipping the expensive Randomized Hadamard preconditioning, without loss of accuracy.
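To make the idea of subsampling the covariance matrix concrete, the following is a minimal sketch and not the authors' exact algorithms, which the abstract does not spell out. It assumes uniform row subsampling without preconditioning on i.i.d. Gaussian data; the function name `subsampled_cov_ols`, the subsample size `r`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def subsampled_cov_ols(X, y, r, rng):
    """Hypothetical helper: OLS where X^T X is estimated from r sampled rows."""
    n, p = X.shape
    # Uniform subsample of rows (assumption: no Hadamard preconditioning).
    idx = rng.choice(n, size=r, replace=False)
    X_sub = X[idx]
    # Rescale the subsample Gram matrix by n / r so it estimates X^T X.
    cov_est = (n / r) * (X_sub.T @ X_sub)   # costs O(r p^2)
    # The cross term X^T y still uses every row.
    xty = X.T @ y                           # costs O(n p)
    return np.linalg.solve(cov_est, xty)    # costs O(p^3)

# Synthetic i.i.d. Gaussian (hence sub-Gaussian) design with n >> p.
rng = np.random.default_rng(0)
n, p, r = 20000, 10, 2000
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_sub = subsampled_cov_ols(X, y, r, rng)
w_full = np.linalg.lstsq(X, y, rcond=None)[0]   # full OLS for comparison

err_sub = np.linalg.norm(w_sub - w_true)
err_full = np.linalg.norm(w_full - w_true)
print(f"subsampled-covariance error: {err_sub:.4f}")
print(f"full OLS error:              {err_full:.4f}")
```

In this sketch the cost is O(np + rp² + p³), which stays on the order of O(np) when r is at most about n/p and p² is at most about n. Only the covariance estimate is subsampled; the cross term XᵀY is computed on all n rows.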

Date of presentation
2013-01-01
Conference name
Statistics Papers
Conference dates
2023-05-17T15:29:47.000