Algorithms for linear and nonlinear approximation of large data

Boulos Harb, University of Pennsylvania

Abstract

A central problem in approximation theory is the concise representation of functions. Given a function or signal described as a vector in high-dimensional space, the goal is to represent it as closely as possible using a linear combination of a small number of (simpler) vectors belonging to a pre-defined dictionary. We develop approximation algorithms for this sparse representation problem under two principal approaches known as linear and nonlinear approximation. The linear approach is equivalent to over-constrained regression. Given f ∈ ℝ^n, an n × B matrix A, and a p-norm, the objective is to find x ∈ ℝ^B minimizing ||Ax - f||_p. We assume that B is much smaller than n; hence, the resulting problem is over-constrained. The nonlinear approach offers an extra degree of freedom: it allows us to choose the B representation vectors from a larger set. Assuming the columns of a matrix A with n rows describe the dictionary, here we seek a coefficient vector x with B non-zero components that minimizes ||Ax - f||_p. By providing a fast, greedy one-pass streaming algorithm, we show that the solution to a prevalent restricted version of the problem of nonlinear approximation using a compactly-supported wavelet basis is an O(log n)-approximation to the optimal (unrestricted) solution for all p-norms, p ∈ [1, ∞]. For the important case of the Haar wavelet basis, we detail a fully polynomial-time approximation scheme (FPTAS) for all p ∈ [1, ∞] based on a one-pass dynamic programming algorithm that, for p > 1, is also streaming. Under other compactly-supported wavelets, a similar algorithm modified for the given wavelet basis yields a quasi-polynomial-time approximation scheme (QPTAS). Our algorithms extend to variants of the problem such as adaptive quantization and best-basis selection.
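To make the nonlinear B-term problem concrete, the following is a minimal Python sketch. It is not the dissertation's algorithm. It computes an orthonormal Haar transform of f, keeps the B largest-magnitude coefficients, and measures the p-norm error of the reconstruction. Keeping the largest coefficients is the classical rule that is optimal for p = 2; for other p it serves here only as a simple heuristic. The function names and the assumption that n is a power of two are illustrative choices, not taken from the source.

```python
import math


def haar_transform(f):
    """Orthonormal Haar transform; len(f) is assumed to be a power of two."""
    n = len(f)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    coeffs = [float(v) for v in f]
    out = coeffs[:]
    length = n
    while length > 1:
        half = length // 2
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            out[i] = (a + b) / math.sqrt(2)         # scaled average
            out[half + i] = (a - b) / math.sqrt(2)  # scaled detail
        coeffs[:length] = out[:length]
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Inverse of haar_transform."""
    vals = list(coeffs)
    length = 2
    while length <= len(vals):
        half = length // 2
        tmp = vals[:length]
        for i in range(half):
            s, d = tmp[i], tmp[half + i]
            vals[2 * i] = (s + d) / math.sqrt(2)
            vals[2 * i + 1] = (s - d) / math.sqrt(2)
        length *= 2
    return vals


def keep_largest(coeffs, B):
    """Zero all but the B largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = [0.0] * len(coeffs)
    for i in order[:B]:
        kept[i] = coeffs[i]
    return kept


def p_norm_error(f, g, p):
    """p-norm of f - g, with p = math.inf giving the maximum error."""
    diffs = [abs(a - b) for a, b in zip(f, g)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


if __name__ == "__main__":
    f = [4, 4, 4, 4, 8, 8, 8, 8]
    c = haar_transform(f)
    for B in (1, 2):
        approx = inverse_haar(keep_largest(c, B))
        print(B, p_norm_error(f, approx, 2), p_norm_error(f, approx, math.inf))
```

On this step signal only two Haar coefficients are non-zero, so B = 2 reconstructs f up to rounding, while B = 1 yields the constant signal 6.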
For linear over-constrained ℓ_p regression, we demonstrate the existence of core-sets and present an efficient sampling-based approximation algorithm that computes them for all p ∈ [1, ∞). That is, our algorithm samples a small number (independent of n) of constraints (rows of A and the corresponding elements of f), then solves an ℓ_p regression problem on only these constraints, producing a solution that yields a (1 + ε)-approximation to the original problem. Our algorithm extends to more general and commonly encountered settings such as weighted p-norms, generalized p-norms, and solutions restricted to a convex space.
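The "sample constraints, then solve on the sample" pattern can be illustrated with a deliberately simplified Python sketch. It is restricted to p = 2 and a single column (B = 1), and it is not the dissertation's sampling scheme. Rows are drawn with probability proportional to a_i^2 (the leverage score in the one-column case) and reweighted by 1 / (s * prob_i) so that the sampled objective is an unbiased estimate of the full one. All names, the sample size s, and the synthetic data are assumptions made for illustration.

```python
import random


def solve_1d_least_squares(a, f, w=None):
    """Minimize sum_i w_i * (a_i * x - f_i)^2 over a scalar x."""
    if w is None:
        w = [1.0] * len(a)
    num = sum(wi * ai * fi for wi, ai, fi in zip(w, a, f))
    den = sum(wi * ai * ai for wi, ai in zip(w, a))
    return num / den


def sample_rows(a, s, rng):
    """Draw s row indices with probability proportional to a_i^2.

    Returns the indices and the importance weights 1 / (s * prob_i).
    """
    total = sum(ai * ai for ai in a)
    probs = [ai * ai / total for ai in a]
    idx = rng.choices(range(len(a)), weights=probs, k=s)
    weights = [1.0 / (s * probs[i]) for i in idx]
    return idx, weights


def demo(seed=0, n=10000, s=50):
    """Compare the full solution with the one computed on s sampled rows."""
    rng = random.Random(seed)
    a = [rng.uniform(1.0, 2.0) for _ in range(n)]
    # Synthetic data: f is roughly 3 * a plus small bounded noise.
    f = [3.0 * ai + rng.uniform(-0.01, 0.01) for ai in a]
    x_full = solve_1d_least_squares(a, f)
    idx, w = sample_rows(a, s, rng)
    x_sampled = solve_1d_least_squares([a[i] for i in idx], [f[i] for i in idx], w)
    return x_full, x_sampled


if __name__ == "__main__":
    print(demo())
```

The sampled problem has s rows regardless of n, which is the point of a core-set; the guarantees for general p and general B stated in the abstract require the more careful sampling developed in the dissertation.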

Recommended Citation

Boulos Harb, "Algorithms for linear and nonlinear approximation of large data" (January 1, 2007). Dissertations available from ProQuest. Paper AAI3260913.
http://repository.upenn.edu/dissertations/AAI3260913