# Algorithms for linear and nonlinear approximation of large data

#### Abstract

A central problem in approximation theory is the concise representation of functions. Given a function or signal described as a vector in high-dimensional space, the goal is to represent it as closely as possible using a linear combination of a small number of (simpler) vectors belonging to a pre-defined dictionary. We develop approximation algorithms for this sparse representation problem under two principal approaches known as linear and nonlinear approximation.

The linear approach is equivalent to over-constrained regression. Given f ∈ ℝ^n, an n × B matrix A, and a p-norm, the objective is to find x ∈ ℝ^B minimizing ∥Ax - f∥ₚ. We assume that B is much smaller than n; hence, the resulting problem is over-constrained. The nonlinear approach offers an extra degree of freedom: it allows us to choose the B representation vectors from a larger set. Assuming A ∈ [special characters omitted] describes the dictionary, here we seek x ∈ [special characters omitted] with B non-zero components that minimizes ∥Ax - f∥ₚ.

For nonlinear approximation using a compactly supported wavelet basis, we provide a fast, greedy, one-pass streaming algorithm and use it to show that the solution to a prevalent restricted version of the problem is an O(log n)-approximation to the optimal (unrestricted) solution for all p-norms, p ∈ [1, ∞]. For the important case of the Haar wavelet basis, we detail a fully polynomial-time approximation scheme for all p ∈ [1, ∞] based on a one-pass dynamic programming algorithm that, for p > 1, is also streaming. Under other compactly supported wavelets, a similar algorithm modified for the given wavelet basis yields a QPTAS. Our algorithms extend to variants of the problem such as adaptive quantization and best-basis selection.

For linear over-constrained ℓp regression, we demonstrate the existence of core-sets and present an efficient sampling-based approximation algorithm that computes them for all p ∈ [1, ∞). That is, our algorithm samples a small number of constraints (rows of A and the corresponding elements of f), with the sample size independent of n, and then solves an ℓp regression problem on only these constraints, producing a solution that yields a (1 + ε)-approximation to the original problem. Our algorithm extends to more general and commonly encountered settings such as weighted p-norms, generalized p-norms, and solutions restricted to a convex space.
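As an illustration of the nonlinear problem in the Haar case, the following minimal sketch computes a B-term approximation by keeping the B largest-magnitude coefficients of the orthonormal Haar expansion of f. It is not the dissertation's algorithm: it is neither streaming nor the dynamic program mentioned above, and it corresponds only to a simple restricted solution, one that is known to be optimal for p = 2 by Parseval's identity. The function names (`haar_transform`, `inverse_haar_transform`, `top_b_haar_approximation`, `lp_error`) and the power-of-two length requirement are assumptions made for this example.

```python
import math


def haar_transform(f):
    """Orthonormal Haar transform of a sequence whose length is a power of two."""
    n = len(f)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in f]
    s = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        out = [0.0] * length
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            out[i] = (a + b) / s         # scaled pairwise average
            out[half + i] = (a - b) / s  # scaled pairwise difference (detail)
        coeffs[:length] = out
        length = half
    return coeffs


def inverse_haar_transform(c):
    """Inverse of haar_transform."""
    n = len(c)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    values = [float(v) for v in c]
    s = math.sqrt(2.0)
    length = 1
    while length < n:
        out = [0.0] * (2 * length)
        for i in range(length):
            avg, diff = values[i], values[length + i]
            out[2 * i] = (avg + diff) / s
            out[2 * i + 1] = (avg - diff) / s
        values[: 2 * length] = out
        length *= 2
    return values


def top_b_haar_approximation(f, B):
    """Keep the B largest-magnitude Haar coefficients of f and reconstruct.

    Returns the approximation and the sorted indices of the kept coefficients.
    """
    c = haar_transform(f)
    order = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)
    kept = sorted(order[:B])
    sparse = [0.0] * len(c)
    for i in kept:
        sparse[i] = c[i]
    return inverse_haar_transform(sparse), kept


def lp_error(f, g, p):
    """The p-norm of f - g, for p in [1, inf]."""
    diffs = [abs(a - b) for a, b in zip(f, g)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)
```

Calling `top_b_haar_approximation(f, B)` and then `lp_error(f, approx, p)` for several values of p gives a quick way to see how the error of this restricted solution varies with the norm; for p ≠ 2 the kept coefficients are in general not the best choice, which is the gap the approximation guarantees in the abstract address.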

#### Subject Area

Computer science

#### Recommended Citation

Harb, Boulos, "Algorithms for linear and nonlinear approximation of large data" (2007). *Dissertations available from ProQuest*. AAI3260913.

https://repository.upenn.edu/dissertations/AAI3260913