Matrix Factorization Under Contamination

Ballen, Peter Lewis

Files

Ballen_upenngdas_0175C_14264.pdf (799.86 KB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

nonnegative matrix factorization
robust statistics
Computer Sciences

Copyright date

2021-08-31T20:20:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/31040

Abstract

In the nonnegative matrix factorization problem, the user inputs a nonnegative matrix V and wants to factor V into a product WH, with both W and H nonnegative. Standard factorization techniques make an unrealistic assumption about the noise present in the data: that it is generated by an independent and identically distributed Gaussian process. However, real-world datasets are unlikely to satisfy this simplistic assumption. In particular, real-world datasets suffer from contamination, anomalies, and outliers that cannot be modeled by simple Gaussian distributions. In this dissertation, we discuss novel techniques for matrix factorization under contamination and non-standard noise models. These techniques can be used either as a replacement for a standard factorization algorithm or as an independent contamination detection procedure. We also prove a number of complexity bounds on the hardness of the problem.
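
To make the problem statement concrete, the following is a minimal sketch of a standard (non-robust) baseline, not the dissertation's method: the classical Lee-Seung multiplicative updates, which minimize the squared Frobenius error and so correspond to the i.i.d. Gaussian noise assumption described above. The function names, the toy matrix, and the injected outlier are all illustrative assumptions. The sketch fits a rank-1 factorization to a clean matrix and to the same matrix with one contaminated entry.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Baseline NMF: multiplicative updates for min ||V - WH||_F^2, W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialization keeps every update well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Toy example (assumed data): an exactly rank-1 nonnegative matrix.
V_clean = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
W, H = nmf(V_clean, rank=1)
err_clean = frobenius_error(V_clean, W, H)

# The same matrix with a single gross outlier injected (hypothetical contamination).
V_bad = [row[:] for row in V_clean]
V_bad[0][2] = 50.0
W_bad, H_bad = nmf(V_bad, rank=1)
err_bad = frobenius_error(V_bad, W_bad, H_bad)

print("clean error:", err_clean)
print("contaminated error:", err_bad)
```

On the clean matrix the rank-1 fit is essentially exact, while the single contaminated entry leaves a large residual that the squared-error objective cannot absorb. This only illustrates why the Gaussian assumption is fragile; it does not implement any of the robust techniques the dissertation proposes.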

Advisor

Aaron Roth

Date of degree

2020-01-01

Collection

Dissertations and Theses