Date of Award
2020
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Graduate Group
Computer and Information Science
First Advisor
Aaron Roth
Abstract
In the nonnegative matrix factorization problem, the user inputs a nonnegative matrix V and wants to factor V into a product WH, with both W and H nonnegative. Standard factorization techniques make an unrealistic assumption about the noise present in the data: that it is generated by an independent and identically distributed Gaussian process. However, real-world datasets are unlikely to satisfy this simplistic assumption. In particular, real-world datasets suffer from contamination, anomalies, and outliers that cannot be modeled by simple Gaussian distributions. In this dissertation, we discuss novel techniques for matrix factorization under contamination and non-standard noise models. These techniques can be used either as a replacement for a standard factorization algorithm or as an independent contamination detection procedure. We also prove a number of complexity bounds on the hardness of the problem.
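For readers unfamiliar with the problem setup, the following is a minimal sketch of a standard factorization baseline of the kind the abstract contrasts against. It is not the dissertation's method. It uses the well-known multiplicative update rule for the squared Frobenius objective, which corresponds to assuming i.i.d. Gaussian noise. The function name `nmf_multiplicative`, its parameters, and the synthetic test matrix are illustrative assumptions.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix V by W @ H with W, H >= 0.

    Illustrative baseline only (hypothetical helper, not from the
    dissertation): multiplicative updates that minimize ||V - W H||_F^2,
    the objective implied by an i.i.d. Gaussian noise model.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H
        # remain nonnegative throughout. eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    # Synthetic nonnegative matrix with an exact rank-3 nonnegative factorization.
    rng = np.random.default_rng(1)
    V = rng.random((20, 3)) @ rng.random((3, 15))
    W, H = nmf_multiplicative(V, rank=3)
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    print("relative reconstruction error:", rel_err)
```

Because the squared-error objective weights every entry equally, a small number of grossly corrupted entries in V can pull W and H away from the underlying structure, which is the setting the dissertation addresses.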
Recommended Citation
Ballen, Peter Lewis, "Matrix Factorization Under Contamination" (2020). Publicly Accessible Penn Dissertations. 3966.
https://repository.upenn.edu/edissertations/3966