Matrix Factorization Under Contamination

Peter Ballen, University of Pennsylvania


In the nonnegative matrix factorization problem, the user inputs a nonnegative matrix V and wants to factor V ≈ WH, with both W and H nonnegative. Standard factorization techniques make an unrealistic assumption about the noise present in the data: that it is generated by an independent and identically distributed Gaussian process. Real-world datasets are unlikely to satisfy this simplistic assumption. In particular, they suffer from contamination, anomalies, and outliers that cannot be modeled by simple Gaussian distributions. In this dissertation, we discuss novel techniques for matrix factorization under contamination and non-standard noise models. These techniques can be used either as a replacement for a standard factorization algorithm or as an independent contamination detection procedure. We also prove a number of complexity bounds on the hardness of the problem.
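For orientation, the standard (non-robust) setting that the abstract contrasts against can be sketched as follows. This is not one of the dissertation's techniques: it is the classical multiplicative-update scheme for the squared Frobenius objective, which corresponds to the i.i.d. Gaussian noise assumption. The function name `nmf_multiplicative`, the synthetic data, and the single injected outlier are illustrative assumptions only.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Baseline NMF: minimize ||V - WH||_F^2 with W, H >= 0.

    Uses the classical multiplicative updates. Hypothetical helper for
    illustration; not the method proposed in the dissertation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic nonnegative matrix of exact rank 3 (assumed sizes, for illustration).
rng = np.random.default_rng(1)
V_clean = rng.random((20, 3)) @ rng.random((3, 15))

# Contaminate a single entry with a large outlier.
V_contaminated = V_clean.copy()
V_contaminated[0, 0] += 50.0

W, H = nmf_multiplicative(V_contaminated, rank=3)

# Entrywise residuals; inspecting them shows how the squared loss
# responds to the contaminated entry.
residual = np.abs(V_contaminated - W @ H)
```

Because the squared loss penalizes large deviations heavily, a single contaminated entry can pull the fitted factors toward it; comparing `W @ H` against `V_clean` is one way to see the kind of distortion that motivates contamination-aware factorization.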

Subject Area

Computer science|Statistics

Recommended Citation

Ballen, Peter, "Matrix Factorization Under Contamination" (2020). Dissertations available from ProQuest. AAI28022665.