Statistical Spectral Algorithms for Learning from Discrete Data

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Subject
discrete data
item response theory
permutation synchronization
ranking
spectral method
statistics
Copyright date
01/01/2024
Author
Nguyen, Manh Duc
Abstract

In recent decades, interactions between computer systems and human users have generated a substantial volume of data, much of it discrete. Notable examples include choice data, where a user selects an item from a list; binary response data, where users vote yes or no on an item; and ranking data, where users provide a complete ordering of items by preference. Developing efficient, robust, and accurate algorithms for discrete data has become a focal point in applications such as recommendation systems, the social sciences, and psychometrics. This thesis contributes to this evolving field by focusing on a class of efficient and powerful statistical algorithms known as spectral algorithms. We introduce novel, efficient, and provably accurate spectral algorithms, and we analyze the theoretical performance guarantees of classical spectral algorithms applied to discrete data.
For binary and ordered response data, we design novel spectral algorithms that are provably accurate and, under reasonable assumptions, also achieve the optimal sample complexity. This is particularly significant for the Rasch model, a fundamental model in psychometrics. Beyond binary responses, we introduce a generalized spectral algorithm that yields accurate estimates under the Partial Credit model, which extends the Rasch model to discrete ordered responses and rating data. Our spectral algorithms outperform other popular algorithms in both accuracy and efficiency.
For ranking data, we present novel spectral algorithms for two well-known problems in computer science. First, we address inference under a mixture of Plackett-Luce models with a novel two-step algorithm. We propose an initialization algorithm based on spectral clustering, which comes with provable guarantees. Then, by recognizing the connection between the M-step of the EM algorithm and Markov chain analysis, we introduce a novel EM algorithm that is more accurate and faster than previously proposed algorithms. Second, we study the permutation synchronization problem, which has a broad range of applications in computer vision. We identify a subtle but critical limitation of previously proposed spectral algorithms and introduce a new spectral algorithm designed specifically for this problem. The resulting algorithm enjoys information-theoretically optimal guarantees and performs significantly better than previous approaches.
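To illustrate why spectral methods suit the Rasch model mentioned above, here is a minimal sketch under stated assumptions. In the Rasch model, user l answers item i correctly with probability 1 / (1 + exp(-(θ_l - β_i))), where θ_l is the user's ability and β_i the item's difficulty. Among users who answer exactly one of items i and j correctly, the odds that the correct answer was on item j rather than item i equal exp(β_i - β_j), whatever the user's ability. The hypothetical function `spectral_rasch` below builds a random walk on items from these pairwise counts and reads the difficulties off its stationary distribution; `simulate_rasch` is a hypothetical helper that generates synthetic data with standard-normal abilities. This is a Markov-chain spectral estimator in the general spirit of the approach, not necessarily the exact algorithm developed in the thesis, and the normalizer and power-iteration settings are illustrative choices.

```python
import math
import random


def spectral_rasch(responses, tol=1e-12, max_iter=100_000):
    """Sketch: estimate Rasch item difficulties from a 0/1 response matrix.

    responses[l][i] is 1 if user l answered item i correctly, else 0.
    Returns difficulties beta_i shifted to sum to zero.
    """
    m = len(responses[0])
    # counts[i][j]: number of users who got item i wrong and item j right.
    # For every user, the odds of (i wrong, j right) versus (i right, j wrong)
    # are exp(beta_i - beta_j), so the abilities cancel out.
    counts = [[0] * m for _ in range(m)]
    for row in responses:
        wrong = [i for i, x in enumerate(row) if x == 0]
        right = [j for j, x in enumerate(row) if x == 1]
        for i in wrong:
            for j in right:
                counts[i][j] += 1

    # Random walk: step i -> j with probability counts[i][j] / d.
    # d = 2 * (largest row sum) leaves a self-loop of at least 1/2 on every
    # item, which keeps the chain aperiodic.
    d = 2 * max(1, max(sum(r) for r in counts))
    P = [[c / d for c in r] for r in counts]
    for i in range(m):
        P[i][i] = 1.0 - sum(P[i])  # counts[i][i] is always 0

    # Power iteration for the stationary distribution (pi = pi P).
    pi = [1.0 / m] * m
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        done = sum(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if done:
            break

    # In the population limit, pi_i is proportional to exp(-beta_i).
    beta = [-math.log(p) for p in pi]
    mean = sum(beta) / m
    return [b - mean for b in beta]


def simulate_rasch(beta, n_users, seed=0):
    """Hypothetical helper: synthetic responses with N(0, 1) abilities."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_users):
        theta = rng.gauss(0.0, 1.0)
        data.append([1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
                     for b in beta])
    return data
```

The factor of 2 in the normalizer only slows the walk down; it does not change the stationary distribution, because every off-diagonal transition is scaled by the same constant.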

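One well-known way to turn a Plackett-Luce maximum-likelihood problem into Markov chain analysis is iterative Luce spectral ranking (I-LSR), proposed by Maystre and Grossglauser. In an EM algorithm for a mixture of Plackett-Luce models, the M-step for one component is a weighted maximum-likelihood problem, with each ranking weighted by its posterior responsibility. The sketch below, with hypothetical names `weighted_ilsr`, `_stationary`, and `sample_plackett_luce`, runs I-LSR with such weights: each ranking is treated as a sequence of choices, every item left unchosen at a step sends rate toward the chosen item, and the scores are read off the stationary distribution of the resulting chain. It illustrates the general connection under these assumptions and is not claimed to be the EM algorithm developed in the thesis.

```python
import random


def _stationary(rates, m, tol=1e-13, max_iter=100_000):
    """Stationary distribution of a continuous-time chain with rates[a][b]."""
    exit_rates = [sum(rates[a]) for a in range(m)]
    lam = 2 * max(max(exit_rates), 1e-12)  # uniformization constant
    P = [[rates[a][b] / lam for b in range(m)] for a in range(m)]
    for a in range(m):
        P[a][a] = 1.0 - exit_rates[a] / lam  # rates[a][a] is always 0
    pi = [1.0 / m] * m
    for _ in range(max_iter):
        new = [sum(pi[a] * P[a][b] for a in range(m)) for b in range(m)]
        done = sum(abs(x - y) for x, y in zip(new, pi)) < tol
        pi = new
        if done:
            break
    return pi


def weighted_ilsr(rankings, weights, m, n_outer=30):
    """Sketch of weighted iterative Luce spectral ranking.

    rankings: lists of item indices, best first.
    weights: one nonnegative weight per ranking (e.g. EM responsibilities
    for a single mixture component). Returns scores pi summing to 1.
    """
    pi = [1.0 / m] * m
    for _ in range(n_outer):
        rates = [[0.0] * m for _ in range(m)]
        for ranking, w in zip(rankings, weights):
            for t in range(len(ranking) - 1):
                remaining = ranking[t:]
                denom = sum(pi[k] for k in remaining)
                winner = remaining[0]
                for loser in remaining[1:]:
                    # Each unchosen item sends rate toward the chosen one.
                    rates[loser][winner] += w / denom
        pi = _stationary(rates, m)
    return pi


def sample_plackett_luce(pi, rng):
    """Hypothetical helper: draw one full ranking by repeatedly picking a
    remaining item with probability proportional to its score."""
    items = list(range(len(pi)))
    ranking = []
    while items:
        k = rng.choices(items, weights=[pi[i] for i in items])[0]
        ranking.append(k)
        items.remove(k)
    return ranking
```

A fixed point of the outer loop satisfies the stationarity conditions of the weighted Plackett-Luce log-likelihood, so the returned scores approximate the weighted maximum-likelihood estimate that an M-step requires.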
Advisor
Zhang, Anderson
Date of degree
2024