Learning Structured Models with Weak Supervision
Abstract
Over the past decade, the remarkable success of machine learning across many fields has highlighted the critical role of large quantities of training data, bringing both challenges and opportunities in how that data is used. In response to these challenges, this thesis studies weakly supervised learning, a setting in which training data may be incomplete, noisy, or imprecisely labeled. Our approach broadly treats any random variable correlated with the target label as a source of weak supervision; examples include, but are not limited to, partial labels, label constraints, and downstream labels for a neuro-symbolic model. This thesis aims to develop a unified and comprehensive theoretical framework that elucidates how weak supervision affects model performance and examines the trade-offs behind algorithmic decisions in weakly supervised learning.

Our contributions are divided into three interconnected parts. First, we establish a unified theory of weak supervision that characterizes learnability and learning difficulty in classification tasks. A key innovation is the concept of separation, which quantifies both the learner's prior knowledge and the informativeness of a weak label. We further demonstrate the framework's applicability through concrete new results in a variety of learning scenarios, such as learning with superset annotations and with joint supervision signals. Second, we study an important source of weak supervision, label constraints, which are ubiquitous in structured learning problems. We offer a critical comparison between regularization and inference strategies for enforcing label constraints, illustrating the bias-variance tradeoff underlying these methods. Building on this result, we explore using the two strategies together and propose conditions under which the combined approach improves both model complexity and optimal risk. Third, we extend our framework to allow interactions among multiple instances, which are essential in neuro-symbolic learning and latent structured learning. We propose a necessary and sufficient condition for learnability and derive error bounds based on a top-k surrogate loss widely used in the neuro-symbolic literature. We also conduct empirical studies that verify our theoretical findings and expose scalability issues in the weak supervision literature.

Our results form a coherent approach to the theoretical understanding of weakly supervised learning, offering insights that bridge the gap between theoretical paradigms and practical applications. We conclude this thesis with two future research directions motivated by our work: to understand the implicit bias of approximate inference, and to explore the theory of weakly supervised learning with dependent training samples.