Incidental Supervision for Natural Language Understanding
Abstract
Acquiring human annotations for natural language understanding (NLU) tasks is often labor-intensive and demands significant domain-specific expertise, making it crucial to obtain supervision from indirect signals to improve target task performance. This dissertation presents a novel approach to enhancing NLU by harnessing the power of incidental supervision signals, which are present in the data and the environment regardless of the specific tasks being considered. The primary aim of this research is to deepen our understanding of incidental supervision signals and to develop efficient algorithms for their acquisition, selection, and use in NLU tasks. This problem presents numerous challenges, including the intricate nature of natural language and the inherent disparities between incidental supervision signals and target tasks. To tackle these challenges, this dissertation employs a multifaceted approach. First, we demonstrate the feasibility of utilizing cost-effective signals to enhance various target tasks. Specifically, we retrieve signals from sentence-level question-answer pairs to help NLU tasks via two types of sentence encoding approaches, depending on whether the target task involves single- or multi-sentence input. Second, we introduce a unified informativeness measure to quantitatively assess the effectiveness of diverse incidental supervision signals for a given target task. This approach offers a promising way to determine, ahead of learning, which supervision signals would be beneficial. Finally, we present a suite of efficient algorithms for exploiting distinct types of incidental supervision signals, including a weighting strategy to enhance sample efficiency in cross-task learning and a post-processing technique for faithful large language model inference with external knowledge.