Methods and Challenges in Inference Across Textual Sources
Discipline
Data Science
Subject
Deep Learning
Information Retrieval
Natural Language Processing
Text Semantics
Abstract
Information technologies, such as search engines and, more recently, generative AI models, have enabled and democratized access to a vast array of content from diverse sources for internet users. However, discerning what is trustworthy amid this sheer volume of information has become a challenge. The goal of this dissertation is to develop natural language processing (NLP) techniques that facilitate effective, efficient comparison and validation of information across multiple sources. In NLP, the problem of comparing information across sources closely resembles the task of natural language inference (NLI), in which a system classifies whether a hypothesis can be inferred from a premise. In this thesis, I address three key problems in standard NLI practice. First, for open-ended questions or claims with many possible answers, I argue that a system should discover supporting and opposing evidence from a diverse set of perspectives. Next, to enable finer-grained comparison of information across sources, I propose a representation learning framework in which text semantics are represented by propositions. Lastly, I show that current NLI datasets and models suffer from the assumption that the claim and evidence can always be interpreted in the same context, which can limit NLI models' applicability to real-world fact verification.