Methods and Challenges in Inference Across Textual Sources
Discipline
Data Science
Subject
Deep Learning
Information Retrieval
Natural Language Processing
Text Semantics
Abstract
Information technologies, such as search engines and, more recently, generative AI models, have enabled and democratized access to a vast array of content from diverse sources for internet users. However, discerning what is trustworthy amid this sheer volume of information has become a challenge. The goal of this dissertation is to develop natural language processing (NLP) techniques that facilitate effective, efficient comparison and validation of information across multiple sources. In NLP, the problem of comparing information across sources closely resembles the task of natural language inference (NLI), in which a system classifies whether a hypothesis can be inferred from a premise. In this thesis, I address three key problems in standard NLI practice. First, for open-ended questions or claims with many possible answers, I argue that a system should discover supporting and opposing evidence from a diverse set of perspectives. Next, to enable finer-grained comparison of information across sources, I propose a representation learning framework in which text semantics are represented by propositions. Lastly, I show that current NLI datasets and models suffer from the assumption that the claim and evidence can always be interpreted in the same context, which can limit NLI models' applicability to real-world fact verification.