Knowledge Acquisition and Assimilation for Natural Language Understanding
Abstract
Understanding natural language often requires relevant background knowledge to disambiguate local context. In the last few years, the research community has focused on large language models (LLMs) as a key tool for studying natural language understanding (NLU) tasks. While LLMs have demonstrated impressive memorization capabilities and achieved strong performance on many tasks, relying solely on the parametric knowledge acquired during training has limitations for two main reasons: 1) the input query may involve proprietary information that is not made available to model developers (e.g., “Please collect all the conversations about pretraining LLMs from the provided meeting transcripts.”); and 2) the language model lacks awareness of new knowledge (e.g., “Did Manchester United win the game yesterday?”). Overcoming these challenges requires extracting and integrating pertinent knowledge from different parts of local documents and incorporating external documents during the models’ inference phase.

In this thesis, we first focus on improving the acquisition of relevant knowledge to enhance models’ understanding of local context, and then investigate and improve how models utilize the given context once the relevant knowledge has been acquired. To acquire the necessary knowledge, earlier studies used entities within the given context to represent context information: they employed entity coreference models to cluster different contexts of the same entity and entity-linking models to access external knowledge. More recent studies instead represent query information with dense vectors and employ dense retrievers to retrieve relevant passages. In the first half of the thesis, we focus on using events in sentences to represent sentence information, applying event coreference and event linking to gather the knowledge needed to understand a given sentence.

In the latter part of this thesis, we aim to improve models’ assimilation of new knowledge and their faithfulness to the given context, reducing incorrect factual outputs, commonly referred to as hallucinations. Despite remarkable advances in mitigating hallucinations in LLMs through retrieval augmentation, it remains challenging to measure the reliability of LLMs using static question-answering (QA) data. Specifically, given the potential for data contamination (e.g., leading to memorization), good performance on static benchmarks does not ensure that a model can reliably use the provided evidence when responding, which is essential for avoiding hallucinations when the required knowledge is new or private. How to fairly and efficiently assess and improve model faithfulness to provided factual knowledge remains an open and important problem.

Beyond being faithful to the given context, LLMs usually need to reason correctly across multiple pieces of knowledge to answer complicated questions, and hallucinations in the reasoning steps may also lead to incorrect outputs. In the last section of this thesis, we study the behavior of LLMs on reasoning problems and propose methods to evaluate the robustness of their reasoning capabilities.
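
To make the dense-retrieval setup mentioned above concrete, the following is a minimal sketch of retrieving passages by embedding similarity. The embed function here is a toy hashed bag-of-words stand-in introduced only for illustration; an actual dense retriever would encode text with a trained bi-encoder, and the names used below are assumptions rather than the implementation described in the thesis.

    import numpy as np

    def embed(texts, dim=256):
        # Toy stand-in for a learned dense encoder (hashed bag-of-words).
        # A real dense retriever would use a trained transformer bi-encoder.
        vecs = np.zeros((len(texts), dim))
        for i, text in enumerate(texts):
            for token in text.lower().split():
                vecs[i, hash(token) % dim] += 1.0
        return vecs

    def retrieve(query, passages, k=3):
        # Rank passages by cosine similarity between query and passage vectors.
        q = embed([query])[0]
        p = embed(passages)
        q = q / (np.linalg.norm(q) + 1e-8)
        p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
        scores = p @ q                     # one cosine score per passage
        top = np.argsort(-scores)[:k]      # indices of the best-scoring passages
        return [(passages[i], float(scores[i])) for i in top]

In a retrieval-augmented setting, the retrieved passages would then be placed in the model’s input so that answers can be grounded in the provided evidence rather than in memorized parametric knowledge.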