Doctor of Philosophy (PhD)
Computer and Information Science
The focus of this thesis is to incorporate linguistic theories of semantics into data-driven models for automatic natural language understanding. Most current models rely on an impoverished version of semantics that can be learned automatically from large volumes of unannotated text. However, many aspects of language understanding require deeper models of semantic meaning than those that can be easily derived from word co-occurrence alone. In this thesis, we inform our models with insights from linguistics, so that we can continue to take advantage of large-scale statistical models of language without compromising on depth and interpretability. We begin with a discussion of lexical entailment. We classify pairs of words according to a small set of distinct entailment relations, such as equivalence, entailment, exclusion, and independence. We show that imposing these relations onto a large, automatically constructed lexical entailment resource leads to measurable improvements in an end-to-end inference task. We then turn our attention to compositional entailment, in particular to modifier-noun composition. We show that inferences involving modifier-noun phrases (e.g. “red dress”, “imaginary friend”) are much more complex than the conventional wisdom holds. In a systematic evaluation of a range of existing state-of-the-art natural language inference systems, we illustrate the inability of current technology to handle the kinds of common-sense inferences necessary for human-like processing of modifier-noun phrases. We propose a data-driven method for operationalizing a formal semantics framework that assigns interpretable semantic representations to individual modifiers. We use this method to find instances of fine-grained classes involving multiple modifiers (e.g. “1950s American jazz composers”). We demonstrate that our proposed compositional model outperforms existing non-compositional approaches.
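To illustrate the contrast between compositional and non-compositional class-instance lookup, here is a minimal sketch. It does not reproduce the thesis's data-driven method: the toy knowledge base, the entity names, and the modifier-to-property mapping (`MODIFIERS`) are all hypothetical and hard-coded for illustration. It shows why interpreting each modifier on its own can recover members of a fine-grained class such as "1950s American jazz composer", even when that full phrase never appears as a class label.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base (not data from the thesis): each entity has
# atomic class labels plus (property, value) facts.
@dataclass(frozen=True)
class Entity:
    name: str
    classes: frozenset
    facts: frozenset

KB = [
    Entity("composer_a", frozenset({"composer", "jazz composer"}),
           frozenset({("decade", "1950s"), ("nationality", "American"), ("genre", "jazz")})),
    Entity("composer_b", frozenset({"composer"}),
           frozenset({("decade", "1950s"), ("nationality", "French"), ("genre", "jazz")})),
    Entity("composer_c", frozenset({"composer"}),
           frozenset({("decade", "1920s"), ("nationality", "American"), ("genre", "classical")})),
]

# Hypothetical modifier interpretations: each modifier maps to the single
# property it restricts, so modifiers can be applied independently.
MODIFIERS = {
    "1950s": ("decade", "1950s"),
    "American": ("nationality", "American"),
    "jazz": ("genre", "jazz"),
}

def non_compositional(phrase, kb):
    """Treat the whole phrase as one atomic class label."""
    return {e.name for e in kb if phrase in e.classes}

def compositional(modifiers, head, kb):
    """Start from the head-noun class and let each modifier filter it."""
    members = {e for e in kb if head in e.classes}
    for m in modifiers:
        prop = MODIFIERS[m]
        members = {e for e in members if prop in e.facts}
    return {e.name for e in members}
```

Here `non_compositional("1950s American jazz composer", KB)` returns an empty set, because no entity carries that exact label, whereas `compositional(["1950s", "American", "jazz"], "composer", KB)` returns `{"composer_a"}`. Treating each modifier as an intersective filter is the simplest possible choice; as the abstract notes, real modifier-noun inferences (e.g. “imaginary friend”) are more complex than this.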
Pavlick, Ellie, "Compositional Lexical Semantics In Natural Language Inference" (2017). Publicly Accessible Penn Dissertations. 2519.