Investigations into the role of lexical semantics in word sense disambiguation

Hoa Trang Dang, University of Pennsylvania

Abstract

Verbs that can have more than one meaning pose problems for Natural Language Processing (NLP) applications. While homonyms (words with unrelated meanings) are fairly tractable, polysemous verbs with closely related meanings pose the greatest hurdle for automatic Word Sense Disambiguation (WSD). A major problem with WSD for verbs is that even humans disagree about what constitutes a different sense for a polysemous word. This thesis investigates verb lexical semantics and their computational representations, and how these can be used for automatic WSD. Our main contribution is in defining criteria by which humans make sense distinctions for verbs, and in translating these criteria into linguistically-motivated features that we use to build a state-of-the-art automatic WSD system. Our explicit criteria for sense distinctions allow humans to sense-tag data more consistently. Improved human performance on the WSD task enables improved system performance. We begin by examining the definition of verb polysemy implicit in Levin verb classes. We describe our work on VerbNet, a lexical resource in which different senses of a verb are defined by membership in different verb classes; the classes have distinctive syntactic frames and explicit semantic predicates that characterize the verb senses in that class. We then translate some of these lexical semantic characteristics into richer linguistic features used to build our automatic WSD system. The system performs competitively on the English verbs of Senseval-1 and Senseval-2 by combining information from syntax, lexical collocations, and semantic class constraints on verb arguments. Adding gold-standard predicate-argument information from PropBank further improves system performance. Because humans have difficulty making fine-grained sense distinctions, creation of manually sense-tagged corpora is time-consuming and expensive.
We experiment with active learning to obtain additional training data for our system, but find that the quality of manually sense-tagged data is limited by an inconsistent or unclear sense inventory. We develop criteria for grouping senses and show that well-defined groupings of WordNet senses can improve both human inter-annotator agreement and system performance. The groupings fit into a hierarchy of WordNet senses that allows different NLP applications to use different granularities of sense distinctions.

Subject Area

Computer science

Recommended Citation

Dang, Hoa Trang, "Investigations into the role of lexical semantics in word sense disambiguation" (2004). Dissertations available from ProQuest. AAI3152025.
https://repository.upenn.edu/dissertations/AAI3152025
