When Is Word Sense Disambiguation Difficult? A Crowdsourcing Approach

Penn collection
Wharton Research Scholars
Subject
Business
Author
Kaliannan, Krishna N
Abstract

We identified features that drive differential accuracy in word sense disambiguation (WSD) by building regression models on 10,000 coarse-grained WSD instances labeled on Amazon Mechanical Turk (MTurk). Features predictive of accuracy include properties of the target word (word frequency, part of speech, and number of possible senses), the example context (length), and the Turker’s engagement with our task. The resulting model gives insight into which words are difficult to disambiguate. We also show that having many Turkers label the same instance provides at least a partial substitute for more expensive annotation.
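
As a rough illustration of how several Turkers' labels for one instance might be combined, the sketch below uses simple majority voting. This is an assumption: the abstract does not say which aggregation method was used, and the instance IDs, sense labels, and function names here are hypothetical toy values, not data from the study.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def aggregate(annotations):
    """Map each instance ID to the majority label of its annotators."""
    return {inst: majority_label(labels) for inst, labels in annotations.items()}


# Hypothetical toy data: three crowd labels per instance (not from the study).
annotations = {
    "bank-001": ["finance", "finance", "river"],
    "bank-002": ["river", "river", "river"],
}
# Hypothetical reference labels used only to score the toy example.
gold = {"bank-001": "finance", "bank-002": "river"}

predicted = aggregate(annotations)
accuracy = sum(predicted[k] == gold[k] for k in gold) / len(gold)
print(predicted, accuracy)
```

With redundant labels, one annotator's mistake on "bank-001" is outvoted by the other two, which is the intuition behind using many inexpensive labels in place of a single costlier one.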

Publication date
2012-06-26