Document Type: Working Paper
Date of this Version: 26 June 2012
Abstract
We identify features that drive differences in the accuracy of word sense disambiguation (WSD) labels. To do this, we build regression models on 10,000 coarse-grained WSD instances labeled on Amazon Mechanical Turk (MTurk). Three groups of features predict accuracy. The first is properties of the target word: its frequency, its part of speech, and its number of possible senses. The second is properties of the example context, namely its length. The third is the Turker's engagement with our task. The resulting model shows which words are difficult to disambiguate. We also show that having many Turkers label the same instance is at least a partial substitute for more expensive annotation.
Date Posted: 28 October 2014