Wharton Research Scholars

Document Type

Working Paper

Date of this Version



We identified features that drive differences in accuracy in word sense disambiguation (WSD) by building regression models on 10,000 coarse-grained WSD instances labeled on Amazon Mechanical Turk (MTurk). Features predictive of accuracy include properties of the target word (word frequency, part of speech, and number of possible senses), properties of the example context (length), and the Turker’s engagement with our task. The resulting model gives insight into which words are difficult to disambiguate. We also show that having many Turkers label the same instance provides at least a partial substitute for more expensive annotation.
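As an illustration of combining several Turkers' labels for one instance, the sketch below takes a simple majority vote and scores it against a reference label. The abstract does not say how labels were combined, so majority voting is an assumption here. The function names, instance IDs, and sense labels are all hypothetical toy values, not data from the paper.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def aggregate_accuracy(annotations, gold):
    """Fraction of instances whose majority label matches the reference label.

    annotations: dict mapping instance id -> list of worker labels (hypothetical layout)
    gold: dict mapping instance id -> reference sense label
    """
    correct = sum(majority_label(annotations[i]) == gold[i] for i in gold)
    return correct / len(gold)


# Toy data for illustration only.
annotations = {
    "bank-1": ["river", "finance", "finance"],
    "bank-2": ["river", "river", "finance"],
    "plant-1": ["factory", "organism", "organism"],
}
gold = {"bank-1": "finance", "bank-2": "river", "plant-1": "factory"}

print(aggregate_accuracy(annotations, gold))
```

In this toy data the majority vote recovers the reference sense for two of the three instances, even though no individual worker's labels are assumed to be reliable.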

Included in

Business Commons



Date Posted: 28 October 2014