Wharton Research Scholars

Document Type

Working Paper

Date of this Version

6-26-2012

Abstract

We identified features that drive differences in accuracy in word sense disambiguation (WSD) by building regression models using 10,000 coarse-grained WSD instances labeled on Amazon Mechanical Turk (MTurk). Features predictive of accuracy include properties of the target word (word frequency, part of speech, and number of possible senses), the example context (its length), and the Turker’s engagement with our task. The resulting model gives insight into which words are difficult to disambiguate.
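The abstract does not state the model family or how the features were encoded. As a minimal sketch, assuming a logistic regression on a binary correct/incorrect outcome for each Turker judgment, the setup might look like the Python below. The `Instance` fields, the feature scaling, the engagement score, and the `toy_data` generator are all hypothetical. The synthetic data simply assumes that accuracy falls as the number of senses rises. That is an illustration, not a result from the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """One Turker judgment on one WSD instance (hypothetical field names)."""
    log_word_freq: float  # log frequency of the target word
    is_verb: int          # toy part-of-speech indicator (1 = verb)
    num_senses: int       # number of candidate senses for the target word
    context_len: int      # example context length in tokens
    engagement: float     # hypothetical engagement score in [0, 1]
    correct: int          # 1 if the Turker chose the gold sense, else 0


def features(x: Instance) -> list[float]:
    # Leading 1.0 is the intercept; the divisors keep every feature roughly
    # in [0, 1] so plain gradient descent stays stable.
    return [1.0, x.log_word_freq / 10.0, float(x.is_verb),
            x.num_senses / 10.0, x.context_len / 50.0, x.engagement]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data: list[Instance], lr: float = 1.0, epochs: int = 300) -> list[float]:
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    rows = [(features(x), x.correct) for x in data]
    w = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, y in rows:
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f))) - y
            for j, fj in enumerate(f):
                grad[j] += err * fj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def predict(w: list[float], x: Instance) -> float:
    """Predicted probability that the label for x is correct."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, features(x))))


def toy_data(n: int, seed: int = 0) -> list[Instance]:
    # Synthetic data only: by assumption, accuracy falls as num_senses rises
    # and every other feature is irrelevant noise.
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        senses = rng.randint(2, 10)
        p = sigmoid(2.0 - 6.0 * senses / 10.0)
        out.append(Instance(rng.uniform(0, 10), rng.randint(0, 1), senses,
                            rng.randint(5, 50), rng.random(), int(rng.random() < p)))
    return out


w = fit_logistic(toy_data(400))
easy = Instance(5.0, 0, 2, 20, 0.5, 0)   # few candidate senses
hard = Instance(5.0, 0, 10, 20, 0.5, 0)  # many candidate senses, all else equal
print(round(predict(w, easy), 2), round(predict(w, hard), 2))
```

One way to read which properties make a word harder is to compare `predict` on two instances that differ in only one feature. In practice, a library routine such as scikit-learn's `LogisticRegression` would replace the hand-written gradient descent.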

We also show that having many Turkers label the same instance provides at least a partial substitute for more expensive annotation.
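The abstract does not say how the redundant labels were combined. A minimal sketch, assuming plurality voting over independent Turkers, suggests why extra cheap labels can stand in for costlier annotation. Under the toy assumption that each Turker picks the gold sense with probability 0.6 out of three senses, the voted label is correct more often as the number of Turkers `k` grows. The functions, parameters, and sense names are hypothetical and are not data from the study.

```python
import random
from collections import Counter


def plurality_label(labels: list[str], rng: random.Random) -> str:
    """Most frequent sense label; ties are broken uniformly at random."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = sorted(label for label, c in counts.items() if c == top)
    return rng.choice(tied)


def simulate_accuracy(k: int, p_correct: float = 0.6, n_senses: int = 3,
                      n_items: int = 2000, seed: int = 0) -> float:
    """Fraction of items whose plurality label over k simulated Turkers is gold.

    Toy assumption: each Turker independently picks the gold sense with
    probability p_correct, otherwise a wrong sense uniformly at random.
    """
    rng = random.Random(seed)
    senses = [f"sense{i}" for i in range(n_senses)]
    hits = 0
    for _ in range(n_items):
        gold = rng.choice(senses)
        wrong = [s for s in senses if s != gold]
        labels = [gold if rng.random() < p_correct else rng.choice(wrong)
                  for _ in range(k)]
        hits += plurality_label(labels, rng) == gold
    return hits / n_items


for k in (1, 3, 5, 9):
    print(k, round(simulate_accuracy(k), 3))
```

Real Turkers are neither independent nor equally reliable. The gains in this idealized simulation are therefore likely an upper bound. Weighting each vote by an estimate of the worker's reliability is a common refinement.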

Included in

Business Commons

Date Posted: 28 October 2014