Manual Annotation of Translational Equivalence: The Blinker Project

Loading...
Thumbnail Image
Penn collection
IRCS Technical Reports Series
Degree type
Discipline
Subject
Funder
Grant number
License
Copyright date
Distributor
Related resources
Contributor
Abstract

Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from http://www.cis.upenn.edu/~melamed. The annotations can be used for several purposes. First, they can be used as a standard data set for developing and testing translation lexicons and statistical translation models. Second, researchers in lexical semantics will be able to mine the annotations for insights about cross-linguistic lexicalization patterns. Third, the annotations can be used in research into certain recently proposed methods for monolingual word-sense disambiguation. This paper describes the annotated texts, the specially designed annotation tool, and the strategies employed to increase the consistency of the annotations. The annotation process was repeated five times by different annotators. Inter-annotator agreement rates indicate that the annotations are reasonably reliable and that the method is easy to replicate.

Advisor
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Publication date
1998-02-01
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-98-07.
Recommended citation
Collection