Melamed, I. Dan

Annotation Style Guide for the Blinker Project
(1998-02-01) Melamed, I. Dan
This annotation style guide was created by and for the Blinker project at the University of Pennsylvania. The project was named after the “bilingual linker” GUI, which was created to let bilingual annotators “link” word tokens that are mutual translations in parallel texts.
Manual Annotation of Translational Equivalence: The Blinker Project
(1998-02-01) Melamed, I. Dan
Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from http://www.cis.upenn.edu/~melamed. The annotations can be used for several purposes. First, they can be used as a standard data set for developing and testing translation lexicons and statistical translation models. Second, researchers in lexical semantics will be able to mine the annotations for insights about cross-linguistic lexicalization patterns. Third, the annotations can be used in research into certain recently proposed methods for monolingual word-sense disambiguation. This paper describes the annotated texts, the specially designed annotation tool, and the strategies employed to increase the consistency of the annotations. The annotation process was repeated five times by different annotators. Inter-annotator agreement rates indicate that the annotations are reasonably reliable and that the method is easy to replicate.
Models of Co-occurrence
(1998-02-01) Melamed, I. Dan
A model of co-occurrence in bitext is a boolean predicate that indicates whether a given pair of word tokens co-occur in corresponding regions of the bitext space. Co-occurrence is a precondition for the possibility that two tokens might be mutual translations. Models of co-occurrence are the glue that binds methods for mapping bitext correspondence with methods for estimating translation models into an integrated system for exploiting parallel texts. Different models of co-occurrence are possible, depending on the kind of bitext map that is available, the language-specific information that is available, and the assumptions made about the nature of translational equivalence. Although most statistical translation models are based on models of co-occurrence, modeling co-occurrence correctly is more difficult than it may at first appear.
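To make the notion of a boolean co-occurrence predicate concrete, here is a minimal sketch of one possible model. It is not taken from the paper. It assumes the bitext map is given as a list of aligned regions, each a pair of half-open token-offset ranges, and it treats two tokens as co-occurring when both fall inside the same aligned region. The names `Region`, `co_occur`, and the sample offsets are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation of a bitext map: each aligned region is a pair
# of half-open token-offset ranges, ((src_start, src_end), (tgt_start, tgt_end)).
Region = Tuple[Tuple[int, int], Tuple[int, int]]


def co_occur(regions: List[Region], src_pos: int, tgt_pos: int) -> bool:
    """Return True if the source and target token positions fall inside the
    same aligned region, i.e. they co-occur under this simple model."""
    for (s_start, s_end), (t_start, t_end) in regions:
        if s_start <= src_pos < s_end and t_start <= tgt_pos < t_end:
            return True
    return False


# Illustrative data only: two aligned regions over made-up token offsets.
regions: List[Region] = [((0, 5), (0, 6)), ((5, 9), (6, 10))]

print(co_occur(regions, 2, 4))  # both tokens lie in the first region
print(co_occur(regions, 2, 7))  # the tokens lie in different regions
```

Under this sketch the predicate depends only on region boundaries. Other models of the kind the abstract alludes to could instead use a finer-grained bitext map or language-specific information.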
