Marcus, Mitchell

Search Results

Automatic Construction of Chinese-English Translation Lexicons
(1998-12-01) Melamed, I. Dan; Marcus, Mitchell
The process of constructing translation lexicons from parallel texts (bitexts) can be broken down into three stages: mapping bitext correspondence, counting co-occurrences, and estimating a translation model. State-of-the-art techniques for accomplishing each stage of the process had already been developed, but only for bitexts involving fairly similar languages. Correct and efficient implementation of each stage poses special challenges when the parallel texts involve two very different languages. This report describes our theoretical and empirical investigations into how existing techniques might be extended and applied to Chinese/English bitexts.
Building a Large Annotated Corpus of English: The Penn Treebank
(1993-10-01) Marcus, Mitchell; Santorini, Beatrice; Marcinkiewicz, Mary Ann
In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus has been annotated for part-of-speech (POS) information. In addition, over half of it has been annotated for skeletal syntactic structure.