Marcus, Mitchell
Building a Large Annotated Corpus of English: The Penn Treebank (1993-10-01)
Marcus, Mitchell; Santorini, Beatrice; Marcinkiewicz, Mary Ann
In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus has been annotated for part-of-speech (POS) information. In addition, over half of it has been annotated for skeletal syntactic structure.

Automatic Construction of Chinese-English Translation Lexicons (1998-12-01)
Melamed, I. Dan; Marcus, Mitchell
The process of constructing translation lexicons from parallel texts (bitexts) can be broken down into three stages: mapping bitext correspondence, counting co-occurrences, and estimating a translation model. State-of-the-art techniques for accomplishing each stage of the process had already been developed, but only for bitexts involving fairly similar languages. Correct and efficient implementation of each stage poses special challenges when the parallel texts involve two very different languages. This report describes our theoretical and empirical investigations into how existing techniques might be extended and applied to Chinese/English bitexts.