Marcus, Mitchell

Search Results

  • Publication
    Building a Large Annotated Corpus of English: The Penn Treebank
    (1993-10-01) Marcus, Mitchell; Santorini, Beatrice; Marcinkiewicz, Mary Ann
    In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus has been annotated for part-of-speech (POS) information. In addition, over half of it has been annotated for skeletal syntactic structure.
  • Publication
    Automatic Construction of Chinese-English Translation Lexicons
    (1998-12-01) Melamed, I. Dan; Marcus, Mitchell
    The process of constructing translation lexicons from parallel texts (bitexts) can be broken down into three stages: mapping bitext correspondence, counting co-occurrences, and estimating a translation model. State-of-the-art techniques for accomplishing each stage of the process had already been developed, but only for bitexts involving fairly similar languages. Correct and efficient implementation of each stage poses special challenges when the parallel texts involve two very different languages. This report describes our theoretical and empirical investigations into how existing techniques might be extended and applied to Chinese/English bitexts.
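
The last two stages named in the abstract above (counting co-occurrences and estimating a translation model) can be illustrated with a minimal sketch. It is not the method from the report. It assumes the first stage (mapping bitext correspondence) has already produced aligned, pre-tokenized segment pairs, and it substitutes a simple Dice score for the translation model. The function names `count_cooccurrences` and `estimate_lexicon` and the romanized toy data are hypothetical, invented for illustration only.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(aligned_segments):
    """Stage 2 (sketch): count how many aligned segment pairs contain
    each (source word, target word) combination.

    Word types are counted once per segment pair, so repeated tokens
    inside one segment do not inflate the counts.
    """
    cooc = Counter()
    src_freq = Counter()
    tgt_freq = Counter()
    for src_tokens, tgt_tokens in aligned_segments:
        src_types = set(src_tokens)
        tgt_types = set(tgt_tokens)
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        cooc.update(product(src_types, tgt_types))
    return cooc, src_freq, tgt_freq


def estimate_lexicon(cooc, src_freq, tgt_freq):
    """Stage 3 (stand-in): score each word pair with the Dice coefficient
    and keep the highest-scoring target word for each source word.

    This is an assumed, simplified substitute for a real translation model.
    """
    best = {}
    for (src, tgt), joint in cooc.items():
        score = 2 * joint / (src_freq[src] + tgt_freq[tgt])
        if src not in best or score > best[src][1]:
            best[src] = (tgt, score)
    return best


# Stage 1 is assumed done: invented, hand-aligned toy segments.
# Source tokens are romanized placeholders, not real corpus data.
aligned_segments = [
    (["hong", "shu"], ["red", "book"]),
    (["hong", "che"], ["red", "car"]),
    (["lan", "shu"], ["blue", "book"]),
]

cooc, src_freq, tgt_freq = count_cooccurrences(aligned_segments)
lexicon = estimate_lexicon(cooc, src_freq, tgt_freq)

for src, (tgt, score) in sorted(lexicon.items()):
    print(f"{src} -> {tgt} (Dice = {score:.2f})")
```

The sketch sidesteps the difficulties the abstract points to for very different language pairs: it takes tokenization and segment alignment as given, which for Chinese/English text are themselves nontrivial steps.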