Technical Reports (CIS)
Document Type
Technical Report
Date of this Version
October 1993
Abstract
In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus has been annotated for part-of-speech (POS) information. In addition, over half of it has been annotated for skeletal syntactic structure.
Recommended Citation
Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz, "Building a Large Annotated Corpus of English: The Penn Treebank". October 1993.
Date Posted: 11 July 2007
Comments
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-93-87.