Technical Reports (CIS)

Document Type

Technical Report

Date of this Version

October 1993


University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-93-87.


In this paper, we review our experience with constructing a large annotated corpus: the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus was annotated for part-of-speech (POS) information. In addition, over half of it was annotated for skeletal syntactic structure.



Date Posted: 11 July 2007