Technical Reports (CIS)

Document Type

Technical Report

Date of this Version

October 1993

Comments

University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-93-87.

Abstract

In this paper, we review our experience with constructing a large annotated corpus: the Penn Treebank, a corpus of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus was annotated for part-of-speech (POS) information. In addition, over half of it was annotated for skeletal syntactic structure.

Date Posted: 11 July 2007