IRCS Technical Reports Series
Document Type
Technical Report
Date of this Version
October 2000
Abstract
This document describes the bracketing guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public.
This document can be divided into six parts. Section I discusses six fundamental grammatical relations that are represented in the Treebank. Section II introduces the bracketing tagset, which includes 23 syntactic labels, 26 functional tags, and 7 tags for null elements. Sections III, IV, and V specify our annotation schemata for noun phrases, verb phrases, and other minor categories, respectively. Section VI describes our treatment of empty categories, such as traces for syntactic movement, PRO for control, and pro for argument drop. Sections VII and VIII cover coordinated clauses and subordinate clauses. Sections IX, X, and XI specify the way we handle punctuation, ambiguity, and some problematic cases.
Date Posted: 11 August 2006
Comments
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-00-08.