IRCS Technical Reports Series

Document Type

Technical Report

Date of this Version

October 2000

Comments

University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-00-08.

Abstract

This document describes the bracketing guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public.

This document can be divided into six parts. Section I discusses six fundamental grammatical relations that are represented in the Treebank. Section II introduces the bracketing tagset, which includes 23 syntactic labels, 26 functional tags, and 7 tags for null elements. Section III, IV and V specify our annotation schemata for noun phrases, verbs phrases, and other minor categories, respectively. Section VI describes our treatment for empty categories, such as trace for syntactic movement, PRO for control, and pro for argument drop. Section VII and VIII cover the coordinated clauses and subordinating clauses. Section IX, X and XI specify the way we handle punctuation, ambiguity, and some problematic cases.

Share

COinS

Date Posted: 11 August 2006