The Bracketing Guidelines for the Penn Chinese Treebank (3.0)

Loading...
Thumbnail Image
Penn collection
IRCS Technical Reports Series
Degree type
Discipline
Subject
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Xue, Nianwen
Xia, Fei
Huang, Shizhe
Kroch, Anthony
Contributor
Abstract

This document describes the bracketing guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public. This document can be divided into six parts. Section I discusses six fundamental grammatical relations that are represented in the Treebank. Section II introduces the bracketing tagset, which includes 23 syntactic labels, 26 functional tags, and 7 tags for null elements. Section III, IV and V specify our annotation schemata for noun phrases, verbs phrases, and other minor categories, respectively. Section VI describes our treatment for empty categories, such as trace for syntactic movement, PRO for control, and pro for argument drop. Section VII and VIII cover the coordinated clauses and subordinating clauses. Section IX, X and XI specify the way we handle punctuation, ambiguity, and some problematic cases.

Advisor
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Publication date
2000-10-01
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-00-08.
Recommended citation
Collection