The Bracketing Guidelines for the Penn Chinese Treebank (3.0)

Loading...
Thumbnail Image

Degree type

Discipline

Subject

Funder

Grant number

License

Copyright date

Distributor

Related resources

Contributor

Abstract

This document describes the bracketing guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public. This document can be divided into six parts. Section I discusses six fundamental grammatical relations that are represented in the Treebank. Section II introduces the bracketing tagset, which includes 23 syntactic labels, 26 functional tags, and 7 tags for null elements. Section III, IV and V specify our annotation schemata for noun phrases, verbs phrases, and other minor categories, respectively. Section VI describes our treatment for empty categories, such as trace for syntactic movement, PRO for control, and pro for argument drop. Section VII and VIII cover the coordinated clauses and subordinating clauses. Section IX, X and XI specify the way we handle punctuation, ambiguity, and some problematic cases.

Advisor

Date Range for Data Collection (Start Date)

Date Range for Data Collection (End Date)

Digital Object Identifier

Series name and number

Publication date

2000-10-01

Volume number

Issue number

Publisher

Publisher DOI

Journal Issues

Comments

University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-00-08.

Recommended citation

Collection