Bracketing Guidelines for Penn Korean TreeBank

Penn collection
IRCS Technical Reports Series
Abstract

This document describes the syntactic bracketing guidelines for the Penn Korean Treebank, an online corpus of Korean texts annotated with morphological and syntactic information. The corpus contains about 54,000 words in about 5,000 sentences. The Treebank uses a phrase-structure style of annotation that distinguishes head nodes from phrasal nodes and arguments from adjuncts, and that marks empty arguments and the traces of moved constituents. The document is organized as follows. Section 2 presents the basic syntactic ingredients of a clause structure. Section 3 introduces the notational conventions, including the types of syntactic tags used in the Treebank: head-level tags, phrase-level tags, and function tags. Section 4 discusses the bracketing guidelines for various types of clauses, including simple clauses, subordinate clauses, and clauses with coordination. Section 5 presents several types of subcategorization frames found in the Treebank. Sections 6 to 21 give bracketing guidelines for various linguistic phenomena, including guidelines for annotating punctuation. The document ends with guidelines for handling some bracketing ambiguities and some confusing examples.
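The annotation described above is written as labeled bracketing. The sketch below assumes the usual Penn-style convention, where function tags attach to a phrase label with a hyphen (for example `NP-SBJ`). Under that assumption, the Python code does three things:

- parses a bracketed tree;
- splits each phrase label from its function tags;
- collects empty categories, here assumed to be leaves beginning with `*`.

The example tree is illustrative only and is not taken from the guidelines. Its tag names (`S`, `NP-SBJ`, `VP`, `NP-OBJ`), its `*pro*` empty-argument marker and its placeholder words are all assumptions. The helper names `parse`, `walk` and `empty_categories` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Union

# Tokens are "(", ")", or any run of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


@dataclass
class Node:
    label: str               # phrase or head tag, e.g. "NP"
    functions: List[str]     # function tags split off by "-", e.g. ["SBJ"] (assumed convention)
    children: List[Union["Node", str]] = field(default_factory=list)  # subtrees or word leaves


def parse(text: str) -> Node:
    """Parse one Penn-style labeled bracketing into a Node tree (hypothetical helper)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        # The first token after "(" is the label, e.g. "NP-SBJ" -> ("NP", ["SBJ"]).
        label, *functions = tokens[pos].split("-")
        pos += 1
        node = Node(label, functions)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.children.append(tokens[pos])  # a leaf (word or empty category)
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    return parse_node()


def walk(node: Node) -> Iterator[Node]:
    """Yield every phrase node in pre-order."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from walk(child)


def empty_categories(tree: Node) -> List[str]:
    """Collect leaves assumed to be empty categories (those starting with '*')."""
    return [c for n in walk(tree) for c in n.children
            if isinstance(c, str) and c.startswith("*")]


if __name__ == "__main__":
    # Illustrative tree with placeholder words: an empty subject, an object NP, a verb.
    tree = parse("(S (NP-SBJ *pro*) (VP (NP-OBJ book) read) .)")
    print([(n.label, n.functions) for n in walk(tree) if n.functions])
    print(empty_categories(tree))
```

Keeping the function tags separate from the phrase label lets a query such as "all subject NPs" match `NP` nodes whose `functions` list contains `SBJ`. Matching raw label strings would not do this as cleanly. Note that the sketch also splits numeric trace indices (for example `NP-OBJ-1`) into `functions`. A fuller reader would handle those separately.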

Publication date
2001-05-01
Comments
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-01-10.