IRCS Technical Reports Series
Document Type
Technical Report
Date of this Version
May 2001
Abstract
This document describes the syntactic bracketing guidelines for the Penn Korean Treebank, an online corpus of Korean texts annotated with morphological and syntactic information. The corpus consists of around 54,000 words and 5,000 sentences. The Treebank uses a phrase structure style of annotation, making head/phrasal node distinctions and argument/adjunct distinctions, and identifying empty arguments and traces for moved constituents. This document is organized as follows. Section 2 presents the basic syntactic ingredients of clause structure. Section 3 introduces notational conventions, including the types of syntactic tags used in the Treebank: head level tags, phrase level tags, and function tags. Section 4 discusses the bracketing guidelines for various types of clauses, including simple clauses, subordinate clauses, and clauses with coordination. Section 5 presents several types of subcategorization frames found in the Treebank, and sections 6 to 21 give bracketing guidelines for various linguistic phenomena, including guidelines for annotating punctuation. The document ends with guidelines for handling some bracketing ambiguities and some confusing examples.
Date Posted: 09 August 2006
Comments
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-01-10.