Han, Na-Rae


  • Publication
    Guidelines for Penn Korean Treebank Version 2.0
    (2005-10-20) Han, Na-Rae; Ryu, Shijong
    The Korean Treebank Annotations Version 2.0 is the second volume of The Korean Treebank Annotations (Palmer et al., 2002; Han et al., 2002). It contains new texts from the news domain: the original corpus for the Korean Treebank 2.0 was extracted from The Korean Newswire corpus published by LDC, catalog number LDC2000T45. The Korean Treebank Annotations Version 2.0 consists of 647 news articles in 112 files, which contain 132,040 words and 5,010 sentences. There are 40,252 unique words and 13,844 unique morphemes (12,681 unique morphemes excluding foreign characters and Arabic numerals). The annotated text is about 8.5MB in size. While annotating the new texts, the annotators encountered many new linguistic constructions and phenomena that called for additional guidelines. Furthermore, a few guidelines used for the first volume of the Korean Treebank were re-examined and modified in the second volume. This document outlines the guidelines that were newly introduced for the second volume of the Penn Korean Treebank, as well as the ones that have been revised since the publication of volume 1.0. It is therefore not a self-contained document, but rather an addendum to the two previously published guidelines for the Penn Korean Treebank (Han and Han, 2001; Han et al., 2001).
  • Publication
    Bracketing Guidelines for Penn Korean TreeBank
    (2001-05-01) Han, Na-Rae; Han, Chung-hye; Ko, Eon-Suk
    This document describes the syntactic bracketing guidelines for the Penn Korean Treebank, which is an online corpus of Korean texts annotated with morphological and syntactic information. The corpus consists of around 54,000 words and 5,000 sentences. The Treebank uses a phrase structure style of annotation, making head/phrasal node distinctions and argument/adjunct distinctions, and identifying empty arguments and traces for moved constituents. This document is organized as follows. In section 2, the basic syntactic ingredients of a clause structure are presented. Some notational conventions are introduced in section 3, including the different types of syntactic tags used in the Treebank: head level tags, phrase level tags and function tags. In section 4, the bracketing guidelines for various types of clauses are discussed, including simple clauses, subordinate clauses, and clauses with coordination. Several types of subcategorization frames found in the Treebank are then presented in section 5, followed by bracketing guidelines for various linguistic phenomena in sections 6 to 21, including guidelines for annotating punctuation. The document ends with guidelines for handling some bracketing ambiguities and some confusing examples.
  • Publication
    Part of Speech Tagging Guidelines for Penn Korean Treebank
    (2001-05-01) Han, Chung-hye; Han, Na-Rae
    This document describes the Part-of-Speech (POS) tagging guidelines for the Penn Korean Treebank Project. The corpus used for this project consists of around 54,000 words and 5,000 sentences. This document starts with a summary of the tagset used in the Penn Korean Treebank, followed by a more detailed discussion of each tag with examples. Pairs of tags that are easily confused with each other are then discussed, and guidelines are presented on how to distinguish one from the other for given base forms and inflections. The document concludes with a list of specific problematic examples with guidelines on how to handle such cases.