IRCS Technical Reports Series
Document Type: Technical Report
Date of this Version: May 2001
Abstract
This document describes the Part-of-Speech (POS) tagging guidelines for the Penn Korean Treebank Project. The corpus used for this project consists of around 54,000 words in 5,000 sentences. The document begins with a summary of the tagset used in the Penn Korean Treebank, followed by a more detailed discussion of each tag with examples. Next, it discusses pairs of tags that are easily confused with each other and presents guidelines for distinguishing them for given base forms and inflections. The document concludes with a list of specific problematic examples and guidelines for handling such cases.
Date Posted: 08 August 2006
Comments
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-01-09.