IRCS Technical Reports Series

Document Type

Technical Report

Date of this Version

May 2001


University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-01-09.


This document describes the Part-of-Speech (POS) tagging guidelines for the Penn Korean Treebank Project. The corpus used for this project consists of around 54,000 words and 5,000 sentences. The document starts with a summary of the tagset used in the Penn Korean Treebank, followed by a more detailed discussion of each tag with examples. Pairs of tags that are easily confused with each other are then discussed, and guidelines are presented on how to distinguish one from the other for given base forms and inflections. The document concludes with a list of specific problematic examples, with guidelines on how to handle such cases.



Date Posted: 08 August 2006