Part of Speech Tagging Guidelines for Penn Korean Treebank

Penn collection
IRCS Technical Reports Series
Author
Han, Chung-hye
Abstract

This document describes the Part-of-Speech (POS) tagging guidelines for the Penn Korean Treebank Project. The corpus used for this project consists of around 54,000 words and 5,000 sentences. The document begins with a summary of the tagset used in the Penn Korean Treebank, followed by a more detailed discussion of each tag with examples. It then discusses pairs of tags that are easily confused with each other and presents guidelines on how to distinguish one from the other for given base forms and inflections. The document concludes with a list of specific problematic examples, with guidelines on how to handle such cases.

Publication date
2001-05-01
Comments
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-01-09.