Department of Computer & Information Science
Technical Reports (CIS)
TITLE:
Graphical Models for Primarily Unsupervised Sequence Labeling
AUTHOR(S):
Neal Parikh, University of Pennsylvania
Mark Dredze, University of Pennsylvania
DOCUMENT TYPE: Technical Report
» Download the Document (PDF format - 308 K) - 01 January 2007
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-07-18.
ABSTRACT:
Most models used in natural language processing must be trained on large corpora of labeled text. This
tutorial explores a "primarily unsupervised" approach (based on graphical models) that augments a corpus
of unlabeled text with some form of prior domain knowledge, but does not require any fully labeled examples.
We survey probabilistic graphical models for (supervised) classification and sequence labeling and then
present the prototype-driven approach of Haghighi and Klein (2006) to sequence labeling in detail, including
a discussion of the theory and implementation of both conditional random fields and prototype learning.
We show experimental results for English part-of-speech tagging.
DATE POSTED: 04 October 2007

