Posterior Sparsity in Unsupervised Dependency Parsing

Penn collection
Technical Reports (CIS)
Author
Gillenwater, Jennifer
Ganchev, Kuzman
Graça, João
Pereira, Fernando
Abstract

A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed accuracy over the standard expectation maximization (EM) baseline for 9 of the languages, with an average accuracy improvement of 6%. Further, we show that for 8 out of 12 languages, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors, with an average improvement of 4%. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
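
To make the idea of a sparsity measure over parent-child POS tag pairs concrete, the following is a minimal sketch, not the report's implementation. The abstract does not state the exact form of the penalty; the sketch assumes an L1/L-infinity style measure, which takes the maximum posterior edge probability over all occurrences of each tag pair and sums those maxima over pairs. The function name `l1_linf_penalty`, the input layout, and the toy posteriors are all hypothetical, and root attachments are ignored for brevity.

```python
from collections import defaultdict


def l1_linf_penalty(sentences):
    """Return (penalty, per_pair_max) for a toy corpus.

    `sentences` is a list of (tags, posteriors) pairs, where `tags` is a list
    of POS tags and posteriors[p][c] is the (assumed, precomputed) posterior
    probability that the word at position p is the parent of the word at c.
    """
    max_per_pair = defaultdict(float)
    for tags, posteriors in sentences:
        for p, parent_tag in enumerate(tags):
            for c, child_tag in enumerate(tags):
                if p == c:
                    continue  # a word cannot be its own parent
                pair = (parent_tag, child_tag)
                # L-infinity part: keep the largest posterior seen for this pair
                max_per_pair[pair] = max(max_per_pair[pair], posteriors[p][c])
    # L1 part: sum the per-pair maxima
    return sum(max_per_pair.values()), dict(max_per_pair)


# Hypothetical posteriors for two short tagged sentences.
example = [
    (["DT", "NN", "VB"], [[0.0, 0.1, 0.0],
                          [0.9, 0.0, 0.2],
                          [0.1, 0.9, 0.0]]),
    (["NN", "VB"], [[0.0, 0.3],
                    [0.7, 0.0]]),
]
penalty, per_pair = l1_linf_penalty(example)
```

Under this assumed measure, a tag pair contributes nothing only if no sentence places posterior mass on it, so a lower value corresponds to fewer distinct dependency types in use.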

Publication date
2010-01-01
Comments
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-10-19.