
Sparsity in Dependency Grammar Induction

Ben Taskar, University of Pennsylvania
Fernando C. N. Pereira, Google, Inc.
João V. Graça, L2F INESC-ID
Jennifer Gillenwater, University of Pennsylvania
Kuzman Ganchev, University of Pennsylvania

Document Type: Technical Report

Abstract

A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with an average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
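To make the abstract's penalty concrete: for each parent-child POS tag pair, the bias penalizes the largest posterior probability that any single candidate edge with that tag pair receives, summed over tag pairs (an l1/l-infinity term over posterior edge expectations). The following minimal Python sketch shows only how such a term could be computed from per-sentence edge posteriors; the names (l1_linf_penalty, edge_post, tags) are illustrative, the treatment of the root attachment is an assumption, and in the PR framework this term would appear inside the projection objective alongside a KL divergence rather than as a standalone score.

import numpy as np

def l1_linf_penalty(edge_post, tags, sigma=1.0):
    # Illustrative sketch (not the paper's implementation).
    # edge_post[s] is assumed to be an (n_s + 1) x n_s matrix of posterior
    # edge probabilities for sentence s: row 0 is the root pseudo-head,
    # row h > 0 is the word at position h - 1, and column c is the child
    # at position c. tags[s] holds that sentence's POS tags.
    max_per_pair = {}  # (parent_tag, child_tag) -> max posterior over instances
    for post, sent_tags in zip(edge_post, tags):
        n = len(sent_tags)
        for c in range(n):
            child_tag = sent_tags[c]
            for h in range(n + 1):
                # Assumption: root attachments counted under a "ROOT" parent tag.
                parent_tag = "ROOT" if h == 0 else sent_tags[h - 1]
                key = (parent_tag, child_tag)
                p = float(post[h, c])
                if p > max_per_pair.get(key, 0.0):
                    max_per_pair[key] = p
    # l1 over tag pairs of the l-infinity (max) over edge instances.
    return sigma * sum(max_per_pair.values())

# Toy usage with random column-stochastic posteriors (each child's head
# distribution sums to 1); real posteriors would come from a parser's E-step.
rng = np.random.default_rng(0)
tags = [["DT", "NN", "VBD"], ["NN", "VBD", "DT", "NN"]]
edge_post = [rng.dirichlet(np.ones(len(t) + 1), size=len(t)).T for t in tags]
print(l1_linf_penalty(edge_post, tags, sigma=0.1))

A small penalty value indicates that posterior mass is concentrated on few distinct parent-child tag pairs, which is the sparsity the bias rewards.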


Date Posted: 11 July 2012