An Empirical Comparison of Probability Models for Dependency Grammar

Loading...
Thumbnail Image

Degree type

Discipline

Subject

Funder

Grant number

License

Copyright date

Distributor

Related resources

Contributor

Abstract

This technical report is an appendix to Eisner (1996): it gives superior experimental results that were reported only in the talk version of that paper, with details of how the results were obtained. Eisner (1996) trained three probability models on a small set of about 4,000 conjunction-free, dependency grammar parses derived from the Wall Street Journal section of the Penn Treebank, and then evaluated the models on a held-out test set, using a novel O(n3) parsing algorithm. The present paper describes some details of the experiments and repeats them with a larger training set of 25,000 sentences. As reported at the talk, the more extensive training yields greatly improved performance, cutting in half the error rate of Eisner (1996). Nearly half the sentences are parsed with no misattachments; two-thirds of sentences are parsed with at most one misattachment. Of the models described in the original paper, the best score is obtained with the generative "model C", which attaches 87-88% of all words to the correct parent. However, better models are also explored, in particular, two simple variants on the comprehension "model B." The better of these has an attachment accuracy of 93% and (unlike model C) tags words more accurately than the comparable trigram tagger. If tags are roughly known in advance, search error is all but eliminated and the new model attains an attachment accuracy of 93%. We find that the parser of Collins (1996) when combined with a highly, trained tagger, also achieves 93% when trained and tested on the same sentences. We briefly discuss the similarities and differences between Collins's model and ours, pointing out the strengths of each and noting that these strengths could be combined for either dependency parsing or phrase-structure parsing.

Advisor

Date Range for Data Collection (Start Date)

Date Range for Data Collection (End Date)

Digital Object Identifier

Series name and number

Publication date

1997-05-01

Volume number

Issue number

Publisher

Publisher DOI

Journal Issues

Comments

University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-96-11.

Recommended citation

Collection