Statistical LTAG parsing

Libin Shen, University of Pennsylvania


In this work, we apply statistical learning algorithms to Lexicalized Tree Adjoining Grammar (LTAG) parsing, as a step toward the statistical analysis of deep structures. LTAG parsing is a well-known hard problem, and statistical methods that succeed on it could also be applied to many other structure prediction problems in NLP. To achieve accurate and efficient LTAG parsing, we investigate two aspects of the problem: the data structure and the algorithm.

1. We introduce LTAG-spinal, a variant of LTAG with desirable linguistic, computational and statistical properties. It can be shown that LTAG-spinal with adjunction constraints is weakly equivalent to traditional LTAG. For statistical processing, we extract an LTAG-spinal treebank from the Penn Treebank (PTB) with Propbank annotation.

2. We explore various parsing strategies and also investigate the reranking approach.
(a) We first propose a left-to-right incremental parser for LTAG-spinal, as an attempt to incorporate supertagging and dependency analysis dynamically. The parser is trained with a perceptron-like discriminative learning algorithm. To overcome the limitations of left-to-right processing, we further investigate a bidirectional dependency parser for LTAG-spinal. We propose a novel algorithm for graph-based incremental construction and apply it to LTAG-style dependency parsing.
(b) We also explore learning algorithms for reranking, both for parsing and for other NLP problems such as machine translation (MT). We propose a novel reranking strategy, Ordinal Regression with Uneven Margins (ORUM), which achieves state-of-the-art performance on reranking for CFG parsing and for MT.

In summary, this work makes the following contributions:
(i) A new formalism, LTAG-spinal, which is weakly equivalent to LTAG.
(ii) An LTAG-spinal treebank extracted from the PTB with Propbank annotation.
(iii) A left-to-right incremental parser for LTAG-spinal.
(iv) A bidirectional LTAG-spinal dependency parser.
(v) A novel graph-based incremental construction algorithm, which could be applied to many structure prediction problems in NLP, e.g., semantic role labeling.
(vi) A novel discriminative reranking algorithm, ORUM, which has been applied successfully to parse reranking as well as to other tasks, e.g., MT reranking.
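The contrast between left-to-right and bidirectional construction can be illustrated with a toy sketch. The function below is a generic greedy, non-directional attachment loop. It is not the graph-based incremental construction algorithm of this dissertation. At each step it scores every attachment between adjacent partial trees, in either direction, and commits to the best one, so attachments need not be made in left-to-right order. The names `bidirectional_parse` and `score` are hypothetical, and `score` stands in for a learned model (for example, a perceptron-trained linear scorer).

```python
from typing import Callable, Dict, List, Optional, Tuple


def bidirectional_parse(
    words: List[str],
    score: Callable[[int, int], float],
) -> Dict[int, int]:
    """Greedy bidirectional attachment (toy sketch, not the dissertation's method).

    `score(h, d)` is a hypothetical model score for making word h the head
    of word d. Returns a map from dependent index to head index.
    """
    # Each pending item is the index of the root word of a partial tree.
    pending: List[int] = list(range(len(words)))
    heads: Dict[int, int] = {}
    while len(pending) > 1:
        # Best candidate so far: (score, position in pending to remove, head, dependent).
        best: Optional[Tuple[float, int, int, int]] = None
        # Consider every adjacent pair of partial trees, attaching in both directions.
        for i in range(len(pending) - 1):
            left, right = pending[i], pending[i + 1]
            for head, dep, pos in ((left, right, i + 1), (right, left, i)):
                s = score(head, dep)
                if best is None or s > best[0]:
                    best = (s, pos, head, dep)
        assert best is not None  # at least one adjacent pair exists
        _, pos, head, dep = best
        heads[dep] = head
        # The dependent's subtree is now attached under `head`; drop its root.
        del pending[pos]
    return heads
```

For example, with a scorer that always prefers `saw` as the head, `bidirectional_parse(["John", "saw", "Mary"], lambda h, d: 1.0 if h == 1 else 0.0)` attaches both `John` and `Mary` to `saw` and returns `{0: 1, 2: 1}`. Restricting candidates to adjacent partial trees keeps the output projective; a realistic parser would score much richer features of the partial structures.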
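Reranking with rank-dependent margins can likewise be sketched. The code below is a generic pairwise, perceptron-style reranker in which a better-ranked candidate must outscore a worse-ranked one by a margin that depends on both ranks. It illustrates the general idea of uneven margins but is not the ORUM formulation itself. The margin function `tau * (1/r_better - 1/r_worse)`, the sparse dictionary feature representation, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Features = Dict[str, float]


def dot(w: Dict[str, float], f: Features) -> float:
    """Linear score of a sparse feature vector."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())


def margin(rank_better: int, rank_worse: int, tau: float = 1.0) -> float:
    """Hypothetical uneven margin: pairs near the top of the list,
    or far apart in rank, must be separated by a larger margin."""
    return tau * (1.0 / rank_better - 1.0 / rank_worse)


def train_pairwise_reranker(
    data: List[List[Tuple[Features, int]]], epochs: int = 10
) -> Dict[str, float]:
    """Each training item is a list of (features, rank) candidates; rank 1 is best."""
    w: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for candidates in data:
            for fi, ri in candidates:
                for fj, rj in candidates:
                    if ri >= rj:
                        continue  # only pairs where candidate i is strictly better
                    if dot(w, fi) - dot(w, fj) <= margin(ri, rj):
                        # Margin violated: perceptron-style update toward fi, away from fj.
                        for k, v in fi.items():
                            w[k] += v
                        for k, v in fj.items():
                            w[k] -= v
    return dict(w)


def rerank(w: Dict[str, float], candidates: List[Features]) -> List[int]:
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(candidates)), key=lambda i: dot(w, candidates[i]), reverse=True)
```

As a usage example, training on one list of three candidates whose only features are `a`, `b` and `c` (ranked 1, 2 and 3) settles after the first pass at weights `{"a": 1.0, "b": 0.0, "c": -1.0}`, after which `rerank` places the `a` candidate first and the `c` candidate last. Because updates fire whenever a pair is inside its margin rather than only when it is misordered, the number of epochs is bounded explicitly.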

Subject Area

Computer science

Recommended Citation

Shen, Libin, "Statistical LTAG parsing" (2006). Dissertations available from ProQuest. AAI3225543.