On the parameter space of generative lexicalized statistical parsing models

Daniel M Bikel, University of Pennsylvania


In this thesis, we both apply and develop techniques and methodologies for examining the complex systems that are lexicalized statistical parsing models. The primary idea is that of treating the “model as data”, which is not a particular method but a paradigm and a research methodology. Our argument is that lexicalized statistical parsing models have become increasingly complex and therefore require thorough scrutiny, both to achieve the scientific aim of understanding what has been built thus far and to achieve the combined scientific and engineering goal of using that understanding for progress. We take a particular, dominant type of parsing model and perform a macro analysis to reveal its core (and design a software engine that modularizes the periphery). Crucially, we also perform a detailed analysis, which provides for the first time a window onto the efficacy of specific parameters. These analyses have not only yielded insight into the core model but have also enabled the identification of “inefficiencies” in our baseline model. Those inefficiencies can be reduced to form a more compact model, exploited to find a better-estimated model with higher accuracy, or both.

Subject Area

Computer science|Artificial intelligence

Recommended Citation

Bikel, Daniel M, "On the parameter space of generative lexicalized statistical parsing models" (2004). Dissertations available from ProQuest. AAI3152016.