Maximum Entropy Models For Natural Language Ambiguity Resolution

Thumbnail Image
Degree type
Graduate group
IRCS Technical Reports Series
Grant number
Copyright date
Related resources
Ratnaparkhi, Adwait

This thesis demonstrates that several important kinds of natural language ambiguities can be resolved to state-of-the-art accuracies using a single statistical modeling technique based on the principle of maximum entropy. We discuss the problems of sentence boundary detection, part-of-speech tagging, prepositional phrase attachment, natural language parsing, and text categorization under the maximum entropy framework. In practice, we have found that maximum entropy models offer the following advantages: State-of-the-art Accuracy: The probability models for all of the tasks discussed perform at or near state-of-the-art accuracies, or outperform competing learning algorithms when trained and tested under similar conditions. Methods which outperform those presented here require much more supervision in the form of additional human involvement or additional supporting resources. Knowledge-Poor Features: The facts used to model the data, or features, are linguistically very simple, or "knowledge-poor" but yet succeed in approximating complex linguistic relationships. Reusable Software Technology: The mathematics of the maximum entropy framework are essentially independent of any particular task, and a single software implementation can be used for all of the probability models in this thesis. The experiments in this thesis suggest that experimenters can obtain state-of-the-art accuracies on a wide range of natural language tasks, with little task-specific effort, by using maximum entropy probability models.

Date of degree
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Volume number
Issue number
Publisher DOI
Journal Issue
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-98-15.
Recommended citation