IRCS Technical Reports Series
Title
Automatic Induction of Rules for Text Simplification
Document Type
Technical Report
Date of this Version
December 1996
Abstract
Long and complicated sentences pose various problems for many state-of-the-art natural language technologies. We have been exploring methods to automatically transform such sentences so as to make them simpler. These methods involve the use of a rule-based system, driven by the syntax of the text in the domain of interest. Hand-crafting rules for every domain is time-consuming and impractical. This paper describes an algorithm and an implementation by which generalized rules for simplification are automatically induced from annotated training material, using a novel partial parsing technique that combines constituent structure and dependency information. The algorithm employs example-based generalizations on linguistically motivated structures.
Date Posted: 13 September 2006

Comments
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-96-30.