IRCS Technical Reports Series
Date of this Version
Long and complicated sentences pose various problems to many state-of-the-art natural language technologies. We have been exploring methods to automatically transform such sentences as to make them simpler. These methods involve the use of a rule-based system, driven by the syntax of the text in the domain of interest. Hand-crafting rules for every domain is time-consuming and impractical. This paper describes an algorithm and an implementation by which generalized rules for simplification are automatically induced from annotated training material with a novel partial parsing technique which combines constituent structure and dependency information. This algorithm described in the paper employs example-based generalizations on linguistically-motivated structures.
Date Posted: 13 September 2006
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-96-30.