Towards A Practically Useful Text Simplification System

Reno Joseph Kriz, University of Pennsylvania

Abstract

While there is a vast amount of text written about nearly any topic, this is often difficult for someone unfamiliar with a specific field to understand. Automated text simplification aims to reduce the complexity of a document, making it more comprehensible to a broader audience. Much of the research in this field has traditionally focused on simplification sub-tasks, such as lexical, syntactic, or sentence-level simplification. However, current systems struggle to consistently produce high-quality simplifications. Phrase-based models tend to make too many poor transformations; on the other hand, recent neural models, while producing grammatical output, often do not make all needed changes to the original text. In this thesis, I discuss novel approaches for improving lexical and sentence-level simplification systems. Regarding sentence simplification models, after noting that encouraging diversity at inference time leads to significant improvements, I take a closer look at the idea of diversity and perform an exhaustive comparison of diverse decoding techniques on other generation tasks. I also discuss the limitations in the framing of current simplification tasks, which prevent these models from yet being practically useful. Thus, I also propose a retrieval-based reformulation of the problem. Specifically, starting with a document, I identify concepts critical to understanding its content, and then retrieve documents relevant for each concept, re-ranking them based on the desired complexity level.