Towards A Practically Useful Text Simplification System

Kriz, Reno Joseph

Files

Kriz_upenngdas_0175C_14895.pdf (1.88 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

information retrieval
lexical simplification
natural language processing
sentence simplification
text generation
text simplification
Artificial Intelligence and Robotics

Copyright date

2022-09-09T20:21:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/31587

Abstract

While a vast amount of text has been written about nearly any topic, such text is often difficult for someone unfamiliar with a specific field to understand. Automated text simplification aims to reduce the complexity of a document, making it more comprehensible to a broader audience. Much of the research in this field has traditionally focused on simplification sub-tasks, such as lexical, syntactic, or sentence-level simplification. However, current systems struggle to consistently produce high-quality simplifications. Phrase-based models tend to make too many poor transformations, whereas recent neural models, while producing grammatical output, often fail to make all of the changes the original text needs. In this thesis, I discuss novel approaches for improving lexical and sentence-level simplification systems. For sentence simplification models, after noting that encouraging diversity at inference time leads to significant improvements, I take a closer look at the idea of diversity and perform an exhaustive comparison of diverse decoding techniques on other generation tasks. I also discuss limitations in how current simplification tasks are framed, which keep these models from being practically useful. I therefore propose a retrieval-based reformulation of the problem. Specifically, starting with a document, I identify the concepts critical to understanding its content, then retrieve documents relevant to each concept and re-rank them based on the desired complexity level.
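The re-ranking step of the retrieval-based reformulation can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Candidate` type, the `relevance` score, the `weight` parameter, and the mean-word-length complexity proxy are all assumptions made here for illustration. The sketch only shows the general idea of trading off retrieval relevance against distance from a target complexity level.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float  # hypothetical retrieval score; higher means more relevant


def complexity(text: str) -> float:
    """Crude complexity proxy (an assumption): mean word length in characters."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def rerank(candidates, target, weight=1.0):
    """Order candidates by relevance minus a penalty for their distance
    from the target complexity level."""
    def score(c):
        return c.relevance - weight * abs(complexity(c.text) - target)
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    docs = [
        Candidate("cats sit on mats", relevance=0.8),
        Candidate("felines habitually occupy upholstered furnishings", relevance=0.9),
    ]
    # A low target favours the simpler text despite its lower relevance.
    for c in rerank(docs, target=3.0):
        print(c.text)
```

A real system would replace the word-length proxy with a learned complexity predictor and the fixed `relevance` values with scores from an actual retriever; the linear trade-off used here is only one of many possible scoring choices.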

Advisor

Chris Callison-Burch
Marianna Apidianaki

Date of degree

2021-01-01

Collection

Dissertations and Theses