Methods For Text Summarization Evaluation
Artificial Intelligence and Robotics
The ability to effectively evaluate a learned model is a critical component of machine learning research; without it, progress on a task cannot be measured. In the natural language processing task of text summarization, evaluation is especially difficult: the notion of the "perfect" summary content is ill-defined, and even if it could be defined, that content can be expressed in many different ways, making it difficult to identify in a summary. Any evaluation metric proposed for text summarization must overcome these challenges in some way.

In this thesis, I identify problems with the existing methodologies both for evaluating summaries and for meta-evaluating the quality of evaluation metrics, and I propose solutions for improving them. First, I demonstrate that commonly used evaluation metrics fail to properly evaluate the information content of summaries, and I propose an evaluation metric based on question answering that addresses the shortcomings of existing metrics. Then, I argue that the class of metrics which attempt to evaluate the quality of a summary's content without the aid of a human-written reference is inherently biased and limited in its ability to evaluate summaries. Finally, I show that the standard methodology for quantifying how well an automatic metric agrees with human judgments of summary quality fails to provide a complete understanding of a metric's performance. To that end, I propose new statistical analysis tools that address the limitations of the standard meta-evaluation procedure, along with a new protocol that better evaluates metrics in realistic use cases.