Methods For Text Summarization Evaluation

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Subject
evaluation
evaluation metrics
question-answering
summarization
Artificial Intelligence and Robotics
Copyright date
2022-10-05T20:22:00-07:00
Author
Deutsch, Daniel
Abstract

The ability to effectively evaluate a learned model is a critical component of machine learning research; without it, progress on tasks cannot be measured and is thus impossible. In the natural language processing task of text summarization, evaluation is incredibly difficult: the notion of the "perfect" summary content is ill-defined, but even if it could be defined, that content can be expressed in many different ways, making it difficult to identify in a summary. The evaluation metrics that researchers propose for text summarization must overcome these challenges in some way. In this thesis, I identify problems with the existing methodologies for evaluating summaries as well as meta-evaluating the quality of an evaluation metric and propose solutions for improving them. I demonstrate that commonly used evaluation metrics fail to properly evaluate the information content of summaries and propose an evaluation metric based on question-answering to address the shortcomings of existing metrics. Then, I argue that the class of metrics which attempt to evaluate the quality of a summary's content without the aid of a human-written reference is inherently biased and limited in its ability to evaluate summaries. Finally, I identify that the methodology for quantifying how well an automatic metric agrees with human judgments of summary quality fails to provide a complete understanding of a metric's performance. To that end, I propose new statistical analysis tools to address the limitations of the standard meta-evaluation procedure and provide a new protocol for meta-evaluating metrics that better evaluates metrics in realistic use cases.

Advisor
Dan Roth
Date of degree
2022-01-01