General Versus Specific Sentences: Automatic Identification and Application to Analysis of News Summaries

Louis, Annie; Nenkova, Ani

Files

MS_CIS_11_07.pdf (150.58 KB)

Penn collection

Technical Reports (CIS)

Permalink

https://repository.upenn.edu/handle/20.500.14332/7921

Author

Louis, Annie

Nenkova, Ani

Abstract

In this paper, we introduce the task of identifying general and specific sentences in news articles. Instead of embarking on a new annotation effort to obtain data for the task, we explore the possibility of leveraging existing large corpora annotated with discourse information to train a classifier. We introduce several classes of features that capture lexical and syntactic information, as well as word specificity and polarity. We then use the classifier to analyze the distribution of general and specific sentences in human and machine summaries of news articles. We discover that while all types of summaries tend to be more specific than the original documents, human abstracts contain a more balanced mix of general and specific sentences but automatic summaries are overwhelmingly specific. Our findings give strong evidence for the need for a new task in (abstractive) summarization: identification and generation of general sentences.

Publication date

2011-01-01

Comments

University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-11-07.

Collection

Reports