Measuring Importance and Query Relevance in Topic-Focused Multi-Document Summarization

Penn collection
Departmental Papers (CIS)
Subject
Computer Sciences
Author
Gupta, Surabhi
Nenkova, Ani
Jurafsky, Dan
Abstract

The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning importance to words in the context of query-focused multi-document summarization: raw frequency (word probability) and log-likelihood ratio (LLR). We demonstrate that the advantages of LLR come from its known distributional properties, which allow for the identification of a set of words that in its entirety defines the aboutness of the input. We also find that LLR is more suitable for query-focused summarization since, unlike raw frequency, it is more sensitive to the integration of the information need defined by the user.
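
The two word-importance schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard binomial log-likelihood ratio (LLR) statistic comparing the input against a background corpus, and the function names, the 10.83 cutoff (the chi-square critical value at p = 0.001 with one degree of freedom), and the rule that keeps only words relatively more frequent in the input are all illustrative assumptions.

```python
import math
from collections import Counter


def word_probability(tokens):
    """Raw-frequency importance: p(w) = count(w) / N over the input tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def _log_l(k, n, p):
    """Binomial log-likelihood of k occurrences in n tokens, skipping 0*log(0) terms."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a word seen k1 times in n1 input tokens
    and k2 times in n2 background tokens."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled probability under the null hypothesis
    return 2.0 * (_log_l(k1, n1, p1) + _log_l(k2, n2, p2)
                  - _log_l(k1, n1, p) - _log_l(k2, n2, p))


def topic_signature(input_tokens, background_tokens, threshold=10.83):
    """Words whose LLR exceeds the (assumed) threshold and that are
    relatively more frequent in the input than in the background."""
    inp, bg = Counter(input_tokens), Counter(background_tokens)
    n1, n2 = sum(inp.values()), sum(bg.values())
    return {w for w, k1 in inp.items()
            if k1 / n1 > bg[w] / n2 and llr(k1, n1, bg[w], n2) > threshold}
```

Under these assumptions, word probability assigns every input word a graded weight, whereas the LLR test yields a discrete set of topic words, which is one way to read the abstract's remark about identifying a set of words that defines the aboutness of the input.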

Date of presentation
2007-06-01
Conference name
45th Annual Meeting of the Association for Computational Linguistics
Comments
Gupta, S., Nenkova, A., & Jurafsky, D., Measuring Importance and Query Relevance in Topic-Focused Multi-Document Summarization, 45th Annual Meeting of the Association for Computational Linguistics, June 2007. URL: http://aclweb.org/anthology-new/P/P07/P07-2049.pdf