Departmental Papers (CIS)

Date of this Version


Document Type

Conference Paper


Gupta, S., Nenkova, A., & Jurafsky, D., Measuring Importance and Query Relevance in Topic-Focused Multi-Document Summarization, 45th Annual Meeting of the Association for Computational Linguistics, June 2007, doi: anthology-new/P


The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning importance to words in the context of query-focused multi-document summarization: raw frequency (word probability) and log-likelihood ratio (LLR). We demonstrate that the advantages of LLR come from its known distributional properties, which allow for the identification of a set of words that in its entirety defines the aboutness of the input. We also find that LLR is more suitable for query-focused summarization since, unlike raw frequency, it is more sensitive to the integration of the information need defined by the user.
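
As an illustration of the two word-importance schemes named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes a pre-tokenized input and a separate background corpus, uses the standard binomial form of the log-likelihood ratio statistic, and picks a cutoff of 10.83 (the chi-square critical value at the 0.001 level with one degree of freedom). The function names, the cutoff, and the requirement that a word be over-represented in the input are all assumptions made for this example.

```python
import math
from collections import Counter


def word_probabilities(tokens):
    """Raw-frequency scheme: p(w) = count(w) / total tokens in the input."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def _binomial_log_likelihood(p, k, n):
    """log of p^k * (1 - p)^(n - k), with p clamped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a word seen k1 times in n1 input tokens
    and k2 times in n2 background tokens."""
    p1 = k1 / n1
    p2 = k2 / n2
    p = (k1 + k2) / (n1 + n2)  # shared probability under the null hypothesis
    return 2 * (
        _binomial_log_likelihood(p1, k1, n1)
        + _binomial_log_likelihood(p2, k2, n2)
        - _binomial_log_likelihood(p, k1, n1)
        - _binomial_log_likelihood(p, k2, n2)
    )


def topic_words(input_tokens, background_tokens, threshold=10.83):
    """Return {word: LLR score} for words that are significantly more
    frequent in the input than in the background.
    The threshold value is an assumption chosen for this sketch."""
    if not input_tokens or not background_tokens:
        raise ValueError("both token lists must be non-empty")
    inp = Counter(input_tokens)
    bg = Counter(background_tokens)
    n1 = sum(inp.values())
    n2 = sum(bg.values())
    selected = {}
    for word, k1 in inp.items():
        k2 = bg.get(word, 0)
        score = llr(k1, n1, k2, n2)
        # Keep only words over-represented in the input, not in the background.
        if k1 / n1 > k2 / n2 and score > threshold:
            selected[word] = score
    return selected
```

Under raw frequency every word receives a graded weight, so frequent function words score highly unless they are filtered separately. The LLR statistic is asymptotically chi-square distributed, so a cutoff yields a discrete set of topic words, which is the property the abstract refers to.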



Date Posted: 31 July 2012