Date of this Version
The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning importance to words in the context of query focused multi-document summarization: raw frequency (word probability) and log-likelihood ratio. We demonstrate that the advantages of log-likelihood ratio come from its known distributional properties which allow for the identification of a set of words that in its entirety defines the aboutness of the input. We also find that LLR is more suitable for query-focused summarization since, unlike raw frequency, it is more sensitive to the integration of the information need defined by the user.
Surabhi Gupta, Ani Nenkova, and Dan Jurafsky, "Measuring Importance and Query Relevance in Toopic-Focused Multi-Document Summarization", . June 2007.
Date Posted: 31 July 2012