Nenkova, Ani

  • Publication
    General Versus Specific Sentences: Automatic Identification and Application to Analysis of News Summaries
    (2011-01-01) Louis, Annie; Nenkova, Ani
    In this paper, we introduce the task of identifying general and specific sentences in news articles. Instead of embarking on a new annotation effort to obtain data for the task, we explore the possibility of leveraging existing large corpora annotated with discourse information to train a classifier. We introduce several classes of features that capture lexical and syntactic information, as well as word specificity and polarity. We then use the classifier to analyze the distribution of general and specific sentences in human and machine summaries of news articles. We discover that while all types of summaries tend to be more specific than the original documents, human abstracts contain a more balanced mix of general and specific sentences, whereas automatic summaries are overwhelmingly specific. Our findings provide strong evidence of the need for a new task in (abstractive) summarization: identification and generation of general sentences.
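    The kind of classifier the abstract describes can be sketched with simple surface cues. This is a minimal illustration, not the authors' model: the features (token count, digit-bearing tokens, capitalized named-entity-like tokens) and the linear weights are hypothetical stand-ins for the richer lexical, syntactic, specificity, and polarity features the paper uses.

    ```python
    # Sketch only: score sentences on surface cues of specificity.
    # Features and weights are illustrative, not the paper's.
    import re

    def specificity_features(sentence: str) -> dict:
        tokens = sentence.split()
        return {
            "n_tokens": len(tokens),
            # Tokens containing digits (dates, quantities) suggest specificity.
            "n_digits": sum(bool(re.search(r"\d", t)) for t in tokens),
            # Capitalized non-initial tokens approximate named entities.
            "n_caps": sum(t[0].isupper() for t in tokens[1:]),
        }

    def specificity_score(sentence: str) -> float:
        f = specificity_features(sentence)
        # Hypothetical linear weights, chosen for illustration only.
        return 0.05 * f["n_tokens"] + 0.5 * f["n_digits"] + 0.3 * f["n_caps"]

    def label(sentence: str, threshold: float = 1.0) -> str:
        return "specific" if specificity_score(sentence) >= threshold else "general"

    print(label("The economy is improving."))
    print(label("GDP grew 3.2 percent in the third quarter of 2010, "
                "the Commerce Department said."))
    ```

    A real system would replace the hand-set weights with a classifier trained on discourse-annotated data, as the abstract describes.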
  • Publication
    Easily Identifiable Discourse Relations
    (2008-06-16) Pitler, Emily; Raghupathy, Mridhula; Mehta, Hena; Nenkova, Ani; Lee, Alan; Joshi, Aravind K
    We present a corpus study of local discourse relations based on the Penn Discourse Treebank, a large manually annotated corpus of explicitly or implicitly realized contingency, comparison, temporal and expansion relations. We show that while there is a large degree of ambiguity in temporal explicit discourse connectives, overall discourse connectives are mostly unambiguous and allow high accuracy classification of discourse relations. We achieve 93.09% accuracy in classifying the explicit relations and 74.74% accuracy overall. In addition, we show that some pairs of relations occur together in text more often than expected by chance. This finding suggests that global sequence classification of the relations in text can lead to better results, especially for implicit relations.
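    The central observation, that explicit connectives are mostly unambiguous, means a per-connective most-frequent-sense lookup is already a strong classifier. A minimal sketch follows; the connective-to-sense mapping here is an illustrative assumption, not extracted from the PDTB.

    ```python
    # Sketch of a most-frequent-sense classifier for explicit connectives.
    # The mapping below is illustrative, not derived from PDTB counts.
    MOST_FREQUENT_SENSE = {
        "because": "Contingency",
        "so": "Contingency",
        "but": "Comparison",
        "although": "Comparison",
        "when": "Temporal",      # temporal connectives are the ambiguous ones
        "then": "Temporal",
        "and": "Expansion",
        "for example": "Expansion",
    }

    def classify_explicit(connective: str) -> str:
        # Back off to Expansion, the largest class, for unseen connectives.
        return MOST_FREQUENT_SENSE.get(connective.lower(), "Expansion")

    print(classify_explicit("Because"))
    print(classify_explicit("but"))
    ```

    The paper's reported 93.09% explicit-relation accuracy shows how far such lookup-style disambiguation goes; the remaining errors concentrate in ambiguous temporal connectives.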
  • Publication
    Improving the Estimation of Word Importance for News Multi-Document Summarization - Extended Technical Report
    (2014-02-03) Hong, Kai; Nenkova, Ani
    In this paper, we propose a supervised model for ranking word importance that incorporates a rich set of features. Our model is superior to prior approaches for identifying words used in human summaries. Moreover, we show that an extractive summarizer which incorporates our estimates of word importance produces summaries comparable to the state of the art under automatic evaluation.
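    As a point of reference for what the supervised model improves on, a common unsupervised baseline estimates word importance from frequency across the input document cluster. The sketch below implements that baseline, not the paper's model; the stopword list is a minimal illustrative assumption.

    ```python
    # Frequency-based word-importance baseline for multi-document input.
    # This is the kind of estimate the paper's supervised model improves on.
    from collections import Counter

    # Tiny illustrative stopword list, not a standard resource.
    STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "on", "for"}

    def word_importance(documents: list[str]) -> dict[str, float]:
        counts = Counter(
            w for doc in documents
            for w in doc.lower().split()
            if w not in STOPWORDS
        )
        total = sum(counts.values())
        # Probability-style importance: share of content-word mass.
        return {w: c / total for w, c in counts.items()}

    docs = [
        "the senate passed the budget bill",
        "the budget bill moves to the house",
    ]
    scores = word_importance(docs)
    ```

    Words recurring across documents ("budget", "bill") score highest, mirroring the intuition that cross-document frequency signals summary-worthiness; the paper adds supervised features on top of signals like this.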
  • Publication
    Modelling Prominence and Emphasis Improves Unit-Selection Synthesis
    (2007-08-01) Strom, Volker; Nenkova, Ani; Clark, Robert; Vazquez-Alvarez, Yolanda; Brenier, Jason; King, Simon; Jurafsky, Dan
    We describe the results of large scale perception experiments showing improvements in synthesising two distinct kinds of prominence: standard pitch-accents and strong emphatic accents. Previously, prominence assignment has mainly been evaluated by computing accuracy on a prominence-labelled test set. By contrast, we integrated an automatic pitch-accent classifier into the unit selection target cost and showed that listeners preferred the resulting synthesised sentences. We also describe an improved recording script for collecting emphatic accents, and show that generating emphatic accents leads to further improvements in the fiction genre over incorporating pitch accents only. Finally, we show differences in the effects of prominence between child-directed speech and the news and fiction genres.
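    Integrating a prominence classifier into the unit selection target cost amounts to penalizing candidate units whose recorded accent label disagrees with the predicted one. The sketch below shows that idea in its simplest form; the weight and the additive cost structure are illustrative assumptions, not the paper's actual cost function.

    ```python
    # Sketch: add a prominence-mismatch term to a unit-selection target cost.
    # The weight 0.5 is an arbitrary illustrative value.
    def target_cost(base_cost: float,
                    unit_accent: str,
                    predicted_accent: str,
                    accent_weight: float = 0.5) -> float:
        # Penalize candidate units whose recorded pitch-accent label
        # disagrees with the accent predicted for this target position.
        mismatch = 0.0 if unit_accent == predicted_accent else 1.0
        return base_cost + accent_weight * mismatch

    # A unit matching the predicted accent keeps its base cost;
    # a mismatched unit is penalized and less likely to be selected.
    print(target_cost(1.0, "accented", "accented"))
    print(target_cost(1.0, "unaccented", "accented"))
    ```

    The effect is that the Viterbi search over candidate units prefers recordings whose natural prominence matches the prediction, which is what the listening tests evaluated.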
  • Publication
    Automatic Detection of Contrastive Elements in Spontaneous Speech
    (2007-12-01) Nenkova, Ani; Jurafsky, Dan
    In natural speech, people use different levels of prominence to signal which parts of an utterance are especially important. Contrastive elements are often produced with stronger than usual prominence, and their presence modifies the meaning of the utterance in subtle but important ways. We use a richly annotated corpus of conversational speech to study the acoustic characteristics of contrastive elements and the differences between them and words at other levels of prominence. We report our results for automatic detection of contrastive elements based on acoustic and textual features, finding that a baseline predicting nouns and adjectives as contrastive performs on par with the best combination of features. We achieve much better performance in a modified task of detecting contrastive elements among words that are predicted to bear pitch accent.
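    The strong baseline the abstract mentions, predicting every noun and adjective as contrastive, is straightforward to state in code. In this sketch the Penn Treebank-style POS tags are supplied by hand for the example; a real system would obtain them from a tagger.

    ```python
    # Sketch of the noun/adjective baseline for contrast detection.
    # POS tags (Penn Treebank style) are hand-supplied for illustration.
    def contrastive_baseline(tagged: list[tuple[str, str]]) -> list[str]:
        # Predict all nouns (NN*) and adjectives (JJ*) as contrastive.
        return [word for word, pos in tagged if pos.startswith(("NN", "JJ"))]

    utterance = [("I", "PRP"), ("wanted", "VBD"), ("the", "DT"),
                 ("red", "JJ"), ("car", "NN"), (",", ","),
                 ("not", "RB"), ("the", "DT"), ("blue", "JJ"), ("one", "NN")]

    print(contrastive_baseline(utterance))  # ['red', 'car', 'blue', 'one']
    ```

    That this crude baseline performs on par with the best feature combination is the abstract's point: contrast is hard to detect from acoustics alone, which motivates the modified task restricted to pitch-accented words.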