Technical Reports (CIS)

Document Type

Technical Report

Date of this Version

June 2008


University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-08-24.


We present a corpus study of local discourse relations based on the Penn Discourse Treebank, a large, manually annotated corpus of explicitly or implicitly realized contingency, comparison, temporal and expansion relations. We show that although explicit temporal discourse connectives are highly ambiguous, discourse connectives overall are mostly unambiguous and allow discourse relations to be classified with high accuracy. We achieve 93.09% accuracy in classifying explicit relations and 74.74% accuracy overall. In addition, we show that some pairs of relations occur together in text more often than expected by chance. This finding suggests that global sequence classification of the relations in a text can lead to better results, especially for implicit relations.
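The abstract does not describe the classifier or the statistical test that was used. As a rough illustration only, the following Python sketch shows two ideas under stated assumptions. The first is a hypothetical baseline that labels each explicit relation with the class most often seen with its connective in training data. The second is an observed/expected ratio for adjacent relation pairs, where the expected count assumes independence. The function names (`train_connective_baseline`, `accuracy`, `cooccurrence_ratios`), the toy data, and the choice to count only adjacent pairs are all assumptions made for illustration. This is not the report's method, and it does not reproduce the reported accuracies.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (connective, relation class) pairs, NOT drawn from the PDTB.
TRAIN = [
    ("because", "Contingency"), ("because", "Contingency"),
    ("but", "Comparison"), ("but", "Comparison"),
    ("when", "Temporal"), ("when", "Temporal"), ("when", "Contingency"),
    ("and", "Expansion"),
]
TEST = [
    ("because", "Contingency"),
    ("when", "Contingency"),    # ambiguous temporal connective: baseline predicts Temporal
    ("but", "Comparison"),
    ("however", "Comparison"),  # unseen connective: falls back to the majority class
]
# Hypothetical per-document sequences of relation classes, in text order.
SEQUENCES = [
    ["Contingency", "Expansion", "Contingency", "Expansion"],
    ["Comparison", "Temporal"],
    ["Contingency", "Expansion"],
]


def train_connective_baseline(examples):
    """Map each connective to the relation class it most often signals (hypothetical baseline)."""
    by_connective = defaultdict(Counter)
    overall = Counter()
    for connective, relation in examples:
        by_connective[connective.lower()][relation] += 1
        overall[relation] += 1
    table = {conn: counts.most_common(1)[0][0] for conn, counts in by_connective.items()}
    fallback = overall.most_common(1)[0][0]  # majority class, used for unseen connectives
    return table, fallback


def accuracy(table, fallback, examples):
    """Fraction of examples whose predicted class matches the gold class."""
    correct = sum(table.get(conn.lower(), fallback) == gold for conn, gold in examples)
    return correct / len(examples)


def cooccurrence_ratios(sequences):
    """Observed / expected counts for adjacent relation pairs (a, b).

    The expected count assumes the first and second positions are independent:
    E[a, b] = N * P(first = a) * P(second = b). A ratio above 1 means the pair
    occurs more often than chance under that assumption.
    """
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))  # count each adjacent (a, b) pair
    n = sum(pairs.values())
    first, second = Counter(), Counter()
    for (a, b), k in pairs.items():
        first[a] += k
        second[b] += k
    return {(a, b): k / (first[a] * second[b] / n) for (a, b), k in pairs.items()}


if __name__ == "__main__":
    table, fallback = train_connective_baseline(TRAIN)
    print("accuracy:", accuracy(table, fallback, TEST))
    for pair, ratio in sorted(cooccurrence_ratios(SEQUENCES).items()):
        print(pair, round(ratio, 2))
```

A raw ratio is only a first look: pairs seen once can get large ratios by chance. A real study would more likely apply a significance test, such as a chi-squared test on a contingency table of pair counts, before concluding that two relations co-occur more often than expected.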



Date Posted: 16 June 2008