From Discourse Structure To Text Specificity: Studies Of Coherence Preferences

Li, Junyi

Files

Li_upenngdas_0175C_12972.pdf (1.39 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

computational linguistics
discourse
natural language processing
specificity
Artificial Intelligence and Robotics

Copyright date

2018-02-23T20:17:00-08:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/29362

Abstract

To successfully communicate through text, a writer needs to organize information into an understandable and well-structured discourse for the targeted audience. This involves deciding when to convey general statements and when to elaborate on details, and gauging how much detail to convey, i.e., the level of specificity. This thesis explores the automatic prediction of text specificity, and whether the perception of specificity varies across different audiences. We characterize text specificity from two aspects: the instantiation discourse relation, and the specificity of sentences and words. We identify characteristics of instantiation that signify a change of specificity between sentences. Features derived from these characteristics substantially improve the detection of the relation. Using instantiation sentences as the basis for training, we propose a semi-supervised system to predict sentence specificity with speed and accuracy. Furthermore, we present insights into the effect of underspecified words and phrases on the comprehension of text, and into the prediction of such words. We show distinct preferences in specificity and discourse structure among different audiences. We investigate these distinctions in both cross-lingual and monolingual contexts. Cross-lingually, we identify discourse factors that significantly impact the quality of text translated from Chinese to English. Notably, a large portion of Chinese sentences are significantly more specific and need to be translated into multiple English sentences. We introduce a system using rich syntactic features to accurately detect such sentences. We also show that simplified text is more general, and that specific sentences are more likely to need simplification. Finally, we present evidence that the perception of sentence specificity differs between male and female readers.

Advisor

Ani Nenkova
Mitchell P. Marcus

Date of degree

2017-01-01

Collection

Dissertations and Theses