A Computational Analysis of Information Structure Using Parallel Expository Texts in English and Japanese

Komagata, Nobo N.

Permalink

https://repository.upenn.edu/handle/20.500.14332/37651

Abstract

This thesis concerns the notion of 'information structure': informally, the organization of information in an utterance with respect to its context. Information structure has been recognized as a critical element in a number of computer applications, e.g., the selection of contextually appropriate forms in machine translation and speech generation, and the analysis of text readability in computer-assisted writing systems. One of the problems these applications face is how to identify information structure in extended texts. This problem is often ignored, assumed to be trivial, or reduced to a sub-problem that does not reflect the complexity of realistic texts. A handful of computational proposals address the problem directly, but they are generally limited in coverage and all lack evaluation. To fully demonstrate the usefulness of information structure, it is essential to apply a theory of information structure to the identification problem and to provide an evaluation method. This thesis adopts a classic theory of information structure as a binary partition between theme and rheme, and captures the property of the theme as a requirement on its contextual-link status. The notion of 'contextual link' is further specified in terms of discourse status, domain-specific knowledge, and linguistic marking. The relation between theme and rheme is identified as the semantic composition of the two, and is linked to surface syntactic structure using Combinatory Categorial Grammar. The identification process can then be specified as an analysis of contextual-link status along the linguistic structure. The implemented system identifies information structure in real English texts. Building on the analysis of Japanese presented in the thesis, the system automatically predicts the contextually appropriate use of certain particles in the corresponding Japanese texts. The machine predictions are then compared with human translations.
The evaluation results demonstrate that the theory's predictions are an improvement over those of alternative hypotheses. We conclude that information structure can in fact be used to improve the quality of computational applications in practical settings.
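The pipeline summarized above (mark contextual links, split each sentence into theme and rheme, predict a Japanese particle, compare against human translations) can be illustrated with a deliberately simplified sketch. This is not the thesis implementation. The `linked` flag, the "longest contextually linked prefix" rule, the `subject_index` argument, and the choice of the topic particle wa versus the nominative ga as a stand-in for the "certain particles" are all assumptions made for illustration; the sample sentences and labels are invented toy data, not results from the thesis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    form: str
    # Assumption: a precomputed flag standing in for contextual-link status
    # (discourse-old, inferable from domain knowledge, or linguistically marked).
    linked: bool


def split_theme_rheme(words: List[Word]) -> Tuple[List[Word], List[Word]]:
    """Toy rule (hypothetical): theme = longest contextually linked prefix.

    Assumes any prefix may serve as a constituent, in the spirit of CCG's
    flexible constituency, and that the rheme keeps at least one word.
    """
    k = 0
    while k < len(words) - 1 and words[k].linked:
        k += 1
    return words[:k], words[k:]


def predict_particle(words: List[Word], subject_index: int) -> str:
    """Hypothetical mapping: a subject inside the theme gets 'wa', otherwise 'ga'."""
    theme, _ = split_theme_rheme(words)
    return "wa" if subject_index < len(theme) else "ga"


def agreement(predicted: List[str], human: List[str]) -> float:
    """Fraction of items where the prediction matches the human translation."""
    if not human or len(predicted) != len(human):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


if __name__ == "__main__":
    # Invented toy sentences; not data from the thesis.
    s1 = [Word("the", True), Word("system", True), Word("parses", False), Word("text", False)]
    s2 = [Word("a", False), Word("parser", False), Word("was", False), Word("built", False)]
    predicted = [predict_particle(s1, 1), predict_particle(s2, 1)]
    baseline = ["ga", "ga"]  # hypothetical alternative hypothesis: always the nominative
    human = ["wa", "ga"]     # invented "human translation" labels
    print(predicted, agreement(predicted, human), agreement(baseline, human))
```

Run as a script, this prints `['wa', 'ga'] 1.0 0.5`: on the toy data the theme-based rule agrees with the invented human labels more often than the fixed baseline, which mirrors the shape (not the substance) of the evaluation described in the abstract.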

Date of degree

1999

Comments

University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-99-07.

Collection

Dissertations and Theses