A Computational Analysis of Information Structure Using Parallel Expository Texts in English and Japanese
Abstract
This thesis concerns the notion of 'information structure': informally, the organization of information in an utterance with respect to the context. Information structure has been recognized as a critical element in a number of computer applications, e.g., selection of contextually appropriate forms in machine translation and speech generation, and analysis of text readability in computer-assisted writing systems. One of the problems involved in these applications is how to identify information structure in extended texts. This problem is often ignored, assumed to be trivial, or reduced to a sub-problem that does not correspond to the complexity of realistic texts. A handful of computational proposals face the problem directly, but they are generally limited in coverage and all suffer from a lack of evaluation. To fully demonstrate the usefulness of information structure, it is essential to apply a theory of information structure to the identification problem and to provide an evaluation method. This thesis adopts a classic theory of information structure as a binomial partition between theme and rheme, and captures the property of theme as a requirement of contextual-link status. The notion of 'contextual link' is further specified in terms of discourse status, domain-specific knowledge, and linguistic marking. The relation between theme and rheme is identified as the semantic composition of the two, and is linked to surface syntactic structure using Combinatory Categorial Grammar. The identification process can then be specified as an analysis of contextual-link status along the linguistic structure. The implemented system identifies information structure in real texts in English. Building on the analysis of Japanese presented in the thesis, the system automatically predicts contextually appropriate use of certain particles in the corresponding texts in Japanese. The machine prediction is then compared with human translations.
The evaluation results demonstrate that the prediction of the theory is an improvement over alternative hypotheses. We then conclude that information structure can in fact be used to improve the quality of computational applications in practical settings.