Pronominal anaphora resolution in Chinese
Resolving pronominal anaphors in English has been a focus of research in natural language processing for decades. Methods ranging from linguistics-oriented, rule-based approaches to data-oriented, machine-learning approaches have been applied to the problem of finding the antecedents of pronouns. In contrast to the abundance of research on English, there is almost no work on the problem in Chinese. This thesis addresses that gap. Both a rule-based and a machine-learning approach to anaphora resolution are presented in this work. An important difference between the two languages is that Chinese, unlike English, is a pro-drop language and has null (zero) pronouns. The rule-based approach is applied to resolving these null pronouns as well as the overt, third-person pronouns. The Hobbs algorithm is used for the rule-based method of anaphora resolution. Three versions of the algorithm are presented. The first uses only syntactic structure to select an antecedent. The second adds limited number and gender agreement, while the third incorporates semantic constraints on the proposed antecedents. The machine-learning method uses supervised maximum entropy models. Different models were trained using sets of features that paralleled the information sources used by the different versions of the Hobbs algorithm. Two sets of data were used. The Penn Chinese Treebank (CTB) provided the test data for resolution of both overt, third-person pronouns and zero pronouns. The CTB parses were annotated for coreference using guidelines that were drawn up for the work presented here. Data annotated for the 2004 Chinese ACE program were used for training and testing the maximum entropy models to find the antecedents of overt, third-person pronouns. The results from experiments with the two basic methods using the different levels of linguistic information are presented and discussed.
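The way the three versions of the rule-based method add information can be illustrated with a small sketch. This is not the Hobbs algorithm itself, which walks the parse tree, and it is not the thesis's implementation. It assumes the candidate antecedents have already been listed in the order a syntactic search would propose them, and shows only how agreement and semantic filters could narrow the choice. The names `Mention`, `resolve`, and the feature labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Mention:
    """A pronoun or candidate antecedent (hypothetical representation)."""
    text: str
    number: Optional[str] = None     # e.g. "sg" or "pl"; None if unknown
    gender: Optional[str] = None     # e.g. "m" or "f"; None if unknown
    sem_class: Optional[str] = None  # e.g. "person" or "org"; None if unknown


def compatible(a: Optional[str], b: Optional[str]) -> bool:
    # An unknown value on either side never blocks a match.
    return a is None or b is None or a == b


def agrees(pronoun: Mention, candidate: Mention) -> bool:
    # Number and gender agreement check.
    return (compatible(pronoun.number, candidate.number)
            and compatible(pronoun.gender, candidate.gender))


def semantically_ok(pronoun: Mention, candidate: Mention) -> bool:
    # Stand-in for a semantic constraint: semantic classes must not clash.
    return compatible(pronoun.sem_class, candidate.sem_class)


def resolve(pronoun: Mention,
            candidates: Sequence[Mention],
            level: int) -> Optional[Mention]:
    """Return the first candidate that passes the filters for `level`.

    level 1: syntactic order only
    level 2: syntactic order plus number/gender agreement
    level 3: level 2 plus the semantic constraint
    `candidates` is assumed to be in syntactic search order already.
    """
    for candidate in candidates:
        if level >= 2 and not agrees(pronoun, candidate):
            continue
        if level >= 3 and not semantically_ok(pronoun, candidate):
            continue
        return candidate
    return None
```

With a feminine singular pronoun and candidates proposed in the order masculine person, organization of unknown gender, feminine person, the three levels would select the first, second, and third candidate respectively, which shows how each added information source can change the chosen antecedent.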
Converse, Susan P, "Pronominal anaphora resolution in Chinese" (2006). Dissertations available from ProQuest. AAI3222224.