Korean zero pronouns: Analysis and resolution

Na-Rae Han, University of Pennsylvania


Zero pronouns, or dropped arguments, are a remarkably frequent phenomenon in Korean. This single syntactic form is made up of diverse subcategories, each of which is characterized by distinct semantic and pragmatic properties. The more widely acknowledged types are those that depend on other linguistic expressions for their reference: such text-dependent types include anaphoric and discourse-deictic zero pronouns. The text-independent types are deictic zero pronouns, generic and specific indefinite zero pronouns, and situational zero pronouns, although some of these text-independent types can enter coreferential relations with other nominal expressions in their surroundings. Previous research focusing on anaphoric zero pronouns, most notably that based on Centering Theory, claims that information-theoretic notions such as saliency govern their felicitous use and interpretation. While the general insight holds true, various efforts to encapsulate it through a precise formulation of Cf-ranking or other hierarchies fall short, largely because the notion is encoded by heterogeneous linguistic factors whose relations cannot be expressed single-dimensionally.
From a language processing point of view, the diverse nature of Korean zero pronouns presents the unique challenge of blending two tasks: categorizing zero pronouns and identifying their antecedents. In this dissertation, using Maximum Entropy as the machine learning method of choice, various statistical models for Korean zero pronoun resolution have been successfully trained and tested on two Korean Treebank corpora. These models provide a valuable opportunity for empirically testing various theoretical claims and observations made on Korean zero pronoun anaphora. Features used in constructing the models and making predictions on zero pronoun reference encode linguistic properties of zero pronouns and their potential antecedents. The features found to make a particularly strong contribution are indeed those that encode the linguistic aspects commonly cited in the linguistic literature as playing a crucial role in Korean zero pronoun usage, such as topic-hood, subject-hood and the nullness of form. While the relative importance of such features does not directly translate to linguistic hierarchies, it nevertheless lends support to some of the specific criteria used in them.

Linguistics|Computer science

Han, Na-Rae, "Korean zero pronouns: Analysis and resolution" (2006). Dissertations available from ProQuest. AAI3211080.