Robust semantic role labeling using parsing variations and semantic classes

Szu-ting Yi, University of Pennsylvania


Correctly identifying semantic entities and disambiguating the relations between them and their predicates is a necessary step for natural language processing applications such as text summarization, question answering, and machine translation. Researchers have studied this problem, semantic role labeling (SRL), as a machine learning problem since 2000, when large-scale corpora annotated with arguments for a broad range of predicates became available. However, even with an optimal global inference algorithm used to combine several SRL systems, the growth of SRL performance seems to have reached a plateau. SRL systems typically rely on an upstream syntactic parser to gather argument candidates. We believe that this one-way relationship is the bottleneck of semantic role labeling, and we attempt to tackle it by training parsers more suitable for the SRL task. We incorporated semantic role annotation directly into the parse tree annotation and trained different types of parsers on this data. We found that our maximum entropy style parser (Ratnaparkhi, 1999) derived more benefit from the additional features than our Collins style parser, which is based on Dan Bikel's implementation (Collins, 1999; Bikel, 2004). The maximum entropy style parser also demonstrated better adaptability when ported to a different genre (the Brown corpus), outscoring the Collins style parser on Brown SRL by 10%. A thorough error analysis indicated that a better route to a suitable syntactic parser for semantic role labeling is to create training data that is more consistent and less contradictory. We then carefully examined different types of Treebank/PropBank mismatches, and both Treebank and PropBank were changed to bring the two resources into synchronization. A preliminary assessment of the merged data, comparing SRL performance on the old 300k and new 300k data, indicates that the noisy data problem might still exist because the synchronization is not yet complete.
To achieve system robustness, we create a new set of semantic roles by transforming verb-specific PropBank roles into less verb-dependent thematic roles, based on the mapping between PropBank and VerbNet. Our hypothesis is that a set of less verb-dependent roles should be easier to learn and should port better to different genres. We compared the performance of SRL systems trained on different sets of semantic roles, and the results confirm the hypothesis: the new system ports better to novel text. On a subtask comparing one overloaded PropBank role to its mapped thematic roles, the new system trained on the WSJ corpus gains a 6% performance improvement on the test set extracted from WSJ, and a 10% performance improvement on the new genres from the Brown corpus.
Syntactic parsing is the bottleneck of semantic role labeling, and robustness is the ultimate goal. In this thesis, we investigate ways to train a better syntactic parser and to increase SRL system robustness. We demonstrate that parse trees augmented with semantic role markup can serve as suitable training data for a parser used by an SRL system. Furthermore, we show that by resolving the discrepancies between Penn Treebank and PropBank, it is possible to create a cleaner training corpus for both the parsers and the SRL systems. For system robustness, we propose that it is easier to learn a new set of semantic roles transformed from the original argument roles based on the mapping between VerbNet and PropBank. The new roles are less verb-dependent than the original PropBank roles. As a result, the SRL system trained on the new roles achieves significantly better robustness than the original system.
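
The transformation from verb-specific PropBank roles to thematic roles described above can be pictured as a lookup keyed by verb and role label. The following is only a minimal sketch under stated assumptions: the table `ROLE_MAP`, the function names, and the handful of entries are hypothetical illustrations, not the mapping resource or code used in the thesis, and the choice of ARG2 as the overloaded role is likewise an assumption made for the example.

```python
# Toy lookup from (verb, PropBank role) to a thematic role.
# All entries are illustrative assumptions, not data from the thesis.
ROLE_MAP = {
    ("give", "ARG0"): "Agent",
    ("give", "ARG1"): "Theme",
    ("give", "ARG2"): "Recipient",
    ("put", "ARG0"): "Agent",
    ("put", "ARG1"): "Theme",
    ("put", "ARG2"): "Destination",
}


def to_thematic(verb, role):
    """Return the mapped thematic role, or the original role if unmapped."""
    return ROLE_MAP.get((verb, role), role)


def relabel(instances):
    """Relabel (verb, argument span, PropBank role) triples for training."""
    return [(verb, span, to_thematic(verb, role)) for verb, span, role in instances]


if __name__ == "__main__":
    sample = [
        ("give", "to the school", "ARG2"),
        ("put", "on the shelf", "ARG2"),
    ]
    for verb, span, role in relabel(sample):
        print(verb, span, role)
```

In this sketch the same PropBank label (ARG2) splits into two different thematic roles depending on the verb, which is the sense in which the new role set is less verb-dependent: each new label means the same thing across verbs.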

Subject Area

Artificial intelligence|Computer science

Recommended Citation

Yi, Szu-ting, "Robust semantic role labeling using parsing variations and semantic classes" (2007). Dissertations available from ProQuest. AAI3271840.