Tractable models of natural language semantics for recognizing spoken directions
The development of speaker-independent, mixed-initiative spoken language interfaces, in which users not only answer questions but also ask questions and give instructions, is currently limited by the performance of language models based largely on word co-occurrences. Even under ideal circumstances, with large application-specific training corpora, conventional language models are not predictive enough to correctly analyze the wide variety of inputs that a wide variety of speakers might produce in a general-purpose interface for directing robots, office assistants, or other agents with complex capabilities. This thesis explores statistical language models conditioned on the meanings, or denotations, of input utterances in the context of the interface's underlying application environment or world model. These models extend the ‘semantic grammars’ used in existing spoken language interfaces, which rely on co-occurrences among words or word classes. Any string of words admits an exponential number of possible parse tree analyses, and any utterance admits many possible word strings. If large numbers of analyses are to be interpreted in a practical interactive application, this use of model-theoretic interpretation must therefore share partial results among competing analyses. This thesis presents a formal result: model-theoretic semantic interpretation can be factored (cut into well-behaved partial results) and shared (reused between possible analyses) in polynomial time, in much the same way that simple syntactic structure is factored into context-free rules and shared in standard dynamic programming parsing algorithms.
This polynomial bound holds even for analyses containing non-immediate variable scopings (including intra-sentential anaphora and quantifier raising) and generalized quantifiers, which are traditionally analyzed as having second-order denotations (sets of sets of entities, exponential in the size of the domain). The thesis also presents the practical result that this approach yields a statistically significant improvement in accuracy when analyzing a corpus of spoken directions to 3-D animated agents.
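The analogy with dynamic programming parsing can be illustrated with a small sketch. It is not the thesis's factoring (which also covers variable scoping and anaphora); it is a minimal illustration under hypothetical assumptions: a CKY-style chart whose cells store (category, denotation) pairs computed against a toy world model. The entity names, predicates, lexicon, and grammar rules are all invented for illustration. Determiners are evaluated as relations between two first-order sets (restrictor and scope) instead of being enumerated as sets of sets.

```python
from itertools import product

# Hypothetical toy world model: what each noun and adjective denotes.
NOUNS = {"block": frozenset({"b1", "b2", "b3"}), "cone": frozenset({"c1"})}
ADJS = {"red": frozenset({"b1", "b2", "c1"}), "small": frozenset({"b1", "c1"})}

# Generalized quantifiers as relations between restrictor set a and scope set b,
# so they are never expanded into second-order sets of sets.
DETERMINERS = {
    "every": lambda a, b: a <= b,
    "some": lambda a, b: bool(a & b),
    "no": lambda a, b: not (a & b),
}

# Hypothetical binary rules: (parent, left child, right child, semantic function).
RULES = [
    ("N", "Adj", "N", lambda adj, n: adj & n),        # intersective modifier
    ("Adj", "Adj", "Adj", lambda a, b: a & b),        # stacked adjectives
    ("NP", "Det", "N", lambda det, n: (det, n)),      # quantifier + restrictor
    ("VP", "Cop", "Adj", lambda _cop, adj: adj),      # "is red" denotes the red set
    ("S", "NP", "VP", lambda np, vp: DETERMINERS[np[0]](np[1], vp)),
]


def lexicon(word):
    """Return the (category, denotation) pairs a word can have."""
    entries = set()
    if word in DETERMINERS:
        entries.add(("Det", word))
    if word in NOUNS:
        entries.add(("N", NOUNS[word]))
    if word in ADJS:
        entries.add(("Adj", ADJS[word]))
    if word == "is":
        entries.add(("Cop", None))
    return entries


def combine(lcat, left, rcat, right):
    """Yield (parent category, denotation) for every rule matching both children."""
    for parent, want_l, want_r, fn in RULES:
        if want_l == lcat and want_r == rcat:
            yield parent, fn(left, right)


def parse(words):
    """CKY chart: chart[(i, j)] maps a category to the set of distinct
    denotations found for words[i:j]. Analyses with the same category and
    denotation over the same span are merged into one entry and reused."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        cell = {}
        for cat, den in lexicon(word):
            cell.setdefault(cat, set()).add(den)
        chart[(i, i + 1)] = cell
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                for lcat, ldens in chart[(i, k)].items():
                    for rcat, rdens in chart[(k, j)].items():
                        for ld, rd in product(ldens, rdens):
                            for parent, den in combine(lcat, ld, rcat, rd):
                                cell.setdefault(parent, set()).add(den)
            chart[(i, j)] = cell
    return chart


chart = parse("every small red block is red".split())
print(chart[(1, 4)])  # one shared N entry for "small red block"
print(chart[(0, 6)])  # truth value of the whole sentence
```

In "every small red block is red", the bracketings (small (red block)) and ((small red) block) both denote {b1}, so they occupy a single chart entry that the NP rule then combines with only once. Merging by denotation bounds each cell by the number of distinct denotations reached, not by the number of trees; this toy sketch does not attempt to reproduce the thesis's polynomial bound.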
Subjects: Computer science; Linguistics; Artificial intelligence
Schuler, William Edward, "Tractable models of natural language semantics for recognizing spoken directions" (2003). Dissertations available from ProQuest. AAI3095935.