American Sign Language recognition: Reducing the complexity of the task with phoneme-based modeling and parallel hidden Markov models
In this thesis I present a framework for recognizing American Sign Language (ASL) from 3D data. The goal is to develop approaches that scale well with increasing vocabulary size. Scalability is a major concern, because the computational treatment of ASL is a very complex undertaking. Two points stand out in particular. First, ASL is a highly inflected language, so signs appear in too many inflectional variants to model each one separately. Second, events in ASL occur both sequentially and simultaneously. Unlike speech recognition, ASL recognition cannot consider all possible combinations of simultaneous events explicitly, because of their sheer number. As a result, the computational treatment of ASL is much more complex than that of spoken languages.
Reducing the complexity of the task requires a two-pronged approach that encompasses work on both the modeling and the recognition sides. On the modeling side, I tackle the many appearances by breaking the signs down into their constituent phonemes, which are limited in number. I use the Movement-Hold phonological model for ASL as a guideline, and extend the parts of it that are not directly applicable to recognition systems. In addition, I recast it to describe simultaneous events in independent channels, so that it is no longer necessary to consider all their possible combinations. The result is a significant reduction of the modeling complexity.
On the recognition side, I propose parallel hidden Markov models (PaHMMs) as an extension to conventional hidden Markov models. I develop a PaHMM recognition algorithm specifically geared toward the properties of sign languages. PaHMMs are the computational counterpart to modeling simultaneous events in independent channels: they allow the channels to be put together on the fly at recognition time, instead of having to consider their combinations a priori.
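The reduction in modeling complexity from independent channels can be illustrated with a small counting sketch. The channel names and inventory sizes below are hypothetical values chosen only for illustration; they are not figures from the thesis. The point is that modeling every combination of simultaneous events grows with the product of the per-channel inventories, whereas modeling each channel independently grows only with their sum.

```python
from math import prod

# Hypothetical per-channel phoneme inventory sizes.
# These numbers are illustrative assumptions, not values from the thesis.
channel_inventory = {
    "strong_hand_movement": 40,
    "weak_hand_movement": 40,
    "strong_hand_handshape": 30,
}


def joint_model_count(inventory):
    """Models needed if every combination of simultaneous events is modeled explicitly."""
    return prod(inventory.values())


def independent_model_count(inventory):
    """Models needed if each channel is modeled on its own."""
    return sum(inventory.values())


joint = joint_model_count(channel_inventory)
independent = independent_model_count(channel_inventory)
print(f"explicit combinations: {joint}, independent channels: {independent}")
```

Under these assumed sizes, the explicit approach needs 48000 models and the independent-channel approach needs 110; adding a further channel multiplies the first number but only adds to the second.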
I validate the modeling approach and the PaHMM recognition algorithm in a pilot study with experiments on 53-sign and 22-sign data sets. In the PaHMM experiments, the independent channels consist of the movements of both hands and the handshape of the strong hand. The results demonstrate the viability of both the phoneme modeling and the description of simultaneous events in independent channels.
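As a rough illustration of combining independent channels at recognition time, the sketch below scores each channel with its own discrete HMM and adds the per-channel log-likelihoods, which is equivalent to multiplying probabilities under the independence assumption. This is not the recognition algorithm developed in the thesis: the plain forward algorithm, the toy two-state model, the discrete observation symbols, and all function and channel names are assumptions made for this example.

```python
import math


def forward_log_likelihood(start, trans, emit, observations):
    """Log-likelihood of a discrete observation sequence under one HMM.

    Uses the unscaled forward algorithm, which is adequate for the short
    toy sequences used here but would underflow on long sequences.
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]
    return math.log(sum(alpha))


def combined_log_score(channel_models, channel_observations):
    """Add per-channel log-likelihoods, assuming the channels are independent."""
    return sum(
        forward_log_likelihood(*channel_models[name], channel_observations[name])
        for name in channel_models
    )


# Toy two-state left-to-right HMM: (start, transition, emission) probabilities.
# The values are arbitrary and serve only to make the example runnable.
toy_hmm = (
    [1.0, 0.0],
    [[0.5, 0.5], [0.0, 1.0]],
    [[0.9, 0.1], [0.2, 0.8]],
)

# Hypothetical channel names mirroring the three channels described above.
channel_models = {
    "strong_hand_movement": toy_hmm,
    "weak_hand_movement": toy_hmm,
    "strong_hand_handshape": toy_hmm,
}
channel_observations = {name: [0, 1] for name in channel_models}

score = combined_log_score(channel_models, channel_observations)
print(f"combined log score: {score:.4f}")
```

Because each channel is scored separately and the results are combined only at the end, no model for a specific combination of movement and handshape ever has to be built in advance.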
Vogler, Christian Philipp, "American Sign Language recognition: Reducing the complexity of the task with phoneme-based modeling and parallel hidden Markov models" (2003). Dissertations available from ProQuest. AAI3087476.