Hierarchical Distributed Representations for Statistical Language Modeling

Penn collection
Departmental Papers (CIS)
Author
Blitzer, John
Weinberger, Kilian Q
Abstract

Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution over predicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty thousand word vocabulary and a training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.
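
The architecture described in the abstract can be pictured with a small sketch: a context word is mapped to a continuous vector, gating functions of that vector choose among experts arranged in a tree, and each expert is a fixed multinomial distribution over the next word. The code below is only an illustration under stated assumptions, not the authors' implementation. The two-level tree, the softmax-of-linear-scores gates, the function name `hme_word_distribution`, and all toy vectors and probabilities are hypothetical; the abstract does not specify them. Training and the dimensionality-reduction initialization are not shown.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hme_word_distribution(x, top_gate, leaf_gates, experts):
    """Return P(next word | context vector x) for a two-level mixture.

    x          -- context vector (assumed distributed representation)
    top_gate   -- one weight vector per branch (assumed linear gate)
    leaf_gates -- leaf_gates[i] holds one weight vector per expert in branch i
    experts    -- experts[i][j] is a multinomial over the vocabulary
    """
    vocab_size = len(experts[0][0])
    probs = [0.0] * vocab_size
    branch_probs = softmax([dot(w, x) for w in top_gate])
    for i, p_branch in enumerate(branch_probs):
        leaf_probs = softmax([dot(w, x) for w in leaf_gates[i]])
        for j, p_leaf in enumerate(leaf_probs):
            # Each expert contributes its word distribution, weighted by
            # the probability of the path from the root to that expert.
            for v, p_word in enumerate(experts[i][j]):
                probs[v] += p_branch * p_leaf * p_word
    return probs


# Toy setup (all values hypothetical): 3-word vocabulary, 2-d context vectors.
vocab = ["the", "cat", "sat"]
embeddings = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.5, 0.5]}
top_gate = [[2.0, -1.0], [-1.0, 2.0]]
leaf_gates = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, -0.5], [-0.5, 0.5]],
]
experts = [
    [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
    [[0.2, 0.1, 0.7], [0.6, 0.2, 0.2]],
]

# Bigram prediction: distribution over the word that follows "the".
p_next = hme_word_distribution(embeddings["the"], top_gate, leaf_gates, experts)
for word, p in zip(vocab, p_next):
    print(f"P({word} | the) = {p:.3f}")
```

Because the gates depend on the continuous context vector rather than on a discrete context identity, contexts with similar vectors receive similar predictive distributions, which is the kind of sharing across contexts that the abstract says n-gram enumeration lacks.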

Date of presentation
2004-12-13
Conference name
18th annual Neural Information Processing Systems (NIPS) conference
Conference dates
2004-12-13 to 2004-12-18
Comments
Copyright MIT Press. Postprint version. Published in Advances in Neural Information Processing Systems 17, pages 185-192. Proceedings of the 18th annual Neural Information Processing Systems (NIPS) conference, held in Vancouver, Canada, from 13-18 December 2004.