Pereira, Fernando C.N.
Publication: Hierarchical Distributed Representations for Statistical Language Modeling (2004-12-13)
Authors: Blitzer, John; Saul, Lawrence K.; Weinberger, Kilian Q.; Pereira, Fernando C.N.
Abstract: Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution over predicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty-thousand-word vocabulary and a training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.

Publication: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (2001-06-28)
Authors: Lafferty, John; McCallum, Andrew; Pereira, Fernando C.N.
Abstract: We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

Publication: Case-Factor Diagrams for Structured Probabilistic Modeling (2004-07-07)
Authors: McAllester, David; Collins, Michael; Pereira, Fernando C.N.
Abstract: We introduce a probabilistic formalism subsuming Markov random fields of bounded tree width and probabilistic context-free grammars. Our models are based on a representation of Boolean formulas that we call case-factor diagrams (CFDs). CFDs are similar to binary decision diagrams (BDDs) but are concise for circuits of bounded tree width (unlike BDDs) and can concisely represent the set of parse trees over a given string under a given context-free grammar (also unlike BDDs). A probabilistic model consists of a CFD defining a feasible set of Boolean assignments and a weight (or cost) for each individual Boolean variable. We give an inside-outside algorithm for simultaneously computing the marginal of each Boolean variable, and a Viterbi algorithm for finding the minimum cost variable assignment. Both algorithms run in time proportional to the size of the CFD.
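The first abstract feeds distributed word representations into a hierarchical mixture of experts whose leaves are multinomial distributions over predicted words. The sketch below is an illustrative reconstruction, not the authors' model: all sizes and parameter names are assumptions, the parameters are random rather than trained, and the paper initializes its representations with dimensionality-reduction algorithms instead of random draws. It shows how a binary tree of logistic gates mixing leaf multinomials yields a properly normalized next-word distribution.

```python
# Minimal sketch (assumptions throughout) of a hierarchical mixture of
# experts over distributed word representations for bigram modeling.
import numpy as np

rng = np.random.default_rng(0)

V, d, depth = 1000, 32, 3            # vocabulary size, embedding dim, tree depth
n_leaves = 2 ** depth                # one multinomial expert per leaf

E = rng.normal(size=(V, d))          # distributed context representations
                                     # (the paper initializes these with
                                     # dimensionality reduction, not randomly)
gates = rng.normal(size=(n_leaves - 1, d))   # one logistic gate per internal node
experts = rng.normal(size=(n_leaves, V))     # unnormalized scores per leaf expert

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def predict(context_word):
    """P(next word | context word) as a gated mixture of leaf multinomials."""
    x = E[context_word]
    # Probability of reaching each leaf: product of gate decisions on its path.
    leaf_prob = np.ones(n_leaves)
    for leaf in range(n_leaves):
        node = 0                               # root of the binary gating tree
        for bit in format(leaf, f"0{depth}b"):
            g = sigmoid(gates[node] @ x)
            leaf_prob[leaf] *= g if bit == "1" else (1.0 - g)
            node = 2 * node + 1 + int(bit)     # heap-style child index
    # Mix the experts' multinomial distributions by leaf-reaching probability.
    return sum(leaf_prob[k] * softmax(experts[k]) for k in range(n_leaves))

p = predict(context_word=42)
assert abs(p.sum() - 1.0) < 1e-9    # a proper distribution over the vocabulary
```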
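The key property in the CRF abstract is global, whole-sequence normalization, which is what avoids the bias toward low-branching states that afflicts locally normalized MEMMs. A minimal sketch of linear-chain CRF inference follows; the randomly generated log-potentials stand in for learned weighted feature sums, and the label count, sequence length, and scores are illustrative assumptions, not anything from the paper.

```python
# Minimal sketch of linear-chain CRF inference: the forward algorithm in
# log space computes the global normalizer Z(x), from which P(y | x) follows.
import numpy as np
from scipy.special import logsumexp

K, T = 3, 5                                   # number of labels, sequence length
rng = np.random.default_rng(1)
emit = rng.normal(size=(T, K))                # log-potential of label k at position t
trans = rng.normal(size=(K, K))               # log-potential of label pair (k, k')

def log_Z(emit, trans):
    """Forward algorithm: log-sum over all label sequences' scores."""
    alpha = emit[0]                           # log-scores ending in each label at t = 0
    for t in range(1, len(emit)):
        # alpha'[k'] = logsum_k (alpha[k] + trans[k, k']) + emit[t, k']
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + emit[t]
    return logsumexp(alpha)

def log_prob(y, emit, trans):
    """Log-probability of one label sequence y, normalized globally."""
    score = emit[0, y[0]]
    for t in range(1, len(y)):
        score += trans[y[t - 1], y[t]] + emit[t, y[t]]
    return score - log_Z(emit, trans)

y = [0, 2, 1, 1, 0]
print(np.exp(log_prob(y, emit, trans)))       # a well-formed probability
```

Because the normalizer sums over entire label sequences rather than over each state's successors, a state with few outgoing options gains no automatic probability advantage.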
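The case-factor-diagram abstract states that a Viterbi-style algorithm finds the minimum-cost feasible assignment in time proportional to the diagram's size. The sketch below illustrates that idea on a simpler BDD-like decision diagram rather than an actual case-factor diagram; the node encoding, example formula, and costs are all assumptions made for illustration.

```python
# Minimal sketch: minimum-cost satisfying assignment over a decision-diagram
# DAG, via memoized recursion (time proportional to the number of nodes).
from functools import lru_cache

# Each internal node: (variable, low_child, high_child); terminals: TRUE/FALSE.
# Diagram for the formula (x0 AND x1) OR x2, variables tested in order.
TRUE, FALSE = "T", "F"
nodes = {
    "n0": ("x0", "n2", "n1"),    # test x0: low branch -> n2, high branch -> n1
    "n1": ("x1", "n2", TRUE),    # test x1
    "n2": ("x2", FALSE, TRUE),   # test x2
}
cost = {"x0": 1.0, "x1": 4.0, "x2": 3.0}   # cost paid when a variable is set true

@lru_cache(maxsize=None)
def min_cost(node):
    """Minimum total cost of any satisfying assignment below this node."""
    if node == TRUE:
        return 0.0
    if node == FALSE:
        return float("inf")                  # infeasible branch
    var, low, high = nodes[node]
    return min(min_cost(low),                # set var = 0, pay nothing
               cost[var] + min_cost(high))   # set var = 1, pay its cost

print(min_cost("n0"))   # 3.0: setting only x2 satisfies the formula most cheaply
```

The memoization is what gives the size-proportional running time: each diagram node is solved once and shared across all paths that reach it.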