Taskar, Ben

  • Publication
    Learning Determinantal Point Processes
    (2011-07-01) Taskar, Ben; Kulesza, Alex
    Determinantal point processes (DPPs), which arise in random matrix theory and quantum physics, are natural models for subset selection problems where diversity is preferred. Among many remarkable properties, DPPs offer tractable algorithms for exact inference, including computing marginal probabilities and sampling; however, an important open question has been how to learn a DPP from labeled training data. In this paper we propose a natural feature-based parameterization of conditional DPPs, and show how it leads to a convex and efficient learning formulation. We analyze the relationship between our model and binary Markov random fields with repulsive potentials, which are qualitatively similar but computationally intractable. Finally, we apply our approach to the task of extractive summarization, where the goal is to choose a small subset of sentences conveying the most important information from a set of documents. In this task there is a fundamental tradeoff between sentences that are highly relevant to the collection as a whole, and sentences that are diverse and not repetitive. Our parameterization allows us to naturally balance these two characteristics. We evaluate our system on data from the DUC 2003/04 multi-document summarization task, achieving state-of-the-art results.
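    As a quick illustration of the tractable exact inference noted above, the following minimal Python sketch computes exact set probabilities and inclusion marginals for a plain L-ensemble DPP, using the standard identities P(Y) ∝ det(L_Y) and K = L(L + I)^{-1}; it is not the paper's feature-based conditional parameterization, and the toy kernel L below is invented for illustration.

        import numpy as np

        def dpp_set_probability(L, Y):
            """Exact probability of drawing exactly the set Y: det(L_Y) / det(L + I)."""
            n = L.shape[0]
            idx = np.ix_(Y, Y)
            return np.linalg.det(L[idx]) / np.linalg.det(L + np.eye(n))

        def dpp_inclusion_probability(L, Y):
            """Marginal probability that a sample contains Y: det(K_Y), with K = L (L + I)^{-1}."""
            n = L.shape[0]
            K = L @ np.linalg.inv(L + np.eye(n))
            idx = np.ix_(Y, Y)
            return np.linalg.det(K[idx])

        # Toy kernel: items 0 and 1 are near-duplicates, item 2 is unrelated.
        L = np.array([[1.0, 0.9, 0.0],
                      [0.9, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
        print(dpp_set_probability(L, [0, 1]))  # low: the pair is redundant
        print(dpp_set_probability(L, [0, 2]))  # higher: the pair is diverse

    Redundant pairs get low probability and diverse pairs get higher probability, which is exactly the preference the summarization application exploits.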
  • Publication
    Posterior Sparsity in Unsupervised Dependency Parsing
    (2010-01-01) Gillenwater, Jennifer; Ganchev, Kuzman; Graça, João; Pereira, Fernando; Taskar, Ben
    A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed accuracy over the standard expectation maximization (EM) baseline for 9 of the languages, with an average accuracy improvement of 6%. Further, we show that for 8 out of 12 languages, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors, with an average improvement of 4%. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
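    One common way to write the sparsity-biased E-step described above, as a sketch with assumed notation rather than the paper's exact symbols: letting \phi_{cpi}(\mathbf{x},\mathbf{z}) indicate that candidate dependency i has child POS tag c and parent POS tag p, and \sigma a penalty strength, the PR E-step becomes

        \[
        q^{(t+1)} = \arg\min_{q}\; \mathrm{KL}\big(q(\mathbf{z}) \,\|\, p_{\theta^{(t)}}(\mathbf{z}\mid\mathbf{x})\big)
        \;+\; \sigma \sum_{c,p} \max_{i}\; \mathbb{E}_{q}\big[\phi_{cpi}(\mathbf{x},\mathbf{z})\big],
        \]

    an \ell_1/\ell_\infty penalty that stays small only when few parent-child tag pairs receive any posterior mass, i.e. when few unique dependency types are used.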
  • Publication
    A permutation-augmented sampler for DP mixture models
    (2007-06-01) Taskar, Ben; Liang, Percy; Jordan, Michael
    We introduce a new inference algorithm for Dirichlet process mixture models. While Gibbs sampling and variational methods focus on local moves, the new algorithm makes more global moves. This is done by introducing a permutation of the data points as an auxiliary variable. The algorithm is a blocked sampler which alternates between sampling the clustering and sampling the permutation. The key to the efficiency of this approach is that it is possible to use dynamic programming to consider all exponentially many clusterings consistent with a given permutation. We also show that random projections can be used to effectively sample the permutation. The result is a stochastic hill-climbing algorithm that yields burn-in times significantly smaller than those of collapsed Gibbs sampling.
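    The dynamic program mentioned above is possible because, given the auxiliary permutation, the consistent clusterings are exactly the segmentations of the permuted data into contiguous blocks. A minimal sketch of that segmentation sum, where block_weight is a placeholder for a block's contribution (e.g. its marginal likelihood times a prior factor), not the paper's exact expression:

        import numpy as np

        def sum_over_consistent_clusterings(block_weight, n):
            """Sum of clustering weights over all clusterings consistent with a
            fixed permutation, i.e. over all ways to cut positions 0..n-1 into
            contiguous blocks. Requires O(n^2) block_weight evaluations.
            """
            # A[j] = total weight of all segmentations of the first j positions.
            A = np.zeros(n + 1)
            A[0] = 1.0
            for j in range(1, n + 1):
                A[j] = sum(A[i] * block_weight(i, j) for i in range(j))
            return A[n]

        # Toy usage with an arbitrary weight function, purely for illustration.
        print(sum_over_consistent_clusterings(lambda i, j: np.exp(-(j - i)), 5))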
  • Publication
    Sparsity in Dependency Grammar Induction
    (2010-07-01) Taskar, Ben; Pereira, Fernando CN; Graca, Joao V; Gillenwater, Jennifer; Ganchev, Kuzman
    A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
  • Publication
    Learning Sparse Markov Network Structure via Ensemble-of-Trees Models
    (2009-04-01) Taskar, Ben; Lin, Yuanqing; Zhu, Shenghuo; Lee, Daniel
    Learning the sparse structure of a general Markov network is a hard computational problem. One of the main difficulties is the computation of the generally intractable partition function. To circumvent this difficulty, we propose to learn the network structure using an ensemble-of-trees (ET) model. The ET model was first introduced by Meilă and Jaakkola (2006), and it represents a multivariate distribution using a mixture of all possible spanning trees. The advantage of the ET model is that, although it needs to sum over super-exponentially many trees, its partition function as well as data likelihood can be computed in closed form. Furthermore, because the ET model tends to represent a Markov network using as small a number of trees as possible, it provides a natural regularization for finding a sparse network structure. Our simulation results show that the proposed ET approach is able to accurately recover the true Markov network connectivity and outperform state-of-the-art approaches for both discrete and continuous random variable networks when a small number of data samples is available. Furthermore, we also demonstrate the use of the ET model for discovering the network of words from blog posts.
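    The closed-form sum over super-exponentially many trees comes from the matrix-tree theorem: the total weight of all spanning trees equals a cofactor of the weighted graph Laplacian. A minimal sketch of that computation (toy unit weights below, not the ET model's edge parameterization):

        import numpy as np

        def spanning_tree_partition_function(W):
            """Sum over all spanning trees of the product of edge weights, via the
            matrix-tree theorem. W is a symmetric nonnegative edge-weight matrix
            with zero diagonal; the answer is the determinant of any cofactor of
            the weighted graph Laplacian.
            """
            L = np.diag(W.sum(axis=1)) - W      # weighted graph Laplacian
            return np.linalg.det(L[1:, 1:])     # delete row/column 0

        # Sanity check: the complete graph on 4 nodes with unit weights has
        # 4^(4-2) = 16 spanning trees (Cayley's formula).
        W = np.ones((4, 4)) - np.eye(4)
        print(spanning_tree_partition_function(W))   # ~16.0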
  • Publication
    Expectation Maximization and Posterior Constraints
    (2007-12-01) Graca, Joao V; Ganchev, Kuzman; Taskar, Ben
    The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data, and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve performance over standard baselines and be competitive with more complex, intractable models.
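    In symbols, the modification is confined to the E-step. A minimal sketch of the resulting alternation, with constraint features f and bound b as assumed notation rather than the paper's exact symbols:

        \[
        \text{E-step:}\quad q^{(t+1)} = \arg\min_{q:\; \mathbb{E}_{q}[f(\mathbf{x},\mathbf{z})] \le \mathbf{b}}\; \mathrm{KL}\big(q(\mathbf{z}) \,\|\, p_{\theta^{(t)}}(\mathbf{z}\mid\mathbf{x})\big), \qquad
        \text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta}\; \mathbb{E}_{q^{(t+1)}}\big[\log p_{\theta}(\mathbf{x},\mathbf{z})\big].
        \]

    The E-step projects the model posterior onto the constraint set in KL divergence; the M-step is the usual maximization against the projected posterior.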
  • Publication
    Talking Pictures: Temporal Grouping and Dialog-Supervised Person Recognition
    (2010-06-01) Cour, Timothee; Sapp, Benjamin; Nagle, Akash; Taskar, Ben
    We address the character identification problem in movies and television videos: assigning names to faces on the screen. Most prior work on person recognition in video assumes some supervised data such as a screenplay or hand-labeled faces. In this paper, our only source of ‘supervision’ is dialog cues: first-, second- and third-person references (such as “I’m Jack”, “Hey, Jack!” and “Jack left”). While this kind of supervision is sparse and indirect, we exploit multiple modalities and their interactions (appearance, dialog, mouth movement, synchrony, continuity-editing cues) to effectively resolve identities through local temporal grouping followed by global weakly supervised recognition. We propose a novel temporal grouping model that partitions face tracks across multiple shots while respecting appearance, geometric and film-editing cues and constraints. In this model, states represent partitions of the k most recent face tracks, and transitions represent compatibility of consecutive partitions. We present dynamic programming inference and discriminative learning for the model. The individual face tracks are subsequently assigned a name by learning a classifier from partial label constraints. The weakly supervised classifier incorporates multiple-instance constraints from dialog cues as well as soft grouping constraints from our temporal grouping. We evaluate both the temporal grouping and final character naming on several hours of TV and movies.
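    As a sketch of the dynamic programming inference described above, the following Python outline runs Viterbi over partition-valued states for sliding windows of the k most recent face tracks; state_score and transition_score are hypothetical placeholders standing in for the appearance, geometric and film-editing potentials, and are not the paper's learned model.

        import numpy as np

        def partitions(items):
            """Enumerate all set partitions of a small list of track indices."""
            if not items:
                yield []
                return
            head, rest = items[0], items[1:]
            for p in partitions(rest):
                for i in range(len(p)):
                    yield p[:i] + [p[i] + [head]] + p[i + 1:]
                yield [[head]] + p

        def viterbi_grouping(num_tracks, k, state_score, transition_score):
            """Best sequence of partitions over sliding windows of k face tracks."""
            windows = [list(range(t, t + k)) for t in range(num_tracks - k + 1)]
            states = [list(partitions(w)) for w in windows]
            score = [state_score(p, windows[0]) for p in states[0]]
            back = []
            for t in range(1, len(windows)):
                new_score, ptr = [], []
                for p in states[t]:
                    cand = [score[j] + transition_score(q, p)
                            for j, q in enumerate(states[t - 1])]
                    j = int(np.argmax(cand))
                    new_score.append(cand[j] + state_score(p, windows[t]))
                    ptr.append(j)
                score = new_score
                back.append(ptr)
            path = [int(np.argmax(score))]          # backtrace the best sequence
            for ptr in reversed(back):
                path.append(ptr[path[-1]])
            path.reverse()
            return [states[t][i] for t, i in enumerate(path)]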
  • Publication
    Sidestepping Intractable Inference with Structured Ensemble Cascades
    (2010-12-01) Weiss, David; Sapp, Benjamin; Taskar, Ben
    For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high treewidth and large state spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem.
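    The filtering step described above can be made concrete for a linear-chain sub-model: compute each state's max-marginal with forward and backward max passes, add the max-marginals across the ensemble's sub-models, and keep a state only if it clears a threshold interpolating between the best score and the mean max-marginal. A minimal sketch under those assumptions (the chain structure and threshold rule are illustrative choices, not the full ensemble cascade):

        import numpy as np

        def chain_max_marginals(node_scores, edge_scores):
            """Max-marginal m[t, s]: score of the best full sequence constrained to
            pass through state s at position t, for a linear-chain model.
            node_scores: (T, S); edge_scores: (S, S), shared across positions.
            """
            T, S = node_scores.shape
            fwd = np.zeros((T, S))
            bwd = np.zeros((T, S))
            fwd[0] = node_scores[0]
            for t in range(1, T):
                fwd[t] = node_scores[t] + np.max(fwd[t - 1][:, None] + edge_scores, axis=0)
            for t in range(T - 2, -1, -1):
                bwd[t] = np.max(edge_scores + (node_scores[t + 1] + bwd[t + 1])[None, :], axis=1)
            return fwd + bwd

        def cascade_filter(max_marginals_per_model, alpha=0.5):
            """Add max-marginals across sub-models and keep states above a threshold
            interpolating between the best score and the mean max-marginal."""
            m = sum(max_marginals_per_model)
            thresh = alpha * m.max() + (1 - alpha) * m.mean()
            return m >= thresh                      # boolean mask of surviving states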
  • Publication
    Alignment by Agreement
    (2006-06-01) Liang, Percy; Taskar, Ben; Klein, Dan
    We present an unsupervised approach to symmetric word alignment in which two simple asymmetric models are trained jointly to maximize a combination of data likelihood and agreement between the models. Compared to the standard practice of intersecting predictions of independently-trained models, joint training provides a 32% reduction in AER. Moreover, a simple and efficient pair of HMM aligners provides a 29% reduction in AER over symmetrized IBM model 4 predictions.
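    One way to write the joint training objective described above, as a hedged reconstruction using assumed notation, where p_1 and p_2 are the two asymmetric aligners and \mathbf{z} ranges over alignments of a sentence pair (\mathbf{x}, \mathbf{y}):

        \[
        \max_{\theta_1, \theta_2}\; \sum_{(\mathbf{x},\mathbf{y})} \Big[ \log p_1(\mathbf{x},\mathbf{y};\theta_1) + \log p_2(\mathbf{x},\mathbf{y};\theta_2)
        + \log \sum_{\mathbf{z}} p_1(\mathbf{z}\mid\mathbf{x},\mathbf{y};\theta_1)\, p_2(\mathbf{z}\mid\mathbf{x},\mathbf{y};\theta_2) \Big],
        \]

    where the first two terms are the usual data likelihoods of the two asymmetric models and the last term rewards posterior agreement on the hidden alignments.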