An Alternative Prior Process for Nonparametric Bayesian Clustering

Penn collection
Statistics Papers
Subject
Computer Sciences
Statistics and Probability
Author
Wallach, Hanna M
Jensen, Shane T
Dicker, Lee
Heller, Katherine A
Abstract

Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
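
The contrast between the three priors can be illustrated with a small simulation. The sketch below is not taken from this record. It assumes the usual sequential (Chinese-restaurant-style) predictive rules: for the Dirichlet process an existing cluster is chosen with weight equal to its size; for the Pitman-Yor process with weight equal to its size minus a discount; for the uniform process with equal weight regardless of size. The function names `predictive_weights` and `sample_partition`, and the parameter names `theta` (concentration) and `alpha` (discount), are illustrative choices rather than names from the paper.

```python
import random


def predictive_weights(sizes, process, theta=1.0, alpha=0.5):
    """Unnormalized weights for the next item.

    Returns one weight per existing cluster, followed by one weight
    for opening a new cluster. Assumed forms (see lead-in):
      dirichlet:  n_k            | theta
      pitman_yor: n_k - alpha    | theta + K * alpha
      uniform:    1              | theta
    """
    k = len(sizes)
    if process == "dirichlet":
        return [float(n) for n in sizes] + [theta]
    if process == "pitman_yor":
        return [n - alpha for n in sizes] + [theta + k * alpha]
    if process == "uniform":
        return [1.0] * k + [theta]
    raise ValueError(f"unknown process: {process}")


def sample_partition(n_items, process, theta=1.0, alpha=0.5, seed=0):
    """Draw cluster sizes by assigning items one at a time."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_items):
        weights = predictive_weights(sizes, process, theta, alpha)
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(sizes):
            sizes.append(1)      # last index means "open a new cluster"
        else:
            sizes[choice] += 1   # join an existing cluster
    return sizes


if __name__ == "__main__":
    for process in ("dirichlet", "pitman_yor", "uniform"):
        sizes = sorted(sample_partition(500, process), reverse=True)
        print(process, "clusters:", len(sizes), "largest:", sizes[:5])
```

Under these assumed rules, the Dirichlet and Pitman-Yor weights grow with cluster size, which is the "rich-get-richer" effect the abstract refers to, whereas the uniform weights ignore size. Because the uniform rule depends on the order in which items arrive, different orderings of the same data can lead to different partition probabilities, which is the loss of exchangeability noted above.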

Date of presentation
2010-01-01
Conference name
Statistics Papers
Conference dates
2023-05-17T15:26:32.000