Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes and the implicit "rich-get-richer" characteristic of the resulting partitions, whereby a new observation is more likely to join a cluster that is already large. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
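To make the contrast in predictive probabilities concrete, the sketch below implements the sequential assignment rules in their commonly stated forms, with concentration `theta` (θ > 0) and Pitman-Yor discount `d`. With N observations already placed into K clusters of sizes N_k, the Dirichlet process joins cluster k with probability N_k/(N+θ) and opens a new cluster with probability θ/(N+θ). The Pitman-Yor process uses (N_k-d)/(N+θ) and (θ+dK)/(N+θ). The uniform process, as we understand it, gives every existing cluster the same weight 1/(K+θ) regardless of its size, and opens a new cluster with probability θ/(K+θ). The function names `predictive_probs` and `sample_partition` are hypothetical illustrations, not code from the paper.

```python
import random

def predictive_probs(counts, theta, process="dp", d=0.0):
    """Probabilities for the next observation, given current cluster sizes.

    Returns one entry per existing cluster, followed by the probability
    of starting a new cluster (last entry). Hypothetical helper.
    """
    n = sum(counts)   # N: observations assigned so far
    k = len(counts)   # K: clusters created so far
    if process == "dp":
        # Dirichlet process: weight proportional to cluster size ("rich get richer").
        existing = [c / (n + theta) for c in counts]
        new = theta / (n + theta)
    elif process == "py":
        # Pitman-Yor process: size minus discount d, extra new-cluster mass d*K.
        existing = [(c - d) / (n + theta) for c in counts]
        new = (theta + d * k) / (n + theta)
    elif process == "uniform":
        # Uniform process (assumed form): every existing cluster weighted equally.
        existing = [1.0 / (k + theta) for _ in counts]
        new = theta / (k + theta)
    else:
        raise ValueError(f"unknown process: {process}")
    return existing + [new]

def sample_partition(n, theta, process="dp", d=0.0, seed=0):
    """Sequentially assign n observations; return the list of cluster sizes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        probs = predictive_probs(counts, theta, process, d)
        j = rng.choices(range(len(probs)), weights=probs)[0]
        if j == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[j] += 1     # join existing cluster j
    return counts

if __name__ == "__main__":
    for proc in ("dp", "py", "uniform"):
        sizes = sample_partition(1000, theta=1.0, process=proc, d=0.5)
        print(proc, "clusters:", len(sizes), "largest:", max(sizes))
```

Running the script for a few seeds gives an informal feel for how the uniform process tends to spread observations over more clusters than the Dirichlet process at the same θ. Because the uniform-process weights depend on K rather than on cluster sizes, the probability of a given partition can change with the order in which observations arrive, which is the loss of exchangeability described above.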
Wallach, H. M., Jensen, S. T., Dicker, L., & Heller, K. A. (2010). An Alternative Prior Process for Nonparametric Bayesian Clustering. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 9, 892-899. Retrieved from https://repository.upenn.edu/statistics_papers/144
Date Posted: 27 November 2017
This document has been peer reviewed.