Visual Comparison of Datasets Using Mixture Decompositions

Penn collection
Statistics Papers
Subject
classification
data visualization
density estimation
exploratory data analysis
mixture decomposition
Applied Statistics
Other Statistics and Probability
Statistics and Probability
Author
Gous, Alan
Buja, Andreas
Abstract

This article describes how a mixture of two densities, f0 and f1, may be decomposed into a different mixture consisting of three densities. These new densities, f+, f−, and f=, summarize differences between f0 and f1: f+ is high in areas of excess of f1 compared to f0; f− represents deficiency of f1 compared to f0 in the same way; f= represents commonality between f1 and f0. The supports of f+ and f− are disjoint. This decomposition of the mixture of f0 and f1 is similar to the set-theoretic decomposition of the union of two sets A and B into the disjoint sets A\B, B\A, and A ∩ B. Sample points from f0 and f1 can be assigned to one of these three densities, allowing the differences between f0 and f1 to be visualized in a single plot, a visual hypothesis test of whether f0 is equal to f1. We describe two similar such decompositions and contrast their behavior under the null hypothesis f0 = f1, giving some insight into how such plots may be interpreted. We present two examples of uses of these methods: visualization of departures from independence, and of a two-class classification problem. Other potential applications are discussed.
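As a rough illustration of the kind of decomposition the abstract describes, the sketch below splits the equal-weight mixture (f0 + f1)/2 on a grid. It assumes one natural construction consistent with the abstract: f= proportional to min(f0, f1), f+ proportional to the positive part of f1 − f0, and f− proportional to the positive part of f0 − f1. The article's exact definitions, mixing weights, and its second variant may differ. The function names, the two normal densities, and the grid are all hypothetical choices made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (illustrative choice of f0 and f1)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def decompose(f0_vals, f1_vals, dx):
    """Split the mixture (f0 + f1)/2, tabulated on a uniform grid of step dx.

    Assumed construction (not taken from the article):
      common  = min(f0, f1)        -> f=
      excess  = max(f1 - f0, 0)    -> f+
      deficit = max(f0 - f1, 0)    -> f-
    so that (f0 + f1)/2 = common + excess/2 + deficit/2 pointwise.
    """
    common = [min(a, b) for a, b in zip(f0_vals, f1_vals)]
    excess = [max(b - a, 0.0) for a, b in zip(f0_vals, f1_vals)]
    deficit = [max(a - b, 0.0) for a, b in zip(f0_vals, f1_vals)]

    # Riemann-sum approximations of the integrals of each piece.
    mass_eq = sum(common) * dx
    mass_plus = sum(excess) * dx
    mass_minus = sum(deficit) * dx

    def scaled(vals, mass):
        # Normalise a piece to integrate to one; all zeros if the piece is empty.
        return [v / mass for v in vals] if mass > 0 else [0.0] * len(vals)

    weights = (mass_eq, 0.5 * mass_plus, 0.5 * mass_minus)
    return weights, scaled(common, mass_eq), scaled(excess, mass_plus), scaled(deficit, mass_minus)


# Hypothetical example: f0 = N(0, 1), f1 = N(1, 1) on a grid over [-8, 10].
dx = 0.01
xs = [-8.0 + i * dx for i in range(1801)]
f0_vals = [normal_pdf(x, 0.0, 1.0) for x in xs]
f1_vals = [normal_pdf(x, 1.0, 1.0) for x in xs]

(w_eq, w_plus, w_minus), f_eq, f_plus, f_minus = decompose(f0_vals, f1_vals, dx)
print("mixing weights (f=, f+, f-):", w_eq, w_plus, w_minus)
```

Under this assumed construction, f+ and f− can never both be positive at the same point, which matches the disjoint-support property stated in the abstract, and when f0 = f1 all of the mass falls into f=.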

Publication date
2004-01-01
Journal title
Journal of Computational and Graphical Statistics