Visual Comparison of Datasets Using Mixture Decompositions
Subject
data visualization
density estimation
exploratory data analysis
mixture decomposition
Discipline
Applied Statistics
Other Statistics and Probability
Statistics and Probability
Abstract
This article describes how a mixture of two densities, f0 and f1, may be decomposed into a different mixture consisting of three densities. These new densities, f+, f−, and f=, summarize the differences between f0 and f1: f+ is high where f1 exceeds f0; f− is high where f1 falls short of f0; and f= represents what f1 and f0 have in common. The supports of f+ and f− are disjoint. This decomposition of the mixture of f0 and f1 is analogous to the set-theoretic decomposition of the union of two sets A and B into the disjoint sets A\B, B\A, and A ∩ B. Sample points from f0 and f1 can be assigned to one of these three densities, allowing the differences between f0 and f1 to be visualized in a single plot, which serves as a visual hypothesis test of whether f0 is equal to f1. We describe two closely related decompositions of this kind and contrast their behavior under the null hypothesis f0 = f1, giving some insight into how such plots may be interpreted. We present two examples of uses of these methods: visualization of departures from independence, and visualization of a two-class classification problem. Other potential applications are discussed.
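As a rough illustration of the kind of decomposition the abstract describes, the sketch below splits an equal-weight mixture (f0 + f1)/2 on a grid using the pointwise identity (f0 + f1)/2 = min(f0, f1) + (1/2)·max(f1 − f0, 0) + (1/2)·max(f0 − f1, 0). This is one natural construction, assumed here for illustration; the abstract does not give the article's exact definitions, and the article's two decompositions may differ from it. The equal mixture weights, the two normal densities, the grid, and the names `normal_pdf`, `_normalize`, and `decompose` are all hypothetical choices.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2); used only as a stand-in for f0 and f1."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))


def _normalize(part, weight):
    """Scale a non-negative piece so it integrates to 1 (all zeros if weight is 0)."""
    return part / weight if weight > 0 else np.zeros_like(part)


def decompose(f0, f1, dx):
    """Split the equal-weight mixture (f0 + f1) / 2 into three pieces.

    f0 and f1 are density values on a common, evenly spaced grid with step dx.
    Assumed construction (not necessarily the article's):
      common  piece = min(f0, f1)
      excess  piece = 0.5 * max(f1 - f0, 0)
      deficit piece = 0.5 * max(f0 - f1, 0)
    Returns (weights, densities), two dicts keyed by "plus", "minus", "equal".
    """
    parts = {
        "plus": 0.5 * np.clip(f1 - f0, 0.0, None),
        "minus": 0.5 * np.clip(f0 - f1, 0.0, None),
        "equal": np.minimum(f0, f1),
    }
    # Riemann-sum approximation of each piece's integral gives its mixture weight.
    weights = {k: float(p.sum() * dx) for k, p in parts.items()}
    densities = {k: _normalize(p, weights[k]) for k, p in parts.items()}
    return weights, densities


# Hypothetical example: f0 = N(0, 1) and f1 = N(1, 1) on a fine grid.
x = np.linspace(-8.0, 9.0, 3401)
dx = x[1] - x[0]
f0 = normal_pdf(x, 0.0, 1.0)
f1 = normal_pdf(x, 1.0, 1.0)

weights, dens = decompose(f0, f1, dx)

# Recombining the three weighted components recovers the original mixture.
rebuilt = sum(weights[k] * dens[k] for k in weights)
```

Under this construction the excess and deficit pieces cannot both be positive at the same point, which mirrors the disjoint supports of f+ and f− stated in the abstract. When f0 = f1 the excess and deficit weights are both zero, so the whole mixture falls into the common component.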