Departmental Papers (CIS)

Date of this Version

3-16-2012

Document Type

Conference Paper

Comments

Ahn, K. J., Guha, S., & McGregor, A. Graph Sketches: Sparsification, Spanners, and Subgraphs. SIGMOD Symposium on Principles of Database Systems (PODS 2012). Scottsdale, Arizona, USA. May 20-24, 2012.
http://www.sigmod.org/2012/

©ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version to be published in PODS '12: Proceedings of the thirty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. http://www.acm.org

Abstract

When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses is sketches, i.e., synopses based on linear projections of the data. Sketches are applicable in many models, including various parallel, streaming, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching, where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements.
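To make "linear projections of the data" concrete for graphs, here is a minimal sketch. It assumes a simple undirected graph on n nodes, represented as an indicator vector over node pairs, and applies k random ±1 projections to that vector. The names `pair_index`, `LinearGraphSketch`, and `sketch_of_edges`, and the choice of ±1 projections, are hypothetical and chosen only for illustration. This is not the paper's sparsifier, subgraph, or spanner sketch. What it does show is the property the abstract relies on: because the map is linear, an edge deletion is simply a -1 update, and sketches built at different sites with the same randomness can be added together.

```python
import random
from typing import Iterable, Tuple


def pair_index(u: int, v: int, n: int) -> int:
    """Map the unordered pair {u, v} (u != v, 0 <= u, v < n) to a coordinate in [0, n(n-1)/2)."""
    if u > v:
        u, v = v, u
    # Row-major order over pairs: (0,1), (0,2), ..., (0,n-1), (1,2), ...
    return u * n - u * (u + 1) // 2 + (v - u - 1)


class LinearGraphSketch:
    """Hypothetical illustration: k random +/-1 linear projections of a graph's edge vector."""

    def __init__(self, n: int, k: int, seed: int = 0) -> None:
        self.n, self.k, self.seed = n, k, seed
        rng = random.Random(seed)  # same seed => same projection matrix at every site
        num_pairs = n * (n - 1) // 2
        # Stored explicitly for readability only.
        self.matrix = [[rng.choice((-1, 1)) for _ in range(num_pairs)] for _ in range(k)]
        self.values = [0] * k  # current sketch: matrix @ edge_vector

    def update(self, u: int, v: int, delta: int) -> None:
        """Apply an edge insertion (delta=+1) or deletion (delta=-1) to the sketch."""
        j = pair_index(u, v, self.n)
        for i in range(self.k):
            self.values[i] += delta * self.matrix[i][j]

    def merge(self, other: "LinearGraphSketch") -> "LinearGraphSketch":
        """Add two sketches built with the same (n, k, seed); valid because the map is linear."""
        if (self.n, self.k, self.seed) != (other.n, other.k, other.seed):
            raise ValueError("sketches must share n, k, and seed")
        out = LinearGraphSketch(self.n, self.k, self.seed)
        out.values = [a + b for a, b in zip(self.values, other.values)]
        return out


def sketch_of_edges(n: int, k: int, seed: int, edges: Iterable[Tuple[int, int]]) -> LinearGraphSketch:
    """Sketch a static edge set directly, for comparison with a streamed sketch."""
    s = LinearGraphSketch(n, k, seed)
    for u, v in edges:
        s.update(u, v, +1)
    return s


# Two sites see different parts of one dynamic stream; one deletion cancels an insertion.
n, k, seed = 6, 4, 42
site_a = LinearGraphSketch(n, k, seed)
site_b = LinearGraphSketch(n, k, seed)
site_a.update(0, 1, +1)
site_a.update(1, 2, +1)
site_b.update(2, 3, +1)
site_b.update(1, 2, -1)  # deletes the edge inserted at site_a
combined = site_a.merge(site_b)
print(combined.values == sketch_of_edges(n, k, seed, [(0, 1), (2, 3)]).values)  # True
```

Storing the projection matrix explicitly costs O(kn^2) space and is done here only for readability. A practical streaming sketch would typically derive the entries from hash functions instead of storing them.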

In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense subgraphs. Our main result is a sketch-based sparsifier construction: we show that $\tilde{O}(n\epsilon^{-2})$ random linear projections of a graph on $n$ nodes suffice to $(1 + \epsilon)$-approximate all cut values. Similarly, we show that $O(\epsilon^{-2})$ linear projections suffice for (additively) approximating the fraction of induced subgraphs that match a given pattern, such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches, where each projection may be chosen depending on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams, where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertions and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.
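The $\epsilon^{-2}$ dependence in the subgraph result matches what plain sampling gives for an additive estimate of a fraction. The following is a standard Hoeffding argument, offered only as a hedged aside and not as the paper's proof. Suppose $p$ is the true fraction of induced subgraphs matching the pattern, and $\hat{p}$ is the matching fraction among $t$ independent, uniformly chosen induced subgraphs. What the paper contributes is reaching this bound with linear measurements alone, which the argument below does not address.

```latex
\Pr\left[\,|\hat{p} - p| \ge \epsilon\,\right] \le 2\exp\!\left(-2t\epsilon^{2}\right) \le \delta
\quad\Longleftrightarrow\quad
t \ge \frac{\ln(2/\delta)}{2\epsilon^{2}},
\qquad\text{so } t = O(\epsilon^{-2}) \text{ for any constant failure probability } \delta.
```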

Keywords

Streaming, Graph Sparsification, Spanners, Sketches

Date Posted: 25 April 2012

This document has been peer reviewed.