Guha, Sudipto


Search Results

Now showing 1 - 10 of 17
  • Publication
    SmartCIS: Integrating Digital and Physical Environments
    (2010-01-01) Liu, Mengmeng; Mihaylov, Svilen; Ives, Zachary G; Bao, Zhuowei; Loo, Boon Thau; Jacob, Marie; Guha, Sudipto
  • Publication
    Dynamic Join Optimization in Multi-Hop Wireless Sensor Networks
    (2010-01-01) Mihaylov, Svilen; Ives, Zachary G; Jacob, Marie; Guha, Sudipto
    To enable smart environments and self-tuning data centers, we are developing the Aspen system for integrating physical sensor data, stream data about the logical state of machines, and database or Web data from the Internet. A key component of this system is a query processor optimized for limited-bandwidth, possibly battery-powered devices that communicate over multi-hop wireless radio links. This query processor is given a portion of a data integration query to execute, possibly including joins among sensors. Several recent papers have developed techniques for computing joins in sensor networks, but these techniques are static and are appropriate only for specific join selectivity ratios. We consider the problem of dynamic join optimization for sensor networks, developing solutions that employ cost modeling, as well as adaptive learning and self-tuning heuristics, to choose the best algorithm under real and variable selectivity values. We focus on in-network join computation, but our architecture extends to other approaches (and we compare against these). We first develop basic techniques assuming that selectivities are uniform and known in advance and that optimization can be done on a pairwise basis; we then extend the work to handle joins between multiple pairs when selectivities are not fully known. We experimentally validate our work at scale using standard datasets.
  • Publication
    Lower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streaming
    (2007-08-26) Guha, Sudipto; McGregor, Andrew
    We present lower bounds on the space required to estimate the quantiles of a stream of numerical values. Quantile estimation is perhaps the most studied problem in the data stream model, and it is relatively well understood in the basic single-pass data stream model, in which the values are ordered adversarially. Natural extensions of this basic model include the random-order model, in which the values are ordered randomly (e.g., [21,5,13,11,12]), and the multi-pass model, in which an algorithm is permitted a limited number of passes over the stream (e.g., [6,7,1,19,2]). We present lower bounds that complement existing upper bounds [21,11] in both models. One consequence is an exponential separation between the random-order and adversarial-order models: using Ω(polylog n) space, exact selection requires Ω(log n) passes in the adversarial-order model, while O(log log n) passes are sufficient in the random-order model.
  • Publication
    The Steiner k-Cut Problem
    (2006-03-24) Chekuri, Chandra; Guha, Sudipto; Naor, Joseph
    We consider the Steiner k-cut problem, which generalizes both the k-cut problem and the multiway cut problem. The Steiner k-cut problem is defined as follows. Given an edge-weighted undirected graph G = (V, E), a subset of vertices X ⊆ V called terminals, and an integer k ≤ |X|, the objective is to find a minimum-weight set of edges whose removal results in k disconnected components, each of which contains at least one terminal. We give two approximation algorithms for the problem: a greedy (2 − 2/k)-approximation based on Gomory–Hu trees, and a (2 − 2/|X|)-approximation based on rounding a linear program. We use the insight from the rounding to develop an exact bidirected formulation for the global minimum cut problem (the k-cut problem with k = 2).
  • Publication
    Approximation Algorithms for Wavelet Transform Coding of Data Streams
    (2008-02-01) Guha, Sudipto; Harb, Boulos
    This paper addresses the problem of finding a B-term wavelet representation of a given discrete function ƒ ∈ ℝⁿ whose distance from ƒ is minimized. The problem is well understood when the goal is to minimize the Euclidean distance between ƒ and its representation. This paper presents the first known algorithms for finding provably approximate representations that minimize general ℓp distances (including ℓ∞) under a wide variety of compactly supported wavelet bases. For the Haar basis, a polynomial-time approximation scheme is demonstrated. These algorithms are applicable in the one-pass, sublinear-space data stream model of computation, and they generalize naturally to multiple dimensions and weighted norms. The paper also presents a universal representation that provides a provable approximation guarantee under all p-norms simultaneously, and the first approximation algorithms for bit-budget versions of the problem, known as adaptive quantization. Further, it is shown that the algorithms presented here can be used to select a basis from a tree-structured dictionary of bases and to find a B-term representation of the given function that provably approximates its best dictionary-basis representation.
  • Publication
    Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem
    (2012-01-29) Ahn, KookJin; Guha, Sudipto
    In this paper we study linear-programming-based approaches to the maximum matching problem in the semi-streaming model. In this model, edges are presented sequentially, possibly in an adversarial order, and only a small amount of space may be used: space that is near-linear in the number of vertices (and sublinear in the number of edges) of the input graph. The semi-streaming model is relevant to processing very large graphs. In recent years there have been several new and exciting results in the semi-streaming model; however, broad techniques such as linear programming have not been adapted to this model. In this paper we present several techniques to adapt and optimize linear-programming-based approaches in the semi-streaming model. We use the maximum matching problem as a foil to demonstrate the effectiveness of adapting such tools in this model. As a consequence, we improve almost all previous results on the semi-streaming maximum matching problem. We also prove new results on interesting variants.
  • Publication
    A Note on Linear Time Algorithms for Maximum Error Histograms
    (2007-07-01) Guha, Sudipto; Shim, Kyuseok
    Histograms and wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, e.g., V-Optimal, use error measures that are sums of a suitable function (e.g., the square) of the error at each point. Although the best-known algorithms for solving these problems run in quadratic time, a sequence of results has given us linear-time approximation schemes for them. In recent years, many applications have emerged in which we are interested in measuring the maximum (absolute or relative) error at a point. We show that this problem is fundamentally different from the traditional non-ℓ∞ error measures and provide an optimal algorithm that runs in linear time for a small number of buckets. We also present results that work for arbitrary weighted maximum error measures.
  • Publication
    Graph Sketches: Sparsification, Spanners, and Subgraphs
    (2012-03-16) Ahn, KookJin; Guha, Sudipto; McGregor, Andrew
    When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses is sketches, i.e., synopses based on linear projections of the data. These are applicable in many models, including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching, where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements. In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense subgraphs. Our main result is a sketch-based sparsifier construction: we show that Õ(nε⁻²) random linear projections of a graph on n nodes suffice to (1 + ε)-approximate all cut values. Similarly, we show that O(ε⁻²) linear projections suffice for (additively) approximating the fraction of induced subgraphs that match a given pattern, such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches, where each projection may be chosen depending on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams, where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertions and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.
  • Publication
    Analyzing Graph Structure via Linear Measurements
    (2012-02-29) Ahn, KookJin; Guha, Sudipto; McGregor, Andrew
    We initiate the study of graph sketching, i.e., algorithms that use a limited number of linear measurements of a graph to determine the properties of the graph. While a graph on n nodes is essentially O(n²)-dimensional, we show the existence of a distribution over random projections into d-dimensional "sketch" space (d ≪ n²) such that several relevant properties of the original graph can be inferred from the sketch with high probability. Specifically, we show that d = O(n · polylog n) suffices to evaluate properties including connectivity, k-connectivity, and bipartiteness, and to return any constant approximation of the weight of the minimum spanning tree; and that d = O(n^(1+γ)) suffices to compute graph sparsifiers and the exact MST, and to approximate maximum weighted matchings, if we permit O(1/γ)-round adaptive sketches, i.e., a sequence of projections where each projection may be chosen depending on the outcome of earlier sketches. Our results have two main applications, both of which have the potential to give rise to fruitful lines of further research. First, our results can be thought of as giving the first compressed-sensing-style algorithms for graph data. Second, our work initiates the study of dynamic graph streams. There is already extensive literature on processing massive graphs in the data-stream model. However, the existing work focuses on graphs defined by a sequence of inserted edges and does not consider edge deletions. We think this is a curious omission given the existing work on both dynamic graphs in the non-streaming setting and dynamic geometric streaming. Our results include the first dynamic graph semi-streaming algorithms for connectivity, spanning trees, sparsification, and matching problems.
  • Publication
    Graph Sparsification in the Semi-Streaming Model
    (2009-05-05) Ahn, KookJin; Guha, Sudipto
    Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analyzing distributions in a streaming setting, but progress on graph problems has been limited. A main reason for this has been the existence of linear-space lower bounds for even simple problems such as determining the connectedness of a graph. However, in many new scenarios that arise from social and other interaction networks, the number of vertices is significantly less than the number of edges. This has led to the formulation of the semi-streaming model, where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), and the edges appear in an arbitrary (and possibly adversarial) order. In this paper we focus on graph sparsification, which is one of the major building blocks in a variety of graph algorithms. There is a long history of (non-streaming) sampling algorithms that provide sparse graph approximations, and it is natural to ask whether sparsification can be achieved in small space and, in addition, in a single pass over the data. The question is interesting from the standpoint of both theory and practice, and we answer it in the affirmative by providing a one-pass, Õ(n/ε²)-space algorithm that produces a sparsification approximating each cut to within a (1 + ε) factor. We also show that Ω(n log(1/ε)) space is necessary for a one-pass streaming algorithm to approximate the min-cut, improving upon the Ω(n) lower bound that arises from lower bounds for testing connectivity.
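The Dynamic Join Optimization abstract describes choosing a join algorithm from a cost model and re-choosing it as observed selectivity changes. The sketch below is a toy illustration of that idea under stated assumptions, not the Aspen system's actual model: it compares two hypothetical strategies (ship everything to the base station, or join at an in-network node and forward only results), costs each by tuples times hops, and tracks selectivity with an exponentially weighted moving average. `JoinStats`, `cost_at_base`, `cost_in_network`, and `AdaptiveJoinPlanner` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class JoinStats:
    # Hypothetical inputs to a toy cost model (not from the paper).
    rate_r: float        # tuples per epoch produced by sensor R
    rate_s: float        # tuples per epoch produced by sensor S
    hops_r_base: int     # hops from R to the base station
    hops_s_base: int     # hops from S to the base station
    hops_r_meet: int     # hops from R to a chosen in-network join node
    hops_s_meet: int     # hops from S to that join node
    hops_meet_base: int  # hops from the join node to the base station


def cost_at_base(st: JoinStats) -> float:
    # Ship every tuple to the base station and join there.
    return st.rate_r * st.hops_r_base + st.rate_s * st.hops_s_base


def cost_in_network(st: JoinStats, selectivity: float) -> float:
    # Ship both inputs to the join node, then forward only the join results.
    results = selectivity * st.rate_r * st.rate_s
    return (st.rate_r * st.hops_r_meet + st.rate_s * st.hops_s_meet
            + results * st.hops_meet_base)


class AdaptiveJoinPlanner:
    """Re-picks the cheaper strategy as the observed selectivity drifts."""

    def __init__(self, stats: JoinStats, initial_selectivity: float = 0.5,
                 alpha: float = 0.2):
        self.stats = stats
        self.sel = initial_selectivity
        self.alpha = alpha  # weight of the newest observation (EWMA)

    def observe(self, pairs_seen: int, matches: int) -> None:
        # Blend the selectivity measured in the last epoch into the estimate.
        if pairs_seen > 0:
            observed = matches / pairs_seen
            self.sel = (1 - self.alpha) * self.sel + self.alpha * observed

    def choose(self) -> str:
        base = cost_at_base(self.stats)
        in_net = cost_in_network(self.stats, self.sel)
        return "in-network" if in_net < base else "base-station"
```

For example, with `JoinStats(4, 4, 6, 6, 2, 2, 4)` the base-station cost is 48 and the in-network cost is 16 + 64·selectivity, so a planner starting at selectivity 0.9 picks the base station and switches to in-network once repeated low-selectivity observations pull its estimate below 0.5.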
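The quantile lower-bounds abstract contrasts space with the number of passes needed for exact selection. As a concrete reference point (a folklore baseline, not an algorithm from the paper), exact selection over integers in a known range [lo, hi] can be done with a single counter per pass by binary-searching the value domain, at the price of about log2(hi − lo + 1) passes. The function name `multipass_select` and the requirement that the stream can be replayed via a `make_stream` callable are assumptions for illustration.

```python
from typing import Callable, Iterable, Tuple


def multipass_select(make_stream: Callable[[], Iterable[int]], k: int,
                     lo: int, hi: int) -> Tuple[int, int]:
    """Return (k-th smallest value, passes used), with 1-indexed k.

    Assumes every value is an integer in [lo, hi] and 1 <= k <= stream length.
    Each loop iteration is one pass over a fresh copy of the stream and keeps
    only a counter, so memory is O(1) words while the number of passes is
    about log2(hi - lo + 1).
    """
    passes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        passes += 1
        count = sum(1 for x in make_stream() if x <= mid)  # one pass
        if count >= k:
            hi = mid       # the k-th smallest value is <= mid
        else:
            lo = mid + 1   # the k-th smallest value is > mid
    return lo, passes
```

For `data = [5, 1, 9, 3, 7, 3]`, `multipass_select(lambda: iter(data), 3, 0, 15)` returns `(3, 4)`: the third smallest value after four passes over a 16-value domain. Algorithms that keep more state per pass can trade that memory for fewer passes, which is the trade-off the lower bounds quantify.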
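The Steiner k-cut definition can be made concrete with a brute-force reference solver for tiny graphs. This is neither the Gomory–Hu greedy nor the LP-rounding algorithm from the abstract; it enumerates every edge subset, so it is exponential and only useful for checking small examples, for instance to compare an approximation against the (2 − 2/k) guarantee. It accepts a removal set when at least k resulting components contain a terminal; with nonnegative weights, an optimal solution can always be taken to have exactly k such components, so the optimum value is unchanged. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight), undirected


def components(nodes: Set[int], edges: List[Edge]) -> List[Set[int]]:
    # Union-find over the edges that remain after the cut.
    parent = {v: v for v in nodes}

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    groups: Dict[int, Set[int]] = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def is_steiner_k_cut(nodes, edges, removed, terminals, k) -> bool:
    kept = [e for i, e in enumerate(edges) if i not in removed]
    with_terminal = [c for c in components(nodes, kept) if c & terminals]
    return len(with_terminal) >= k


def steiner_k_cut_bruteforce(nodes, edges, terminals, k):
    """Exact minimum Steiner k-cut by exhaustive search (tiny graphs only)."""
    best_cost, best_set = float("inf"), None
    for r in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), r):
            removed_set = set(removed)
            if is_steiner_k_cut(nodes, edges, removed_set, terminals, k):
                cost = sum(edges[i][2] for i in removed_set)
                if cost < best_cost:
                    best_cost, best_set = cost, removed_set
    return best_cost, best_set
```

On the path 0–1–2–3 with weights 3, 1, 2, separating terminals {0, 3} (k = 2) costs 1 by removing the middle edge, while separating {0, 2, 3} (k = 3) costs 3 by removing edges 1–2 and 2–3.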
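The wavelet abstract notes that the B-term problem is well understood for the Euclidean distance. The reason is that the orthonormal Haar transform preserves ℓ2 norms, so keeping the B largest-magnitude coefficients is optimal and the error equals the norm of the dropped coefficients. The sketch below shows only that ℓ2 baseline; it is not the paper's ℓp or ℓ∞ algorithm, and the greedy rule it uses does not carry over to those norms. Function names are illustrative, and input length is assumed to be a power of two.

```python
import math
from typing import List


def haar_forward(f: List[float]) -> List[float]:
    """Orthonormal Haar transform; len(f) must be a power of two."""
    n = len(f)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out, length, s = list(f), n, math.sqrt(2.0)
    while length > 1:
        half = length // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:half] = avg          # coarser averages go first
        out[half:length] = diff   # details for this level follow
        length = half
    return out


def haar_inverse(c: List[float]) -> List[float]:
    out, length, s = list(c), 1, math.sqrt(2.0)
    while length < len(out):
        rec: List[float] = []
        for a, d in zip(out[:length], out[length:2 * length]):
            rec.extend([(a + d) / s, (a - d) / s])
        out[:2 * length] = rec
        length *= 2
    return out


def best_b_term_l2(f: List[float], B: int) -> List[float]:
    # Keep the B coefficients of largest magnitude; optimal for l2 only.
    c = haar_forward(f)
    order = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)
    keep = set(order[:B])
    return haar_inverse([c[i] if i in keep else 0.0 for i in range(len(c))])


def l2_error(f: List[float], g: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def linf_error(f: List[float], g: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(f, g))
```

Comparing `linf_error` with `l2_error` on the same B-term output is a quick way to see that a representation chosen for ℓ2 need not be the best one under the maximum error.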
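For the semi-streaming matching abstract, the usual point of comparison is the one-pass greedy algorithm: keep an edge whenever neither endpoint is already matched. It uses O(n) space and returns a maximal matching, which is at least half the size of a maximum matching. This is the folklore baseline, not the paper's linear-programming approach; it is included only to make the model (edges arriving one at a time, space near-linear in vertices) concrete.

```python
from typing import Iterable, List, Set, Tuple


def greedy_streaming_matching(
        edge_stream: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """One pass; stores only matched vertices and the matching (O(n) space)."""
    matched: Set[int] = set()
    matching: List[Tuple[int, int]] = []
    for u, v in edge_stream:
        # Accept the edge only if both endpoints are still free.
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On the path 1–2–3–4, the adversarial order (2,3), (1,2), (3,4) yields one edge, while the order (1,2), (3,4), (2,3) yields the maximum of two, showing how arrival order drives the factor-of-two gap.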
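For the maximum-error histogram note, one useful observation is that, under maximum absolute error with each bucket represented by the midpoint of its values, a single left-to-right greedy pass decides whether a given error ε is achievable with B buckets. The sketch below pairs that linear-time check with a search over candidate errors of the form |a − b|/2. It is a simple illustrative method, quadratic overall because of the candidate set, and not the paper's linear-time algorithm; the function names are hypothetical.

```python
from typing import List


def buckets_needed(data: List[float], eps: float) -> int:
    """Fewest contiguous buckets with max absolute error <= eps.

    A bucket is represented by the midpoint of its min and max, so a run of
    values fits iff (max - min) / 2 <= eps. Extending each bucket as far as
    possible is optimal for this feasibility question.
    """
    if not data:
        return 0
    count, lo, hi = 1, data[0], data[0]
    for x in data[1:]:
        lo2, hi2 = min(lo, x), max(hi, x)
        if (hi2 - lo2) / 2 <= eps:
            lo, hi = lo2, hi2
        else:
            count += 1
            lo = hi = x
    return count


def min_max_error(data: List[float], B: int) -> float:
    """Smallest achievable max absolute error with B >= 1 buckets."""
    # The optimum is half the range of some bucket, i.e. |a - b| / 2 for
    # some pair of values, so binary-search those sorted candidates.
    cands = sorted({abs(a - b) / 2 for a in data for b in data})
    lo_i, hi_i = 0, len(cands) - 1
    while lo_i < hi_i:
        mid = (lo_i + hi_i) // 2
        if buckets_needed(data, cands[mid]) <= B:
            hi_i = mid
        else:
            lo_i = mid + 1
    return cands[lo_i]
```

For `[1, 2, 10, 11, 20, 21]`, three buckets achieve error 0.5, two buckets need 5.0, and one bucket needs 10.0.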
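The Graph Sketches abstract rests on a linear-algebra fact: if each node stores a signed incidence vector (+1 for the smaller endpoint of each incident edge, −1 for the larger), then summing the vectors of a node set S cancels every edge inside S and leaves ±1 exactly on the edges crossing the cut. The toy sketch below applies a random ±1 projection to those vectors, so summed sketches estimate one cut size through the squared norm. It illustrates linearity and support for insertions and deletions only; it is not the paper's sparsifier construction, which approximates all cuts simultaneously. `CutSketch` and its parameters are hypothetical, and the graph is assumed simple and unweighted.

```python
import random
from typing import Dict, Iterable, List, Tuple


class CutSketch:
    """Random +/-1 projections of signed node-incidence vectors (toy)."""

    def __init__(self, n: int, m: int, seed: int = 0):
        self.m = m                      # number of projection rows
        self.rng = random.Random(seed)
        self.signs: Dict[Tuple[int, int], List[int]] = {}
        self.node_sketch = [[0] * m for _ in range(n)]

    def _edge_signs(self, u: int, v: int):
        key = (min(u, v), max(u, v))
        if key not in self.signs:       # fix one random column per edge
            self.signs[key] = [self.rng.choice((-1, 1)) for _ in range(self.m)]
        return key, self.signs[key]

    def update(self, u: int, v: int, delta: int) -> None:
        """delta=+1 inserts edge {u, v}; delta=-1 deletes it."""
        (a, b), s = self._edge_signs(u, v)
        for r in range(self.m):
            self.node_sketch[a][r] += delta * s[r]  # +1 entry, smaller endpoint
            self.node_sketch[b][r] -= delta * s[r]  # -1 entry, larger endpoint

    def estimate_cut(self, S: Iterable[int]) -> float:
        # Sum the node sketches of S: edges inside S cancel exactly.
        total = [0] * self.m
        for v in S:
            for r in range(self.m):
                total[r] += self.node_sketch[v][r]
        # E[(row . x)^2] = ||x||^2 = number of crossing edges.
        return sum(t * t for t in total) / self.m
```

Because every operation is linear, inserting and then deleting an edge returns the sketch to its previous state, and the sketch of the whole vertex set is exactly zero.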
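The dynamic connectivity results in Analyzing Graph Structure via Linear Measurements depend on recovering an edge that leaves a set of nodes from summed per-node sketches. The simplest building block is 1-sparse recovery: each node keeps a signed count, a signed sum of edge IDs, and a polynomial fingerprint; summed over S, internal edges cancel, and if exactly one edge crosses, its ID is read off and verified. This is a minimal sketch under stated assumptions (edge ID = a·n + b for a < b, a Mersenne-prime modulus, a seeded random evaluation point); the full algorithms additionally subsample edges so that a crossing edge can be found even when many cross. Names are hypothetical.

```python
import random
from typing import Iterable, Optional, Tuple


class IncidenceSketch:
    """Per-node linear 1-sparse recovery sketch for dynamic graph streams."""

    P = (1 << 61) - 1  # prime modulus for the fingerprint

    def __init__(self, n: int, seed: int = 1):
        self.n = n
        self.z = random.Random(seed).randrange(2, self.P - 1)
        self.count = [0] * n   # signed number of incident edges
        self.idsum = [0] * n   # signed sum of incident edge IDs
        self.fp = [0] * n      # signed sum of z^id mod P

    def update(self, u: int, v: int, delta: int) -> None:
        """delta=+1 inserts edge {u, v}; delta=-1 deletes it."""
        a, b = min(u, v), max(u, v)
        eid = a * self.n + b
        term = delta * pow(self.z, eid, self.P)
        for node, sign in ((a, 1), (b, -1)):
            self.count[node] += sign * delta
            self.idsum[node] += sign * delta * eid
            self.fp[node] = (self.fp[node] + sign * term) % self.P

    def one_crossing_edge(self, S: Iterable[int]) -> Optional[Tuple[int, int]]:
        """Return the unique edge leaving S, or None if that cannot be verified."""
        S = list(S)
        c = sum(self.count[v] for v in S)
        s = sum(self.idsum[v] for v in S)
        f = sum(self.fp[v] for v in S) % self.P
        if c not in (1, -1):
            return None
        eid = s * c  # c is +1 or -1, so this divides by c
        if not 0 <= eid < self.n * self.n:
            return None
        if f != (c * pow(self.z, eid, self.P)) % self.P:
            return None  # more than one crossing edge (w.h.p.)
        return divmod(eid, self.n)
```

With edges (0,1), (1,2), (3,4) and S = {0, 1}, the only crossing edge (1, 2) is recovered; inserting (2,3) exposes it as the single edge leaving {0, 1, 2}, and deleting it again restores "no crossing edge".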
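The sparsification abstract builds on sampling edges so that every cut is approximately preserved. The simplest version keeps each edge independently with probability p and gives kept edges weight 1/p, which makes every cut value an unbiased estimate. This is only an illustration: uniform sampling fails on sparse cuts (a bridge is dropped with probability 1 − p), which is why sparsifiers sample edges with probabilities that depend on how well-connected each edge is. It is not the paper's one-pass Õ(n/ε²)-space algorithm, and the function names are hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple


def sample_edges(edge_stream: Iterable[Tuple[int, int]], p: float,
                 seed: int = 0) -> List[Tuple[int, int, float]]:
    """One pass: keep each edge with probability p, reweighted by 1/p."""
    rng = random.Random(seed)
    return [(u, v, 1.0 / p) for u, v in edge_stream if rng.random() < p]


def cut_value(weighted_edges: Iterable[Tuple[int, int, float]],
              S: Set[int]) -> float:
    # Total weight of edges with exactly one endpoint in S.
    return sum(w for u, v, w in weighted_edges if (u in S) != (v in S))
```

On the complete graph with 40 nodes, the cut between the first and last 20 nodes has 400 edges; sampling with p = 0.5 keeps about half the edges, and the reweighted cut stays close to 400.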