Khanna, Sanjeev
Search Results
Now showing 1 - 10 of 58
Publication: Archiving Scientific Data (2002-06-04)
Buneman, Peter; Khanna, Sanjeev; Tajima, Keishi; Tan, Wang-Chiew
We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps whereby an element appearing in multiple versions of the database is stored only once along with a compact description of versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures where one wishes to track the sequences of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead when contrasted with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.

Publication: Social Welfare in One-Sided Matching Markets Without Money (2011-08-01)
Bhalgat, Anand; Chakrabarty, Deeparnab; Khanna, Sanjeev
We study social welfare in one-sided matching markets where the goal is to efficiently allocate n items to n agents that each have a complete, private preference list and a unit demand over the items.
Our focus is on allocation mechanisms that do not involve any monetary payments. We consider two natural measures of social welfare: the ordinal welfare factor, which measures the number of agents that are at least as happy as in some unknown, arbitrary benchmark allocation, and the linear welfare factor, which assumes an agent’s utility linearly decreases down his preference list and measures the ratio of the total utility to that achieved by an optimal allocation. We analyze two matching mechanisms which have been extensively studied by economists. The first mechanism is the random serial dictatorship (RSD) where agents are ordered in accordance with a randomly chosen permutation, and are successively allocated their best choice among the unallocated items. The second mechanism is the probabilistic serial (PS) mechanism of Bogomolnaia and Moulin [8], which computes a fractional allocation that can be expressed as a convex combination of integral allocations. The welfare factor of a mechanism is the infimum over all instances. For RSD, we show that the ordinal welfare factor is asymptotically 1/2, while the linear welfare factor lies in the interval [.526, 2/3]. For PS, we show that the ordinal welfare factor is also 1/2 while the linear welfare factor is roughly 2/3. To our knowledge, these results are the first non-trivial performance guarantees for these natural mechanisms.

Publication: The Approximability of Constraint Satisfaction Problems (2000-01-01)
Khanna, Sanjeev; Sudan, Madhu; Trevisan, Luca; Williamson, David P
We study optimization problems that may be expressed as "Boolean constraint satisfaction problems". An instance of a Boolean constraint satisfaction problem is given by m constraints applied to n Boolean variables. Different computational problems arise from constraint satisfaction problems depending on the nature of the "underlying" constraints as well as on the goal of the optimization task.
Here we consider four possible goals: MAX CSP (MIN CSP) is the class of problems where the goal is to find an assignment maximizing the number of satisfied constraints (minimizing the number of unsatisfied constraints). MAX ONES (MIN ONES) is the class of optimization problems where the goal is to find an assignment satisfying all constraints with maximum (minimum) number of variables set to 1. Each class consists of infinitely many problems and a problem within a class is specified by a finite collection of finite Boolean functions that describe the possible constraints that may be used. Tight bounds on the approximability of every problem in MAX CSP were obtained by Creignou [11]. In this work we determine tight bounds on the "approximability" (i.e., the ratio to within which each problem may be approximated in polynomial time) of every problem in MAX ONES, MIN CSP and MIN ONES. Combined with the result of Creignou, this completely classifies all optimization problems derived from Boolean constraint satisfaction. Our results capture a diverse collection of optimization problems such as MAX 3-SAT, MAX CUT, MAX CLIQUE, MIN CUT, NEAREST CODEWORD etc. Our results unify recent results on the (in)approximability of these optimization problems and yield a compact presentation of most known results. Moreover, these results provide a formal basis to many statements on the behavior of natural optimization problems, that have so far only been observed empirically.

Publication: A PTAS for Minimizing Average Weighted Completion Time With Release Dates on Uniformly Related Machines (2000-01-01)
Chekuri, Chandra; Khanna, Sanjeev
A classical scheduling problem is to find schedules that minimize average weighted completion time of jobs with release dates.
When multiple machines are available, the machine environments may range from identical machines (the processing time required by a job is invariant across the machines) at one end, to unrelated machines (the processing time required by a job on any machine is an arbitrary function of the specific machine) at the other end of the spectrum. While the problem is strongly NP-hard even in the case of a single machine, constant factor approximation algorithms have been known for even the most general machine environment of unrelated machines. Recently, a polynomial-time approximation scheme (PTAS) was discovered for the case of identical parallel machines [1]. In contrast, it is known that this problem is MAX SNP-hard for unrelated machines [10]. An important open problem is to determine the approximability of the intermediate case of uniformly related machines where each machine i has a speed s_i and it takes p/s_i time to execute a job of processing size p. In this paper, we resolve this problem by obtaining a PTAS for the problem. This improves the earlier known ratio of (2 + ε) for the problem.

Publication: Reconstructing Strings from Random Traces (2004-01-11)
Kannan, Sampath; Batu, Tugkan; Khanna, Sanjeev; McGregor, Andrew
We are given a collection of m random subsequences (traces) of a string t of length n where each trace is obtained by deleting each bit in the string with probability q. Our goal is to exactly reconstruct the string t from these observed traces. We initiate here a study of deletion rates for which we can successfully reconstruct the original string using a small number of samples. We investigate a simple reconstruction algorithm called Bitwise Majority Alignment that uses majority voting (with suitable shifts) to determine each bit of the original string. We show that for random strings t, we can reconstruct the original string (w.h.p.) for q = O(1/log n) using only O(log n) samples.
For arbitrary strings t, we show that a simple modification of Bitwise Majority Alignment reconstructs a string that has identical structure to the original string (w.h.p.) for q = O(1/n^{1/2+ε}) using O(1) samples. In this case, using O(n log n) samples, we can reconstruct the original string exactly. Our setting can be viewed as the study of an idealized biological evolutionary process where the only possible mutations are random deletions. Our goal is to understand at what mutation rates a small number of observed samples can be correctly aligned to reconstruct the parent string. In the process of establishing these results, we show that Bitwise Majority Alignment has an interesting self-correcting property whereby local distortions in the traces do not generate errors in the reconstruction and eventually get corrected.

Publication: Why and Where: A Characterization of Data Provenance (2001-01-01)
Buneman, Peter; Khanna, Sanjeev; Tan, Wang-Chiew
With the proliferation of database views and curated databases, the issue of data provenance - where a piece of data came from and the process by which it arrived in the database - is becoming increasingly important, especially in scientific databases where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query. We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML.
A novel aspect of our work is a distinction between "why" provenance (refers to the source data that had some influence on the existence of the data) and "where" provenance (refers to the location(s) in the source databases from which the data was extracted).

Publication: Randomized Pursuit-Evasion With Limited Visibility (2003-01-01)
Kannan, Sampath; Isler, Volkan; Khanna, Sanjeev
We study the following pursuit-evasion game: One or more hunters are seeking to capture an evading rabbit on a graph. At each round, the rabbit tries to gather information about the location of the hunters but it can see them only if they are located on adjacent nodes. We show that two hunters suffice for catching rabbits with such local visibility with high probability. We distinguish between reactive rabbits who move only when a hunter is visible and general rabbits who can employ more sophisticated strategies. We present polynomial time algorithms that decide whether a graph G is hunter-win, that is, if a single hunter can capture a rabbit of either kind on G.

Publication: On Computing Functions with Uncertainty (2001-05-21)
Khanna, Sanjeev; Tan, Wang-Chiew
We study the problem of computing a function f(x_1, ..., x_n) given that the actual values of the variables x_i are known only with some uncertainty. For each variable x_i, an interval I_i is known such that the value of x_i is guaranteed to fall within this interval. Any such interval can be probed to obtain the actual value of the underlying variable; however, there is a cost associated with each such probe. The goal is to adaptively identify a minimum cost sequence of probes such that regardless of the actual values taken by the unprobed x_i's, the value of the function f can be computed to within a specified precision. We design online algorithms for this problem when f is either the selection function or an aggregation function such as sum or average. We consider three natural models of precision and give algorithms for each model.
We analyze our algorithms in the framework of competitive analysis and show that our algorithms are asymptotically optimal. Finally, we also study online algorithms for functions that are obtained by composing together selection and aggregation functions.

Publication: Algorithms for the Generalized Sorting Problem (2011-10-01)
Kannan, Sampath; Huang, Zhiyi; Khanna, Sanjeev
We study the generalized sorting problem where we are given a set of n elements to be sorted but only a subset of all possible pairwise element comparisons is allowed. The goal is to determine the sorted order using the smallest possible number of allowed comparisons. The generalized sorting problem may be equivalently viewed as follows. Given an undirected graph G(V,E) where V is the set of elements to be sorted and E defines the set of allowed comparisons, adaptively find the smallest subset E' ⊆ E of edges to probe such that the directed graph induced by E' contains a Hamiltonian path. When G is a complete graph, we get the standard sorting problem, and it is well-known that Θ(n log n) comparisons are necessary and sufficient. An extensively studied special case of the generalized sorting problem is the nuts and bolts problem where the allowed comparison graph is a complete bipartite graph between two equal-size sets. It is known that for this special case also, there is a deterministic algorithm that sorts using Θ(n log n) comparisons. However, when the allowed comparison graph is arbitrary, to our knowledge, no bound better than the trivial O(n^2) bound is known. Our main result is a randomized algorithm that sorts any allowed comparison graph using Õ(n^{3/2}) comparisons with high probability (provided the input is sortable).
We also study the sorting problem in randomly generated allowed comparison graphs, and show that when the edge probability is p, Õ(min{n/p^2, n^{3/2} √p}) comparisons suffice on average to sort.

Publication: Perfect Matchings in O(n log n) Time in Regular Bipartite Graphs (2010-06-01)
Goel, Ashish; Kapralov, Michael; Khanna, Sanjeev
In this paper we consider the well-studied problem of finding a perfect matching in a d-regular bipartite graph on 2n nodes with m = nd edges. The best-known algorithm for general bipartite graphs (due to Hopcroft and Karp) takes time O(m√n). In regular bipartite graphs, however, a matching is known to be computable in O(m) time (due to Cole, Ost, and Schirra). In a recent line of work by Goel, Kapralov, and Khanna the O(m) time bound was improved first to Õ(min{m, n^{2.5}/d}) and then to Õ(min{m, n^2/d}). In this paper, we give a randomized algorithm that finds a perfect matching in a d-regular graph and runs in O(n log n) time (both in expectation and with high probability). The algorithm performs an appropriately truncated alternating random walk to successively find augmenting paths. Our algorithm may be viewed as using adaptive uniform sampling, and is thus able to bypass the limitations of (nonadaptive) uniform sampling established in earlier work. Our techniques also give an algorithm that successively finds a matching in the support of a doubly stochastic matrix in expected time O(n log² n), with O(m) pre-processing time; this gives a simple O(m + mn log² n) time algorithm for finding the Birkhoff-von Neumann decomposition of a doubly stochastic matrix. We show that randomization is crucial for obtaining o(nd) time algorithms by establishing an Ω(nd) lower bound for deterministic algorithms. We also show that there does not exist a randomized algorithm that finds a matching in a regular bipartite multigraph and takes o(n log n) time with high probability.
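
Illustrative sketch (not from the paper): the last abstract above mentions a truncated alternating random walk that successively finds augmenting paths. The Python below is one minimal reading of that idea, not the authors' implementation. The function names, the adjacency-list input, the truncation budget of 4(2 + ⌈n/(n − j)⌉) steps and the restart-on-failure loop are all assumptions made for illustration, and the simple list bookkeeping does not attempt to reach the O(n log n) bound.

```python
import math
import random


def augmenting_walk(adj, match_left, match_right, free_left, budget, rng):
    """Run one truncated alternating random walk.

    Returns (path, taken) describing an augmenting path, or None if the
    step budget runs out first.
    """
    u = rng.choice(free_left)      # start at a random unmatched left vertex
    path = [u]                     # left vertices of the loop-erased walk
    taken = []                     # taken[k] = right vertex chosen from path[k]
    index = {u: 0}                 # position of each left vertex in path
    for _ in range(budget):
        v = rng.choice(adj[u])     # uniformly random incident edge
        if v == match_left[u]:
            continue               # do not leave u along its own matching edge
        if match_right[v] is None:
            taken.append(v)        # unmatched right vertex: path is complete
            return path, taken
        w = match_right[v]         # follow the matching edge back to the left
        if w in index:             # the walk closed a loop, so erase it
            keep = index[w] + 1
            for x in path[keep:]:
                del index[x]
            del path[keep:]
            del taken[keep - 1:]
        else:
            taken.append(v)
            index[w] = len(path)
            path.append(w)
        u = w
    return None                    # truncated: caller restarts the walk


def perfect_matching_regular(adj, seed=0):
    """adj[u] lists the right neighbours of left vertex u in a d-regular
    bipartite graph with n vertices per side. Returns match_left."""
    n = len(adj)
    rng = random.Random(seed)
    match_left = [None] * n
    match_right = [None] * n
    free_left = list(range(n))
    for j in range(n):             # j pairs are matched so far
        # Assumed truncation length; chosen for illustration only.
        budget = 4 * (2 + math.ceil(n / (n - j)))
        result = None
        while result is None:      # restart until a walk succeeds
            result = augmenting_walk(adj, match_left, match_right,
                                     free_left, budget, rng)
        path, taken = result
        for a, b in zip(path, taken):   # flip edges along the path
            match_left[a] = b
            match_right[b] = a
        free_left.remove(path[0])  # O(n) removal; acceptable in a sketch
    return match_left
```

For example, on the 3-regular circulant graph with `adj = [[(u + k) % 12 for k in range(3)] for u in range(12)]`, `perfect_matching_regular(adj)` returns a list in which every left vertex is assigned a distinct right neighbour. Loop erasure keeps the left vertices on the path distinct, which is what makes the final edge flip a valid augmentation.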