Combinatorial Optimization On Massive Datasets: Streaming, Distributed, And Massively Parallel Computation

Assadi, Sepehr

Combinatorial Optimization On Massive Datasets: Streaming, Distributed, And Massively Parallel Computation

Files

Assadi_upenngdas_0175C_13518.pdf (1.7 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

Combinatorial Optimization
Communication Complexity
Distributed Computing
Massively Parallel Computation
Streaming
Computer Sciences

Copyright date

2019-04-02T20:18:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/30076

View all metadata

Author

Assadi, Sepehr

Abstract

With the emergence of massive datasets across different application domains, there is a rapidly growing need to solve various optimization tasks over such datasets. This in turn raises the following fundamental question: How well can we solve a large-scale optimization problem on massive datasets in a resource-efficient manner? The focus of this thesis is on answering this question for various problems in different modern computational models for processing massive datasets, in particular, streaming, distributed, and massively parallel computation (such as MapReduce) models. The first part of this thesis is focused on graph optimization, in which we study several fundamental graph optimization problems including matching, vertex cover, and connectivity. The main results in this part include a tight bound on the space complexity of the matching problem in dynamic streams, a unifying framework for achieving algorithms that improve the state-of-the-art for matching and vertex cover problems in all the mentioned models, and the first massively parallel algorithm for connectivity on sparse graphs that improve upon the classical parallel PRAM algorithms for a large family of graphs. In the second part of the thesis, we consider submodular optimization and in particular two canonical examples of set cover and maximum coverage problems. We establish tight bounds on the space complexity of approximating set cover and maximum coverage in both single- and multi-pass streams, and build on these results to address the general problem of submodular maximization across all aforementioned models. In the last part, we consider resource constrained optimization settings which require computation over data which is not particularly large but still imposes restrictions of similar nature. Examples include optimization over data which can only be accessed with limited adaptivity (e.g. in crowdsourcing applications), or corresponds to private information of agents and is not available in an integrated form (e.g., in auction and mechanism design applications). We use the toolkit developed in the first two parts of the thesis to obtain several new results for such problems, significantly improving the state-of-the-art. A common theme in this thesis is a rigorous study of problems from both an algorithmic perspective and impossibility results. Rather than coming up with ad hoc solutions to problems at hand, the goal here is to develop general techniques for solving various large-scale optimization problems in a unified way in different models, which in turns requires further understanding of the limitations of known approaches for addressing these problems.

Advisor

Sanjeev Khanna

Date of degree

2018-01-01

Collection

Dissertations and Theses