#### Date of Award

2013

#### Degree Type

Dissertation

#### Degree Name

Doctor of Philosophy (PhD)

#### Graduate Group

Computer and Information Science

#### First Advisor

Sudipto Guha

#### Abstract

Massive graphs arise in many scenarios, for example, traffic data analysis in large networks, large-scale scientific experiments, and clustering of large data sets.

The semi-streaming model was proposed for processing massive graphs. In the semi-streaming model, we have random-access memory whose size is near-linear in the number of vertices. The input graph (or, equivalently, its set of edges) is presented as a sequential list of edges (insertion-only model) or of edge insertions and deletions (dynamic model). The list is read-only, but we may make multiple passes over it. There have been a few results in the insertion-only model, such as computing distance spanners and approximating the maximum matching.
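As a concrete illustration of the insertion-only model, the following is a minimal sketch of the classical greedy one-pass matching; it is not an algorithm from this thesis. It keeps only the set of matched vertices and the matching itself, so memory stays linear in the number of vertices, and the maximal matching it returns has at least half the size of a maximum matching. The function name `greedy_matching` and the representation of the stream as an iterable of vertex pairs are assumptions made for illustration.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_matching(edge_stream: Iterable[Edge]) -> List[Edge]:
    """One pass over an insertion-only edge stream (hypothetical helper).

    Keeps an edge only if both endpoints are still unmatched, so the
    state never exceeds O(n) vertices plus at most n/2 stored edges.
    """
    matched: Set[Hashable] = set()   # vertices already covered
    matching: List[Edge] = []        # edges kept so far

    for u, v in edge_stream:         # each edge is seen once, in stream order
        if u != v and u not in matched and v not in matched:
            matched.add(u)
            matched.add(v)
            matching.append((u, v))
    return matching


if __name__ == "__main__":
    # Path 1-2-3-4: the result depends on the order in which edges arrive.
    print(greedy_matching([(1, 2), (2, 3), (3, 4)]))
    print(greedy_matching([(2, 3), (1, 2), (3, 4)]))
```

The second call shows why the greedy pass is only an approximation: if the middle edge of the path arrives first, it blocks both outer edges and the matching has one edge instead of two.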

In this thesis, we present algorithms and techniques for (i) solving more complex problems in the semi-streaming model (for example, problems in the dynamic model) and (ii) obtaining better solutions for problems that have already been studied (for example, the maximum matching problem). In the course of both, we develop new techniques with broad applications and explore the rich trade-offs between the complexity of models (insertion-only streams vs. dynamic streams), the number of passes, space, accuracy, and running time.

1. We initiate the study of dynamic graph streams. We start with basic problems such as the connectivity problem and computing the minimum spanning tree. These problems are trivial in the insertion-only model. However, they require non-trivial algorithms in the dynamic model (and multiple passes for computing the exact minimum spanning tree).

2. Second, we present a graph sparsification algorithm in the semi-streaming model. A graph sparsification is a sparse graph that approximately preserves all the cut values of the original graph. Such a graph acts as an oracle for solving cut-related problems, for example, the minimum cut problem and the multicut problem. Our algorithm produces a graph sparsification with high probability in one pass.

3. Third, we use primal-dual algorithms to develop semi-streaming algorithms. Primal-dual algorithms have been widely accepted as a framework for solving linear programs and semidefinite programs faster. In contrast, we apply the method to reduce the space and the number of passes in addition to the running time. We also present some examples that arise in applications and show how to apply the techniques: the multicut problem, the correlation clustering problem, and the maximum matching problem. As a consequence, we also develop near-linear time algorithms for the $b$-matching problems; no such algorithms were known before.
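To make concrete why connectivity is considered trivial in the insertion-only model, here is a minimal sketch that counts connected components in one pass with a union-find structure, using memory linear in the number of vertices. It is a textbook illustration under stated assumptions, not the thesis's method: the name `count_components` and the convention that vertices are the integers `0..num_vertices-1` are hypothetical. The sketch has no way to undo a union, so it does not extend to streams with edge deletions.

```python
from typing import Iterable, Tuple


def count_components(num_vertices: int,
                     edge_stream: Iterable[Tuple[int, int]]) -> int:
    """Count connected components in one pass over inserted edges.

    Hypothetical helper: vertices are assumed to be 0..num_vertices-1.
    Only the parent array is stored, i.e. O(n) words of memory.
    """
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edge_stream:          # each edge is read once and discarded
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two different components
            parent[root_u] = root_v
            components -= 1
    return components


if __name__ == "__main__":
    print(count_components(5, [(0, 1), (1, 2), (3, 4)]))
```

The edges that trigger a union form a spanning forest, so storing them as well would still fit in linear memory.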

#### Recommended Citation

Ahn, Kook Jin, "Analyzing Massive Graphs in the Semi-streaming Model" (2013). *Publicly Accessible Penn Dissertations*. 606.

https://repository.upenn.edu/edissertations/606