APPLICATIONS OF GRAPH THEORY IN INFORMATION THEORY
Abstract
Graphs provide a powerful framework for representing relationships and dependencies, becoming essential tools across various disciplines. While traditionally studied through computer science and combinatorics, their application in information theory introduces novel questions. At this intersection, fundamental problems emerge, such as developing efficient sampling and inference techniques that leverage graph structure, modeling correlation among individuals using contact graphs, understanding the limits of data compression for graph-structured data, and more. This thesis explores a selection of these challenges, aiming to uncover principles that combine graph theory and information theory to advance our understanding of networked data. In the first part of the thesis, Chapters 2 through 4, we investigate the problem of group testing. Group testing, a problem with diverse applications across multiple disciplines, traditionally assumes independence across nodes’ states. Recent research, however, addresses real-world scenarios that often involve correlations among nodes, challenging the simplifying assumptions of existing models. In Chapter 2, we model and analyze group testing on n correlated nodes whose interactions are specified by a contact graph G. We represent correlation through an edge-faulty random graph formed from G, in which each edge is dropped with probability 1 − r, and in the resulting graph, all nodes in the same component share the same state. We design testing schemes for various graph structures, including cycles, trees, grids, and stochastic block models, and demonstrate how the number of tests decreases as the number of edges (i.e., the degree of correlation) increases. Specifically, when G is a cycle or a tree, we show an improvement by a factor of log(1/r). For the grid, a graph with almost 2n edges, the improvement is by a factor of (1 − r) log(1/r), a drastic improvement over trees.
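The edge-faulty correlation model of Chapter 2 can be illustrated with a minimal simulation sketch. The function name, parameters, and union-find implementation below are illustrative choices, not taken from the thesis: each edge of the contact graph survives independently with probability r, and all nodes in a surviving component share a single randomly drawn state.

```python
import random

def sample_states(n, edges, r, infection_prob, seed=0):
    """Sample node states under an edge-faulty correlation model:
    each edge of the contact graph is kept with probability r, and
    all nodes in the same surviving component share one state."""
    rng = random.Random(seed)
    # Union-find over nodes, merging endpoints of surviving edges.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in edges:
        if rng.random() < r:  # edge survives with probability r
            parent[find(u)] = find(v)
    # Each component is infected independently with infection_prob.
    comp_state = {}
    states = []
    for x in range(n):
        root = find(x)
        if root not in comp_state:
            comp_state[root] = rng.random() < infection_prob
        states.append(comp_state[root])
    return states

# Example: a cycle on 6 nodes with edge-survival probability r = 0.8.
cycle6 = [(i, (i + 1) % 6) for i in range(6)]
print(sample_states(6, cycle6, r=0.8, infection_prob=0.3))
```

With r = 1 the cycle stays connected and every node shares one state; with r = 0 the states are independent, recovering the classical uncorrelated setting.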
When G has a larger number of edges, as in the SBM, the improvement can scale with n. Additionally, we establish lower bounds on the number of tests required for these graphs. In Chapter 3, we consider a general group testing problem where the statistical model of correlation is provided. To capture and leverage these correlations effectively, we model the problem with hypergraphs, inspired by Gonen et al. (2022), augmented by a probability mass function on the hyperedges. Using this model, we design a novel greedy adaptive algorithm and analyze its performance, providing theoretical guarantees on the number of tests required, which depend on the entropy of the underlying probability distribution and the average number of infections. The analysis yields upper bounds of the form O(H(X) + μ) on the number of tests, where μ is the expected number of infections and H(X) is the entropy of the edges. We demonstrate that the algorithm recovers or improves upon almost all previously known results for group testing with correlation. Additionally, we provide families of graphs on which the algorithm is order-wise optimal and give examples where the algorithm or its analysis is not tight. We then generalize the proposed framework of group testing with general correlation in two directions, namely noisy group testing and semi-non-adaptive group testing. In both settings, we provide novel theoretical bounds on the number of tests required. In Chapter 4, we challenge the commonly held assumption that group testing is effective only when the infection rate is low. Traditionally, performance guarantees are expressed in terms of μ, the expected number of infections, and H(X), the entropy of the infection model. It is well known that H(X) serves as a fundamental lower bound on the number of tests required. However, in almost all prior work, the upper bounds on the number of tests typically scale as O(μ), even in the presence of correlation.
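For intuition about adaptive group testing, the sketch below implements classic adaptive binary splitting, a textbook baseline and not the thesis's hypergraph-based greedy algorithm: a positive pool is repeatedly halved until individual infected nodes are isolated, using on the order of μ log(n/μ) tests for μ infections.

```python
def group_test(pool, infected):
    """One pooled test: positive iff the pool contains an infected node."""
    return any(x in infected for x in pool)

def adaptive_find_all(nodes, infected):
    """Adaptive binary splitting (a standard baseline): recursively
    split every positive pool in half until singletons are reached.
    Returns the recovered infected set and the number of tests used."""
    tests = 0
    found = []
    stack = [list(nodes)]
    while stack:
        pool = stack.pop()
        tests += 1
        if not group_test(pool, infected):
            continue  # a negative pool clears all its members at once
        if len(pool) == 1:
            found.append(pool[0])
        else:
            mid = len(pool) // 2
            stack.append(pool[:mid])
            stack.append(pool[mid:])
    return sorted(found), tests

found, t = adaptive_find_all(range(64), {5, 40})
print(found, t)  # recovers [5, 40] with far fewer than 64 tests
```

The `infected` argument stands in for an unknown ground truth that the tester can only query through pooled tests; correlation-aware schemes like those in Chapters 3 and 4 improve on this baseline by pooling according to the hyperedge distribution.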
In this work, we construct correlation structures that allow us to eliminate the dependence on μ in the number of tests. Specifically, for hypertrees with an underlying tree T, we show that all infections can be recovered using only O(H(X) + HN) tests, where HN denotes the Hamiltonian number of T. For d-partite and d-regular hypergraphs, we prove that O(H(X)) tests are sufficient, establishing optimality. Furthermore, for d-regular hypergraphs in which any two edges differ by at least cH·d nodes (for some constant cH), we again show that O(H(X)) tests suffice, demonstrating optimality in this setting as well. In Chapter 5, we study the problem of compressing graphs with anonymous side information. Specifically, we aim to compress a graph G1 when a graph G2^π is given, where G2^π is obtained from G2 by permuting its nodes according to an unknown permutation π. When π = I, i.e., when the labels are known, this problem becomes equivalent to classic compression with side information. We discuss how results from Burkard and Fincke (1985), focused on combinatorial optimization for random inputs, help to test whether two graphs are correlated. We then demonstrate a significant improvement in compression rate using these insights alongside classical compression methods. In Chapter 6, I summarize the work from my first year. Specifically, we study the stochastic load balancing model, where the objective is to assign jobs to machines so as to minimize the makespan, defined as the time at which all jobs are completed. We examine the case in which job sizes follow a Poisson distribution and show how a PTAS can be achieved, improving upon a known 2-approximation. We then design an efficient PTAS for this problem.
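For the load balancing problem of Chapter 6, the deterministic analogue of the baseline guarantee is Graham's greedy list scheduling, a classic 2-approximation for makespan minimization: assign each job to the currently least-loaded machine. The sketch below shows this baseline only; the stochastic setting with Poisson job sizes that the thesis's PTAS addresses is more subtle, and the function name is illustrative.

```python
import heapq

def list_schedule(jobs, m):
    """Greedy list scheduling (Graham): place each job on the machine
    with the smallest current load. A classic 2-approximation for
    deterministic makespan minimization."""
    loads = [(0.0, i) for i in range(m)]  # (current load, machine id)
    heapq.heapify(loads)
    assignment = [None] * len(jobs)
    for j, size in enumerate(jobs):
        load, i = heapq.heappop(loads)   # least-loaded machine
        assignment[j] = i
        heapq.heappush(loads, (load + size, i))
    makespan = max(load for load, _ in loads)
    return assignment, makespan

assign, makespan = list_schedule([4, 3, 2, 2, 1], m=2)
print(assign, makespan)  # makespan 6.0, which happens to be optimal here
```

A PTAS trades this simple greedy rule for a (1 + ε)-approximation at higher, but still polynomial, running time.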
Advisor
Sarkar, Saswati