Doctor of Philosophy (PhD)
Electrical & Systems Engineering
Graph neural networks (GNNs) are successful at learning representations from most types of network data but suffer from limitations in the case of large graphs. Challenges arise in the very design of the learning architecture, as most GNNs are parametrized by some matrix representation of the graph (e.g., the adjacency matrix), which can be hard to acquire when the network is large. Moreover, in many GNN architectures graph operations are defined through convolutional operations in the spectral domain. In this case, another obstacle is obtaining the graph spectrum, which requires a costly eigendecomposition of the matrix representation.
Yet, large graphs can often be identified as being similar to each other in the sense that they share structural properties. We can thus expect that processing data supported on such graphs should yield similar results, which would mitigate the challenge of scale: we could design GNNs for small graphs and transfer them to larger ones. In this thesis, I formalize this intuition and show that this graph transferability is possible when the graphs belong to the same "family", where each family is identified by a different graphon.
A graphon is a function W(x,y) that describes a class of stochastic graphs with similar shape. One can think of the arguments (x,y) as the labels of a pair of nodes and of the graphon value W(x,y) as the probability of an edge between x and y. This yields a notion of a graph sampled from a graphon or, equivalently, a notion of a limit as the number of nodes in the sampled graph grows. Graphs sampled from a graphon almost surely share properties in the limit such as homomorphism densities which, in practice, implies that graphons identify families of networks that are similar in the sense that the density of certain "motifs" is preserved. This motivates the study of information processing on graphons as a way to enable information processing on large graphs.
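This sampling procedure can be sketched in a few lines. The sketch below is illustrative, not from the thesis; the particular graphon W(x, y) = 1 - max(x, y) is a standard textbook example chosen only for the demo, and the helper name `sample_graphon` is hypothetical:

```python
import numpy as np

def sample_graphon(W, n, rng=None):
    """Sample an n-node undirected graph from a graphon W(x, y).

    Node labels are drawn uniformly at random from [0, 1]; each edge
    (i, j) is then included independently with probability W(x_i, x_j).
    """
    rng = np.random.default_rng(rng)
    x = rng.uniform(0.0, 1.0, size=n)      # latent node labels
    P = W(x[:, None], x[None, :])          # edge-probability matrix
    U = rng.uniform(size=(n, n))
    # Decide edges on the strict upper triangle only, then symmetrize,
    # so the adjacency is undirected with an empty diagonal.
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)
    return A + A.T

# Example: the "uniform attachment" graphon W(x, y) = 1 - max(x, y)
A = sample_graphon(lambda x, y: 1 - np.maximum(x, y), n=50, rng=0)
```

As n grows, graphs drawn this way converge to the graphon in the homomorphism-density sense described above.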
The central component of a signal processing theory is a notion of shift that induces a class of linear filters with a spectral representation characterized by a Fourier transform (FT). In this thesis, we show that graphons induce a linear operator which can be used to define a shift and therefore graphon filters and the graphon FT. Building on the convergence properties of sequences of graphs and associated graph signals, it is then possible to show that for these sequences the graph FT converges to the graphon FT and that graph filter outputs converge to the outputs of the graphon filter with the same coefficients. These theorems imply that for graphs that belong to certain families, graph Fourier analysis and graph filter design have well-defined limits. In turn, these facts enable graph information processing on graphs with a large number of nodes, since information processing pipelines designed for limit graphons can be applied to finite graphs.
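On a finite graph, these two objects take a concrete form: the graph FT projects a signal onto the eigenbasis of the shift, and a filter is a polynomial in the shift, H(A)x = Σ_k h_k A^k x, whose spectral response is h(λ) = Σ_k h_k λ^k. The function names below are hypothetical helpers written for illustration:

```python
import numpy as np

def graph_fourier_transform(A, signal):
    """Graph FT: project a signal onto the eigenbasis of the shift A.

    For a symmetric shift, eigh returns an orthonormal eigenbasis V, so
    the transform is simply V^T x and the inverse is V x_hat.
    """
    eigvals, V = np.linalg.eigh(A)
    return eigvals, V, V.T @ signal

def graph_filter(A, h, signal):
    """Polynomial graph filter: H(A) x = sum_k h[k] A^k x."""
    out = np.zeros_like(signal, dtype=float)
    power = np.eye(A.shape[0])             # A^0
    for hk in h:
        out += hk * (power @ signal)
        power = power @ A                  # next power of the shift
    return out
```

Because A is symmetric, H(A)x = V h(Λ) V^T x, i.e., filtering in the node domain equals pointwise multiplication by h(λ) in the spectral domain; it is this spectral response that has a well-defined graphon limit.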
We further define graphon neural networks (WNNs) by composing graphon filter banks with pointwise nonlinearities. WNNs are idealized limits which do not exist in practice, but they are a useful tool for understanding the fundamental properties of GNNs. In particular, the sampling and convergence results derived for graphon filters can be readily extended to WNNs, allowing us to show that GNNs converge to WNNs as graphs converge to graphons. If two GNNs can be made arbitrarily close to the same WNN, then by a simple triangle inequality argument they can also be made arbitrarily close to one another. This result formalizes our intuition that GNNs are transferable between similar graphs.
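The finite-graph counterpart of this composition is a GNN layer: a bank of polynomial graph filters followed by a pointwise nonlinearity. A minimal single-layer sketch, assuming tanh as the nonlinearity and a tap tensor H with one input-to-output mixing matrix per power of the shift (both choices are illustrative, not prescribed by the thesis):

```python
import numpy as np

def gnn_layer(A, X, H, sigma=np.tanh):
    """One GNN layer: a graph filter bank plus a pointwise nonlinearity.

    A : (n, n) graph shift operator (e.g., a normalized adjacency)
    X : (n, f_in) input node features
    H : (K, f_in, f_out) filter taps; H[k] mixes features at shift power k
    """
    Z = np.zeros((X.shape[0], H.shape[2]))
    AkX = X.copy()                    # A^0 X
    for k in range(H.shape[0]):
        Z += AkX @ H[k]               # accumulate h_k-weighted diffusion
        AkX = A @ AkX                 # diffuse one more hop
    return sigma(Z)                   # pointwise nonlinearity

rng = np.random.default_rng(1)
A = rng.uniform(size=(5, 5)); A = (A + A.T) / 2   # symmetric toy shift
X = rng.standard_normal((5, 3))
H = 0.1 * rng.standard_normal((2, 3, 4))
Y = gnn_layer(A, X, H)
```

Replacing A with the graphon-induced operator in this same composition yields the WNN limit discussed above.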
A GNN can be trained on a moderate-scale graph and executed on a large-scale graph with a transferability error dominated by the inverse of the size of the smaller graph. Interestingly, this error increases with the variability of the spectral response of the convolutional filters, revealing a trade-off between transferability and spectral discriminability that is inherited from graph filters. In practice, this trade-off is less pronounced in GNNs due to nonlinearities, which are able to scatter spectral components of the data to different parts of the eigenvalue spectrum where they can be discriminated. This explains why GNNs are more transferable than graph filters.
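Transferability means the same filter taps can be reused across graph sizes. The sketch below applies one fixed set of taps to graphs of 100 and 1,000 nodes drawn from a common graphon; the graphon W(x, y) = exp(-|x - y|), the label-induced input signal cos(πx), and the normalization S = A/n are all illustrative assumptions, not the thesis's specific experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(n):
    """Sample an n-node graph from the graphon W(x, y) = exp(-|x - y|)."""
    x = np.sort(rng.uniform(size=n))
    P = np.exp(-np.abs(x[:, None] - x[None, :]))
    A = (rng.uniform(size=(n, n)) < np.triu(P, 1)).astype(float)
    return A + A.T, x

def apply_filter(A, signal, h):
    """Polynomial filter with taps h on the normalized shift S = A / n."""
    S = A / A.shape[0]                 # keeps the spectrum graphon-consistent
    out, Skx = np.zeros_like(signal), signal.copy()
    for hk in h:
        out += hk * Skx
        Skx = S @ Skx
    return out

h = [1.0, -2.0, 1.0]                   # the SAME taps on both graphs
for n in (100, 1000):
    A, x = sample_graph(n)
    y = apply_filter(A, np.cos(np.pi * x), h)
```

As n grows, both outputs approach the output of the graphon filter with taps h, which is what bounds the error of transferring from the smaller graph to the larger one.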
Ruiz, Luana, "Machine Learning On Large-Scale Graphs" (2022). Publicly Accessible Penn Dissertations. 5252.