Graph Machine Learning Under Requirements

Cervino, Juan

Files

Cervino_upenngdas_0175C_16561.pdf (7.6 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Electrical and Systems Engineering

Discipline

Electrical Engineering
Mathematics
Computer Sciences

Subject

Graph Machine Learning
Graph Neural Networks
Machine Learning
Optimization
Signal Processing

Copyright date

01/01/2024

Permalink

https://repository.upenn.edu/handle/20.500.14332/60405

Author

Cervino, Juan

Abstract

Graphs are powerful mathematical tools that enable the modeling of complex systems. Graph machine learning exploits possibly unknown data structures and provides a unified approach to a wide variety of problems. However, graph machine learning solutions tend to suffer from three main limitations: they do not scale with the size of the graph, they are not robust to changes in the graph, and they require a homogeneous underlying graph. In this thesis we address all three of these limitations. In terms of scalability, we show that the generalization capabilities of Graph Neural Networks (GNNs) improve with the number of nodes, which motivates the need for scalable training solutions. To this end, I developed two strategies to train GNNs on large-scale graphs: the first grows the graph in time as training proceeds, and the second distributes the graph across a set of machines and grows it in space. These two methods alleviate the computational and communication costs required to train GNNs on large-scale graphs without compromising accuracy. In terms of robustness, even though many data modalities reside in a very high-dimensional space, their dynamics can be assumed to belong to a lower-dimensional structure. We model this low-dimensional space using a graph Laplacian and show that the problem of learning a Lipschitz continuous function on a manifold is equivalent to a dynamically weighted manifold regularization problem. Finally, heterogeneity is a fundamental property of networks. Even if a network is composed of homogeneous agents, every agent will have different interactions with the environment, which translates into heterogeneous data acquisition. Graph machine learning solutions that do not address the heterogeneous nature of the data tend to accommodate the needs of only some of the nodes in the graph. I propose two ways to tackle heterogeneity: a single solution that improves the outcome of all the individual components in the network, and a node-specific solution such that every agent in the network improves upon working with its own individual data.

Advisor

Ribeiro, Alejandro

Date of degree

2024

Collection

Dissertations and Theses