Date of Award
Doctor of Philosophy (PhD)
Electrical & Systems Engineering
Networked data structures has been getting big, ubiquitous, and pervasive. As our day-to-day activities become more incorporated with and influenced by the digital world, we rely more on our intuition to provide us a high-level idea and subconscious understanding of the encountered data. This thesis aims at translating the qualitative intuitions we have about networked data into quantitative and formal tools by designing rigorous yet reasonable algorithms. In a nutshell, this thesis constructs models to compare and cluster networked data, to simplify a complicated networked structure, and to formalize the notion of smoothness and variation for domain-specific signals on a network. This thesis consists of two interrelated thrusts which explore both the scenarios where networks have intrinsic value and are themselves the object of study, and where the interest is for signals defined on top of the networks, so we leverage the information in the network to analyze the signals. Our results suggest that the intuition we have in analyzing huge data can be transformed into rigorous algorithms, and often the intuition results in superior performance, new observations, better complexity, and/or bridging two commonly implemented methods. Even though different in the principles they investigate, both thrusts are constructed on what we think as a contemporary alternation in data analytics: from building an algorithm then understanding it to having an intuition then building an algorithm around it.
We show that in order to formalize the intuitive idea to measure the difference between a pair of networks of arbitrary sizes, we could design two algorithms based on the intuition to find mappings between the node sets or to map one network into the subset of another network. Such methods also lead to a clustering algorithm to categorize networked data structures. Besides, we could define the notion of frequencies of a given network by ordering features in the network according to how important they are to the overall information conveyed by the network. These proposed algorithms succeed in comparing collaboration histories of researchers, clustering research communities via their publication patterns, categorizing moving objects from uncertain measurmenets, and separating networks constructed from different processes.
In the context of data analytics on top of networks, we design domain-specific tools by leveraging the recent advances in graph signal processing, which formalizes the intuitive notion of smoothness and variation of signals defined on top of networked structures, and generalizes conventional Fourier analysis to the graph domain. In specific, we show how these tools can be used to better classify the cancer subtypes by considering genetic profiles as signals on top of gene-to-gene interaction networks, to gain new insights to explain the difference between human beings in learning new tasks and switching attentions by considering brain activities as signals on top of brain connectivity networks, as well as to demonstrate how common methods in rating prediction are special graph filters and to base on this observation to design novel recommendation system algorithms.
Huang, Weiyu, "Networked Data Analytics: Network Comparison And Applied Graph Signal Processing" (2018). Publicly Accessible Penn Dissertations. 2826.