ORCHESTRATED APPROXIMATE MESSAGE PASSING: A NOVEL WAY OF MULTIMODAL DATA INTEGRATION

Degree type
Doctor of Philosophy (PhD)
Graduate group
Statistics and Data Science
Discipline
Statistics and Probability
Subject
Approximate Message Passing
Data integration
Multimodal Data
Copyright date
01/01/2024
Author
Nandy, Sagnik
Abstract

Multimodal data analysis has garnered considerable attention in data science because of its wide applicability across scientific disciplines. However, developing efficient and statistically sound procedures for integrating information across multimodal datasets remains a challenge. In this dissertation, we tackle the multimodal data integration problem using a variant of the Approximate Message Passing algorithm introduced in Donoho et al. (2009). Our variant not only matches popular techniques in accuracy but also supports statistical inference on its output, thanks to an exact asymptotic characterization of the estimation error. Such support for statistical inference on the output of a data integration algorithm is not widely available in the literature. Moreover, the signal reconstruction risk of our algorithm is provably Bayes optimal, which enhances its appeal.
This dissertation is structured into three parts. In the first part, we introduce the algorithm and discuss its mathematical properties. In the second part, we apply the algorithm to delineate the phase transition threshold for community detection in the Contextual Stochastic Block Models introduced in Deshpande et al. (2018). This application demonstrates that the algorithm optimally integrates information across network data and node features. In the final part, we develop a data-adaptive version of the algorithm and use it to build a procedure for integrating information across single-cell multi-omic datasets. We also develop a technique that maps, with high confidence, query data points with partially observed modalities to the integrated embeddings constructed from the training data. On real data, our method is competitive with state-of-the-art integration techniques used in single-cell multi-omic data analysis in constructing cell atlases that segregate different cell types into interpretable clusters. Moreover, the method provides an avenue to map new query cells with partially observed features to such atlases.

Advisor
Bhattacharya, Bhaswar B.
Ma, Zongming
Date of degree
2024