ANALYZING DISEASE HETEROGENEITY VIA WEAKLY-SUPERVISED DEEP LEARNING

Degree type
Doctor of Philosophy (PhD)
Graduate group
Applied Mathematics and Computational Science
Discipline
Data Science
Engineering
Computer Sciences
Copyright date
01/01/2024
Author
Yang, Zhijian
Abstract

Heterogeneity of brain diseases poses significant challenges for precision medicine. While a plethora of machine learning methods have been applied to imaging data, enabling the construction of clinically relevant imaging signatures for neurological and neuropsychiatric diseases, they often overlook explicit modeling of disease heterogeneity. Moreover, unsupervised methods may inadvertently capture heterogeneity driven by nuisance confounding factors that affect brain structure or function, rather than heterogeneity relevant to the pathology or condition of interest. In this thesis, we propose a series of weakly-supervised deep learning approaches that use normal control data as a reference and characterize disease-related brain changes through deep generative modeling. Following this principle, we first proposed Smile-GAN, a clustering method that estimates dominant subtypes and categorizes patients’ imaging data according to disease-related imaging patterns. Second, building on Smile-GAN, we introduced an improved representation learning approach, Surreal-GAN, which not only captures disease effects but also disentangles spatial and temporal variations in brain changes, producing concise representation indices that directly indicate the severity of different brain change patterns. Because Smile-GAN and Surreal-GAN capture disease heterogeneity from neuroimaging data alone, they may overlook valuable information from other modalities, such as genetics. Therefore, we further developed the multi-view method Gene-SGAN. By distilling information from both imaging and genetic data, Gene-SGAN separates brain changes with and without genetic associations through multi-modal learning, thereby deriving disease endophenotypes closer to the underlying biology. All three methods were first extensively validated through synthetic experiments with known simulated ground truth.
More importantly, their application to different cohorts of real participants’ data enhanced our understanding of heterogeneous brain changes related to Alzheimer’s disease and the general brain aging process. The clusters or indices derived by these methods demonstrate significant associations with distinct biomedical, lifestyle, and genetic factors, providing insights into the etiology of the observed variation. Moreover, they show predictive value for future neurodegeneration, disease progression, and mortality. Consequently, these methods hold promise for more personalized patient management and better clinical trial design.

Advisor
Davatzikos, Christos
Date of degree
2024