STATISTICAL METHODS FOR ANALYSIS OF HIGH-DIMENSIONAL NEUROIMAGING DATA
Subject
federated learning
image harmonization
intermodal coupling
multiple sclerosis
neuroimaging
Abstract
The increasing accessibility of neuroimaging data promises new insights into the human brain. However, the complexity of neuroimaging data poses significant statistical challenges. This dissertation develops statistical tools to capture neuroimaging patterns and address these challenges.

First, we propose an approach for capturing unique neuroimaging information in multi-modal datasets. Information within each modality can be analyzed with existing methods; however, additional information is present in the relationships between modalities, which we call intermodal coupling (IMCo). We develop PCA-based intermodal coupling (pIMCo), a method that summarizes voxel-wise covariance structures between two or more imaging modalities. This method may enhance our understanding of relationships between brain characteristics and improve predictive models.

Our second contribution aims to increase reproducibility and generalizability in neuroimaging data analyses. Neuroimaging data collected in multiple batches, such as on different scanners, are increasingly necessary to obtain large sample sizes and detect small effects. However, these data are substantially confounded by batch-induced technical variation, called batch effects. We develop DeepComBat, a deep learning method for removing multivariate batch effects in neuroimaging data, and show that it outperforms existing methods.

Third, we develop an automated deep learning segmentation algorithm for lesion-based biomarkers used in the diagnosis and prognosis of multiple sclerosis (MS), a demyelinating neuroinflammatory disorder. Specifically, we use multi-modal MRI to simultaneously segment MS lesions and classify them as classical MS lesions, paramagnetic rim lesions, or central vein sign lesions. Since manual classification of these lesion biomarkers is time-consuming and rater-dependent, automated segmentation may enable translation of these biomarkers into clinical practice and advance quantitative lesion-based research.

Finally, we propose a federated learning extension of Generalized Additive Models for Location, Scale and Shape (GAMLSS). GAMLSS is commonly used to estimate normative reference charts but requires large samples to fit; such samples can be difficult to assemble across hospitals or datasets because of privacy regulations. This extension, which we call distributed GAMLSS (dGAMLSS), allows GAMLSS models to be fit across sites even when private data cannot be shared. Collectively, these methods broaden the range of questions that neuroimaging data can address.
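To make the idea of PCA-based intermodal coupling concrete, the sketch below computes, at each voxel, a principal component analysis of co-registered modality intensities within a small cubic neighborhood and reports the fraction of local variance explained by the first component. This is a hypothetical illustration, not the pIMCo implementation: the function name `local_pc1_coupling`, the unweighted cubic neighborhood of half-width `radius`, the local standardization of each modality, and the choice of first-component variance fraction as the coupling summary are all assumptions made here.

```python
import numpy as np


def local_pc1_coupling(images, radius=1, mask=None):
    """Hypothetical sketch of PCA-based local intermodal coupling (not pIMCo itself).

    images: float array of shape (K, X, Y, Z), one co-registered volume per modality.
    radius: half-width of the cubic neighborhood (assumed, unweighted).
    mask:   optional boolean (X, Y, Z) array restricting which voxels are analyzed.
    Returns an (X, Y, Z) map of the fraction of local variance explained by the
    first principal component; voxels that are skipped stay NaN.
    """
    k, nx, ny, nz = images.shape
    out = np.full((nx, ny, nz), np.nan)
    for x in range(radius, nx - radius):
        for y in range(radius, ny - radius):
            for z in range(radius, nz - radius):
                if mask is not None and not mask[x, y, z]:
                    continue
                # (K, n_voxels) matrix of intensities in the cubic neighborhood
                patch = images[:, x - radius:x + radius + 1,
                               y - radius:y + radius + 1,
                               z - radius:z + radius + 1].reshape(k, -1)
                # Standardize each modality locally (an assumption) so that
                # differences in intensity scale do not dominate the PCA
                patch = patch - patch.mean(axis=1, keepdims=True)
                sd = patch.std(axis=1, keepdims=True)
                if np.any(sd == 0):
                    continue  # a constant modality has no local covariance to summarize
                patch = patch / sd
                # K x K covariance of the modalities across neighborhood voxels
                cov = np.cov(patch)
                eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
                out[x, y, z] = eigvals[-1] / eigvals.sum()
    return out
```

With K modalities the returned fraction lies between 1/K and 1: values near 1 indicate that the modalities vary together almost perfectly within the neighborhood, and values near 1/K indicate little shared local variation.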
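The principle that makes fitting across sites possible without sharing individual records can be illustrated with the simplest building block. GAMLSS fitting algorithms (such as the RS algorithm in the `gamlss` R package) repeatedly solve weighted least-squares problems for each distribution parameter, and a weighted least-squares solution depends on the data only through the sums \(X^\top W X\) and \(X^\top W y\). The sketch below assumes each site shares only those two aggregates. It is a simplified illustration of this federated principle, not the dGAMLSS algorithm; the function names `site_summaries` and `federated_wls` are hypothetical.

```python
import numpy as np


def site_summaries(X, y, w):
    """Run locally at one site: only these aggregates leave the site.

    X: (n, p) design matrix, y: (n,) working response, w: (n,) positive weights.
    Returns (X'WX, X'Wy) for this site's data.
    """
    Xw = X * w[:, None]  # scale each row of X by its weight
    return Xw.T @ X, Xw.T @ y


def federated_wls(summaries):
    """Run by a coordinator: sum the site-level aggregates and solve the
    pooled weighted normal equations (X'WX) beta = X'Wy."""
    xtwx = sum(s[0] for s in summaries)
    xtwy = sum(s[1] for s in summaries)
    return np.linalg.solve(xtwx, xtwy)
```

Because the pooled normal equations are sums over sites, the coordinator recovers exactly the estimate that fitting on the combined data would give, without seeing any individual observation. In a penalized fit for smooth terms, a penalty matrix could be added to the summed \(X^\top W X\) before solving.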