Converting Neuroimaging Big Data to information: Statistical Frameworks for interpretation of Image Driven Biomarkers and Image Driven Disease Subtyping

Gaonkar, Bilwaj Krishnanand

Converting Neuroimaging Big Data to information: Statistical Frameworks for interpretation of Image Driven Biomarkers and Image Driven Disease Subtyping

Files

Gaonkar_upenngdas_0175C_11740.pdf (12.81 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Bioengineering

Subject

Alzheimer's disease
biostatistics
heterogeneity analysis
machine learning
support vector machines
Biomedical
Computer Sciences
Statistics and Probability

Copyright date

2016-11-29T00:00:00-08:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/28573

View all metadata

Author

Gaonkar, Bilwaj Krishnanand

Abstract

Large scale clinical trials and population based research studies collect huge amounts of neuroimaging data. Machine learning classifiers can potentially use these data to train models that diagnose brain related diseases from individual brain scans. In this dissertation we address two distinct challenges that beset a wider adoption of these tools for diagnostic purposes. The first challenge that besets the neuroimaging based disease classification is the lack of a statistical inference machinery for highlighting brain regions that contribute significantly to the classifier decisions. In this dissertation, we address this challenge by developing an analytic framework for interpreting support vector machine (SVM) models used for neuroimaging based diagnosis of psychiatric disease. To do this we first note that permutation testing using SVM model components provides a reliable inference mechanism for model interpretation. Then we derive our analysis framework by showing that under certain assumptions, the permutation based null distributions associated with SVM model components can be approximated analytically using the data themselves. Inference based on these analytic null distributions is validated on real and simulated data. p-Values computed from our analysis can accurately identify anatomical features that differentiate groups used for classifier training. Since the majority of clinical and research communities are trained in understanding statistical p-values rather than machine learning techniques like the SVM, we hope that this work will lead to a better understanding SVM classifiers and motivate a wider adoption of SVM models for image based diagnosis of psychiatric disease. A second deficiency of learning based neuroimaging diagnostics is that they implicitly assume that, `a single homogeneous pattern of brain changes drives population wide phenotypic differences'. In reality it is more likely that multiple patterns of brain deficits drive the complexities observed in the clinical presentation of most diseases. Understanding this heterogeneity may allow us to build better classifiers for identifying such diseases from individual brain scans. However, analytic tools to explore this heterogeneity are missing. With this in view, we present in this dissertation, a framework for exploring disease heterogeneity using population neuroimaging data. The approach we present first computes difference images by comparing matched cases and controls and then clusters these differences. The cluster centers define a set of deficit patterns that differentiates the two groups. By allowing for more than one pattern of difference between two populations, our framework makes a radical departure from traditional tools used for neuroimaging group analyses. We hope that this leads to a better understanding of the processes that lead to disease and also that it ultimately leads to improved image based disease classifiers.

Advisor

Christos Davatzikos

Date of degree

2015-01-01

Collection

Dissertations and Theses