Fair and Generalizable Machine Learning for Neuroimaging
Degree type
Graduate group
Discipline
Electrical Engineering
Subject
Domain generalization
Fairness
Machine learning
Medical imaging
Robustness
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Contributor
Abstract
Machine learning has revolutionized medical imaging research, offering powerful tools for analysis and diagnosis. However, the transition from research to real-world clinical applications faces significant challenges, primarily due to domain shift. This phenomenon occurs when the distribution of data in the deployment environment differs from that used during training, leading to degraded model performance. Additionally, the inherent heterogeneity of medical data exposes machine learning-based diagnostic models to various biases, including those related to gender, age group, race, scanner type, and acquisition protocol. This thesis addresses these critical issues through a comprehensive exploration of domain adaptation techniques, bias mitigation strategies, and performance optimization methods for medical imaging applications. We begin by arguing for the necessity of domain adaptation to achieve optimal performance across diverse subgroups in clinical settings. To this end, we develop a novel unsupervised adaptation algorithm that does not require ground-truth labels from the unseen domain, making it particularly valuable in medical contexts where labeled data for new domains can be scarce or challenging to obtain. In the era of deep learning and representation learning, we also investigate the continued relevance of handcrafted imaging features, particularly in brain imaging applications. Our research demonstrates how these traditional features can complement modern machine learning approaches, potentially enhancing model interpretability and performance.

A significant portion of our work focuses on mitigating biases in machine learning-based diagnostic models. We show that through rigorous data pre-processing techniques and well-constructed model architectures, it is possible to develop more equitable and reliable diagnostic tools. Our findings emphasize the importance of thoughtful data handling and model design in creating fair and robust systems.
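The thesis's unsupervised adaptation algorithm is not detailed in this abstract; as an illustration of the general idea of adapting to an unseen domain without ground-truth labels, the sketch below aligns per-feature statistics of target-domain data to the source domain. All variable names and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical imaging-feature matrices: a labeled source domain and an
# unlabeled target domain whose features are shifted and rescaled.
X_src = rng.normal(0.0, 1.0, size=(300, 4))
X_tgt = rng.normal(2.0, 3.0, size=(100, 4))

# Unsupervised alignment: match the target's per-feature mean and
# standard deviation to the source, using no target labels at all.
mu_s, sd_s = X_src.mean(axis=0), X_src.std(axis=0)
mu_t, sd_t = X_tgt.mean(axis=0), X_tgt.std(axis=0)
X_tgt_aligned = (X_tgt - mu_t) / sd_t * sd_s + mu_s
```

A model trained on `X_src` can then be applied to `X_tgt_aligned`; richer alignment schemes (e.g., matching covariances rather than only marginal moments) follow the same label-free pattern.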
Furthermore, we explore the benefits of leveraging multi-source data—information from various datasets or institutions—in improving both model performance consistency and precision. This approach helps capture a wider range of data distributions, potentially leading to models that generalize better across diverse populations and clinical settings. To address the challenge of adapting models to new, unseen data distributions, we introduce a weighted empirical risk minimization algorithm. This method allows for fine-tuning models using only a small number of samples from the target domain, making it practical for real-world deployments where extensive new data collection may not be feasible.

Throughout the thesis, we demonstrate our empirical results using large-scale, diverse-population brain imaging datasets across various clinical tasks. This comprehensive evaluation provides strong evidence for the effectiveness of our proposed methods in real-world scenarios, spanning different diagnostic applications and patient populations. This thesis contributes to the development of more robust, unbiased, and adaptable machine learning models for medical imaging. By addressing critical challenges in translating research to clinical practice, our work aims to improve the reliability and equity of AI-driven diagnostic tools, ultimately enhancing patient care across diverse populations and healthcare settings.
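The weighted empirical risk minimization idea described above can be sketched as follows: train on the pooled data, but upweight the few target-domain samples so the fitted model reflects the new distribution. This is a minimal illustration with synthetic data and an arbitrary weight, not the thesis's specific algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical source domain: many labeled samples.
X_src = rng.normal(0.0, 1.0, size=(500, 5))
y_src = (X_src[:, 0] > 0).astype(int)

# Hypothetical target domain: a shifted distribution with few labels.
X_tgt = rng.normal(1.5, 1.0, size=(20, 5))
y_tgt = (X_tgt[:, 0] > 1.5).astype(int)

X = np.vstack([X_src, X_tgt])
y = np.concatenate([y_src, y_tgt])

# Weighted ERM: upweight the scarce target samples (the factor 10 is an
# illustrative choice; in practice it would be tuned or derived).
w = np.concatenate([np.ones(len(X_src)), 10.0 * np.ones(len(X_tgt))])

model = LogisticRegression().fit(X, y, sample_weight=w)
target_acc = model.score(X_tgt, y_tgt)
```

With weight 1 everywhere this reduces to ordinary empirical risk minimization on the pooled data; the weights are what let a handful of target samples steer the fine-tuning.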
Advisor
Chaudhari, Pratik