MISSINGNESS AND EQUITY OF CLINICAL MODEL PREDICTIVE PERFORMANCE: CONSIDERING THE SOCIAL CONSTRUCTION OF EHR DATA
Abstract
Stephanie Teeple
Scott Halpern

Understanding the equity impacts of artificial intelligence (AI) for health is a multifactorial, interdisciplinary challenge. One piece of this challenge is examining inequity in the predictive performance of models intended for clinical practice. Social forces (e.g., racism) fundamentally shape health, health care, and the electronic health record (EHR) data on which such models are trained. One way the social patterning of EHR data manifests is via missing data (e.g., Black patients are systematically underdiagnosed for a variety of conditions compared to non-Hispanic white patients). This dissertation examines the effects of missing data on the predictive performance of several models already deployed or being piloted in clinical practice. It focuses specifically on harms to patients racialized as Black, given the dominance of anti-Black racism in health care and other health-impacting institutions in the United States. First, we demonstrate a method for empirically comparing predictive performance across patient subgroups, leveraging a nonparametric bootstrap approach. Then, using that method of comparison, we investigate whether observed missingness in the problem list section of the EHR differentially impacts an emergency department triage model's predictions for Black patients versus non-Hispanic white patients. Finally, we examine whether different methods for addressing missing laboratory data (e.g., missingness indicators, multiple imputation, pattern submodels) differentially impact the predictions of a mortality risk model for palliative care. We found that when problem list missingness was reduced, the triage model's performance improved for both Black and non-Hispanic white patients.
However, the magnitude of this improvement was greater for white patients, suggesting that missing data generation processes differ by race (shaped by racism). Next, none of the methods for addressing missingness eliminated existing disparities in the palliative care model's predictions. Moreover, the missing indicator method produced predictive disparities of the greatest magnitude (in this single-site study), which would result in more palliative care resources being directed to white patients. The social nature of EHR data matters for the equity of predictive performance. Future work will examine structural determinants (e.g., financial arrangements, governance practices) that broadly impact AI healthcare applications and health equity.
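The abstract does not specify the details of the nonparametric bootstrap comparison; one common form resamples patients with replacement within each subgroup, recomputes a performance metric per resample, and forms a percentile interval for the between-group gap. A minimal sketch of that idea, using AUROC as the metric and stratifying the resampling by outcome so each resample contains both classes (the function names and these particular choices are illustrative assumptions, not the dissertation's method):

```python
import numpy as np

def auroc(y_true, y_score):
    """Rank-based AUROC (Mann-Whitney U statistic). Ties in scores are not
    rank-averaged, which is adequate for continuous risk scores."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def _resample_within_outcome(y, s, rng):
    """Resample with replacement, stratified by outcome, so that every
    bootstrap sample retains both positive and negative cases."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    idx = np.concatenate([rng.choice(pos, pos.size), rng.choice(neg, neg.size)])
    return y[idx], s[idx]

def bootstrap_auroc_gap(y_a, s_a, y_b, s_b, n_boot=2000, seed=0):
    """Nonparametric bootstrap of the AUROC difference between subgroups
    A and B. Returns the observed gap and a 95% percentile interval."""
    rng = np.random.default_rng(seed)
    gaps = np.empty(n_boot)
    for i in range(n_boot):
        ya, sa = _resample_within_outcome(y_a, s_a, rng)
        yb, sb = _resample_within_outcome(y_b, s_b, rng)
        gaps[i] = auroc(ya, sa) - auroc(yb, sb)
    observed = auroc(y_a, s_a) - auroc(y_b, s_b)
    return observed, np.percentile(gaps, [2.5, 97.5])
```

A gap interval excluding zero would suggest a performance disparity between the subgroups; any calibrated metric (e.g., Brier score) could be substituted for AUROC in the same scaffold.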