Methods For Bias Reduction In Evidence-Based Medicine

Arielle Kimberly Marks-Anglin, University of Pennsylvania


Evidence-based medicine (EBM) emerged as a movement to ground clinical practice in empirical research in order to optimize patient care and outcomes. The exponential growth in clinical studies that ensued, along with the adoption of electronic health records (EHRs), created a cycle of evidence generation, synthesis, translation, and data collection that continues to guide the standard of care. The success of EBM hinges on the reproducibility and validity of the research produced. However, systemic bias at any stage can lead to incorrect inference, negatively impacting patient care. In this dissertation, we explore three sources of bias that can undermine EBM: publication bias in meta-analyses (evidence synthesis), differential outcome misclassification in EHR data (evidence generation), and selection bias in EHR-based studies (evidence translation). For publication bias, we develop an EM algorithm for selection model estimation in the expanded network meta-analysis (NMA) framework. We show that it substantially reduces bias due to selective publication, while allowing for a maximally flexible working model for heterogeneous data. We apply it to an NMA of antiplatelet therapies for preventing vascular occlusion. For differential misclassification, we propose two surrogate-assisted sampling schemes for cost-effective validation of EHR outcomes. The sampling weights prioritize selection of the patients who are most informative for the model of interest, leading to improved precision of model estimates relative to simple random sampling under measurement constraints. We study the performance of both schemes under multiple data distributions and offer recommendations for the optimal application of each weighting scheme. We apply our methods to the study of second breast cancer events among women diagnosed with primary stages I-IIIB invasive breast cancer. Finally, we expand the framework of outcome validation to account for patient selection from target populations into EHR cohorts. Combining our efficient sampling designs with inverse probability of selection weighting, we improve the generalizability of results derived from validated subsamples of EHR data. We study a variety of mechanisms for patient selection and the bias-variance tradeoff that arises when constructing sampling weights that account for selection bias. We then use our methods to extend inference from a colon cancer recurrence EHR dataset to the larger U.S. population diagnosed with stages I-IIIA colon cancer.
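The general idea behind inverse probability of selection weighting, mentioned above, can be illustrated with a minimal sketch. This is not the estimator developed in the dissertation. It assumes the selection probabilities are known, uses a simple normalized weighted mean, and relies on a hypothetical toy cohort (the function name `ipsw_mean` and all numbers are invented for illustration).

```python
def ipsw_mean(outcomes, selection_probs):
    """Normalized inverse-probability-of-selection weighted mean.

    Each selected patient is weighted by 1 / P(selected), so patients from
    under-represented groups count for more when estimating the
    target-population mean.
    """
    weights = [1.0 / p for p in selection_probs]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# Hypothetical target population (assumed for illustration only):
#   group A: 100 patients, 20 events, selected into the cohort with prob. 0.5
#   group B: 100 patients, 60 events, selected into the cohort with prob. 0.1
# The true event rate in the target population is (20 + 60) / 200 = 0.4.
# Selected cohort, as (outcome, selection probability) pairs:
cohort = (
    [(1, 0.5)] * 10 + [(0, 0.5)] * 40   # 50 patients selected from group A
    + [(1, 0.1)] * 6 + [(0, 0.1)] * 4   # 10 patients selected from group B
)
outcomes = [y for y, _ in cohort]
probs = [p for _, p in cohort]

# Unweighted cohort mean: over-represents the low-risk group A.
naive = sum(outcomes) / len(outcomes)

# Weighted mean: re-balances the cohort toward the target population.
weighted = ipsw_mean(outcomes, probs)

print(f"naive cohort estimate: {naive:.3f}")
print(f"selection-weighted estimate: {weighted:.3f}")
```

In this toy setting the unweighted cohort mean understates the population event rate because the lower-risk group is selected more often, while the weighted mean recovers it. In practice the selection probabilities would have to be estimated, and very small probabilities produce large weights that inflate variance, which is one form of the bias-variance tradeoff noted above.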