Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Yong Chen


Evidence-based medicine (EBM) emerged as a movement to ground clinical practice in empirical research in order to optimize patient care and outcomes. The ensuing exponential growth in clinical studies, together with the adoption of electronic health records (EHRs), created a cycle of evidence generation, synthesis, translation, and data collection that continues to guide the standard of care. The success of EBM hinges on the reproducibility and validity of the research it produces. However, systematic bias at any stage can lead to incorrect inference and, in turn, harm patient care. In this dissertation, we explore three sources of bias that can undermine EBM: publication bias in meta-analyses (evidence synthesis), differential outcome misclassification in EHR data (evidence generation), and selection bias in EHR-based studies (evidence translation). For publication bias, we develop an EM algorithm for estimating selection models within the expanded network meta-analysis (NMA) framework. We show that it substantially reduces bias due to selective publication while allowing a maximally flexible working model for heterogeneous data. We apply it to an NMA of antiplatelet therapies for preventing vascular occlusion. For differential misclassification, we propose two surrogate-assisted sampling schemes for cost-effective validation of EHR outcomes. The sampling weights prioritize the selection of patients who are most informative for the model of interest, improving the precision of model estimates relative to simple random sampling under measurement constraints. We study their performance under a range of data distributions and offer recommendations on when each weighting scheme is best applied. We apply our methods to the study of second breast cancer events among women diagnosed with primary stage I-IIIB invasive breast cancer. Finally, we extend the outcome-validation framework to account for the selection of patients from target populations into EHR cohorts.
By combining our efficient sampling designs with inverse probability of selection weighting, we improve the generalizability of results derived from validated subsamples of EHR data. We study a variety of patient-selection mechanisms and the bias-variance tradeoff involved in constructing sampling weights that account for selection bias. We then use our methods to extend inference from a colon cancer recurrence EHR dataset to the broader U.S. population diagnosed with stage I-IIIA colon cancer.
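As a rough illustration of how a surrogate-assisted validation design can be paired with inverse probability of selection weighting, the Python sketch below simulates a hypothetical EHR cohort drawn from a target population, validates a surrogate-driven subsample, and estimates outcome prevalence with combined weights. Everything in it is an assumption for illustration only: the helper names `surrogate_sampling_probs`, `hajek_prevalence`, and `simulate`, the 0.8/0.2 validation probabilities, the surrogate's sensitivity and specificity, and the selection probabilities (treated as known here, whereas in practice they would have to be estimated). The estimand is a simple prevalence rather than the regression models studied in the dissertation, and the estimator is a standard Hajek (normalized Horvitz-Thompson) weighted mean, not the dissertation's specific weighting schemes.

```python
import random


def surrogate_sampling_probs(surrogates, p_pos=0.8, p_neg=0.2):
    """Hypothetical surrogate-assisted design: patients whose surrogate
    flags an event are drawn for chart review with higher probability."""
    return [p_pos if s == 1 else p_neg for s in surrogates]


def hajek_prevalence(validated, sample_probs, selection_probs):
    """Weighted (Hajek) estimate of outcome prevalence in the target population.

    validated:       list of (index, true_outcome) for chart-reviewed patients
    sample_probs:    probability each EHR patient was drawn for validation
    selection_probs: probability each EHR patient entered the EHR cohort
                     from the target population (assumed known here)
    """
    num = den = 0.0
    for i, y in validated:
        # Combined weight: inverse of (validation-sampling prob x selection prob).
        w = 1.0 / (sample_probs[i] * selection_probs[i])
        num += w * y
        den += w
    return num / den


def simulate(seed=2024, n=50_000):
    rng = random.Random(seed)
    ehr = []  # (true outcome, surrogate, selection prob) for each EHR patient
    for _ in range(n):
        x = 1 if rng.random() < 0.4 else 0               # assumed binary covariate
        y = 1 if rng.random() < (0.5 if x else 0.2) else 0
        pi_sel = 0.7 if x else 0.2                       # assumed selection mechanism
        if rng.random() < pi_sel:                        # patient enters the EHR cohort
            # Error-prone surrogate with assumed sensitivity 0.85, specificity 0.9.
            s = y if rng.random() < (0.85 if y else 0.9) else 1 - y
            ehr.append((y, s, pi_sel))

    sample_probs = surrogate_sampling_probs([s for _, s, _ in ehr])
    # Poisson sampling: each EHR patient is validated independently.
    validated = [(i, rec[0]) for i, rec in enumerate(ehr)
                 if rng.random() < sample_probs[i]]
    selection_probs = [rec[2] for rec in ehr]

    naive = sum(y for _, y in validated) / len(validated)
    weighted = hajek_prevalence(validated, sample_probs, selection_probs)
    return naive, weighted


if __name__ == "__main__":
    naive, weighted = simulate()
    print(f"naive: {naive:.3f}  weighted: {weighted:.3f}")
```

Under these assumed settings the target-population prevalence is 0.4 × 0.5 + 0.6 × 0.2 = 0.32. The unweighted mean over the validated subsample overstates it, because surrogate-positive patients and patients with a high chance of entering the EHR cohort are over-represented; weighting each validated patient by the inverse of both inclusion probabilities removes that distortion at the cost of extra variance from the larger weights.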


Available to all on Friday, August 09, 2024


Included in

Biostatistics Commons