Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Pamela A. Shaw


Biomedical studies increasingly rely on electronic health records (EHR) as either the sole or a supplementary source of data. While these data sources have enormous potential to support the discovery of associations between exposures and disease risk, they are subject to measurement error, which leads to bias in estimates of the effects of interest. Covariate measurement error has been well studied in the literature, with published work spanning descriptions of its impact as well as methods to address it; however, errors in the outcome have not received as much attention. Furthermore, EHR data often contain errors in both the covariates and a failure-time outcome, and these errors can be correlated. In this dissertation, we address these gaps by developing methodology in the paradigm of the Cox model for: (1) correlated errors in the time-to-event and covariate, (2) event-indicator misclassification as well as correlated time-to-event and covariate error, and (3) multiplicative error in the time-to-event. In Chapter 2, we develop two classes of estimators, regression calibration (RC) and generalized raking, to address the bias in Cox regression coefficients resulting from correlated errors in the time-to-event and covariate of interest. The RC estimators have lower relative MSE in moderate signal and high censoring settings; however, they are biased for the Cox model. The raking estimators are consistent, require no explicit modeling of the error structure, and have lower relative MSE for many error settings. In Chapter 3, we develop raking estimators for error settings involving misclassification by constructing auxiliary variables using multiple imputation. We provide a rationale for why the previously proposed raking estimators can be expected to be inefficient in the presence of event-indicator misclassification and demonstrate that the proposed raking estimators are more efficient in this setting.
In Chapter 4, we compare the performance of the Cox and Weibull AFT models in error settings with random multiplicative time-to-event error. In addition, we develop an extension of the SIMEX method to correct the bias in hazard ratio estimates from the Cox model under multiplicative time-to-event error. We illustrate the methods proposed in each of the three chapters by applying them to observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.

Included in

Biostatistics Commons