Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Sharon X. Xie


Measurement error and missing data are two phenomena that prevent researchers from observing essential quantities in their studies. Measurement error occurs when data are subject to variability that masks an underlying value. Recognizing measurement error is essential to preventing bias in an analysis, and methods to handle it have been well developed in recent years. In time-to-event analyses, however, competing risks are another important consideration that can invalidate study results if not properly accounted for. Current methods that accommodate competing risks do not account for measurement error and, as a result, incur substantial bias when covariates are measured with error. We first propose a novel method that combines the intuition of the subdistribution model for competing risks with risk set regression calibration, which corrects for measurement error in Cox regression by recalibrating at each failure time. We show through simulations that the proposed estimator removes the bias that occurs when measurement error is ignored. The second part of this dissertation addresses missing outcome data in longitudinal models. Although this is a well-studied area of research, some current missing data methods are subject to misspecification, while others are not suited to handling a large amount of missing data. We propose a novel method to account for missing longitudinal outcome data when some patients have no recorded outcomes. We accomplish this by using an auxiliary outcome that is available for all patients, and we avoid the pitfall of misspecification by estimating its relationship with the data nonparametrically. We show that this method is more efficient than conventional methods and robust to misspecification. For both proposed methods, we show that the estimators are asymptotically normal and provide consistent variance estimates. We also show that the estimator for the second method is consistent.
We apply both proposed methods to neurodegenerative disease data. Finally, we introduce an R package that implements the first proposed method, making it widely available for routine use.

Included in

Biostatistics Commons