Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Pamela A. Shaw


Large epidemiologic studies with self-reported or routinely collected electronic health records (EHR) data are frequently used as a cost-effective way to conduct clinical research, but these types of data are often prone to measurement error. While large epidemiologic studies play a crucial role in understanding the relationship between risk factors and health outcomes, such as disease incidence, these relationships cannot be properly understood unless methods are developed that reduce the bias caused by errors in both exposure variables and time-to-event outcome variables. Furthermore, variance estimates for outcome model regression parameters can be quite large in the presence of complex error-prone exposures and outcomes, yet strategies to improve variance estimation have received little attention in the measurement error literature. Throughout this dissertation, we address these gaps in the literature by developing methodology that focuses on (1) reducing the bias that arises from both error-prone exposures and outcomes in large epidemiologic cohort studies with periodic follow-up, (2) improving statistical efficiency by leveraging error-prone, auxiliary data alongside validated outcome data, and (3) considering alternative, better-behaved variance estimation strategies that may be used when techniques for adjusting for measurement error are applied. In Chapter 2, we present a method that combines an approach for addressing errors in event classification variables with regression calibration, a popular technique for addressing exposure error. This method reduces the bias induced by measurement errors in a discrete time-to-event setting. We apply our method to data from the Women’s Health Initiative (WHI) study to evaluate the association of dietary energy and protein with incident diabetes. Chapter 3 develops an approach for incorporating error-prone, auxiliary data into the analysis of an interval-censored time-to-event outcome. Here, the key goal is to improve statistical efficiency in the estimation of exposure-disease associations. We extend our methodology to handle data from a complex survey design and to be used in conjunction with regression calibration. Using this approach, we assess the association of energy and protein with the risk of diabetes in our motivating study, the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). In Chapter 4, we propose a sandwich variance estimator to account for the uncertainty added by using an estimated exposure when regression calibration is applied to adjust for covariate error. This variance approach applies broadly to other two-stage regression settings. We outline a procedure for easily computing the sandwich estimator in standard software and assess its properties through a numerical study and through illustrative data examples from the WHI and HCHS/SOL studies. Our results show that this method may have advantages over commonly applied, resampling-based variance estimation approaches.

Included in

Biostatistics Commons