Measurement Error And Missing Data Methods In Biomarker Research

Caswell, Carrie

Files

Caswell_upenngdas_0175C_14068.pdf (588.35 KB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Epidemiology & Biostatistics

Subject

competing risks
longitudinal analysis
measurement error
missing data
mixed-effects models
survival analysis
Biostatistics

Copyright date

2021-08-31T20:20:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/31319

Author

Caswell, Carrie

Abstract

Measurement error and missing data are two phenomena that prevent researchers from observing essential quantities in their studies. Measurement error occurs when data are subject to variability that masks an underlying true value. Recognizing measurement error is essential to preventing bias in an analysis, and methods to handle it have been well developed in recent years. In time-to-event analyses, however, competing risks are another important consideration that can invalidate study results if not properly accounted for. Current methods that accommodate competing risks do not account for measurement error and, as a result, incur a large amount of bias when covariates are measured with error. We first propose a novel method that combines the intuition of the subdistribution model for competing risks with risk set regression calibration, which corrects for measurement error in Cox regression by recalibrating at each failure time. We show through simulations that the proposed estimator removes the bias that occurs when measurement error is ignored. The second part of this dissertation addresses missing outcome data in longitudinal models. Although this is a well-studied area of research, some current missing data methods are subject to model misspecification, while others are not suited to handling a large amount of missing data. We propose a novel method to account for missing longitudinal outcome data in the situation where some patients have no recorded outcomes. We accomplish this by using an auxiliary outcome that is available for all patients, and we avoid the pitfall of misspecification by estimating its relationship with the data nonparametrically. We show that this method is more efficient than conventional methods and robust to misspecification. For both proposed methods, we show that the estimators are asymptotically normal and provide consistent variance estimates. We also show that the estimator for the second method is consistent.
We apply both proposed methods to neurodegenerative disease data. Finally, we introduce an R package to implement the first proposed method and make it widely available for regular use.
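
To illustrate the bias that the abstract attributes to ignoring measurement error, the following is a minimal sketch of classical regression calibration in an ordinary linear model. It is not the dissertation's estimator, which recalibrates within risk sets of a subdistribution hazards model, and it is not the R package mentioned above. The setup is an assumption made for illustration only: two replicate error-prone measurements per subject, normally distributed additive error, and the hypothetical function names `simulate`, `regression_calibration`, and `ols_slope`.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=1.0, sigma_u=1.0, seed=0):
    """Simulate an outcome y driven by a true covariate x that is never observed.

    Only two replicate measurements w1, w2 = x + error are returned
    (assumed classical additive error, illustrative values only).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    w1 = [xi + rng.gauss(0, sigma_u) for xi in x]
    w2 = [xi + rng.gauss(0, sigma_u) for xi in x]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    return w1, w2, y


def regression_calibration(w1, w2):
    """Replace the replicate mean with an estimate of E[x | mean(w1, w2)]."""
    wbar = [(a + b) / 2 for a, b in zip(w1, w2)]
    # Var(w1 - w2) = 2 * Var(error), so half of it estimates the error variance.
    var_u = statistics.variance([a - b for a, b in zip(w1, w2)]) / 2
    var_wbar = statistics.variance(wbar)
    # Share of Var(wbar) that comes from the true covariate.
    reliability = (var_wbar - var_u / 2) / var_wbar
    mu = statistics.fmean(wbar)
    x_hat = [mu + reliability * (w - mu) for w in wbar]
    return wbar, x_hat


w1, w2, y = simulate()
wbar, x_hat = regression_calibration(w1, w2)
naive = ols_slope(wbar, y)       # ignores measurement error
corrected = ols_slope(x_hat, y)  # uses the calibrated covariate
print(f"naive slope: {naive:.3f}, calibrated slope: {corrected:.3f}")
```

Under these assumptions the naive slope is pulled toward zero, while the calibrated slope should land close to the simulated coefficient of 1.0. The same idea of substituting an estimated conditional expectation for the error-prone covariate underlies regression calibration in Cox models, although the details there differ.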

Advisor

Sharon X. Xie

Date of degree

2019-01-01

Collection

Dissertations and Theses