Missing Data and Measurement Error Methods for Left-Truncated Survival Data
Subject
Measurement Error
Missing Data
Abstract
Left truncation arises in time-to-event studies when subjects come under observation after the onset time (time origin). The resulting sample has biased outcome and covariate distributions. Survival studies may also suffer from missing covariate data, which often arise when this information is difficult to collect due to patient burden or resource limitations. Some widely used missing data strategies, such as multiple imputation (MI) and augmented inverse probability weighting (AIPW), rely on modeling the distribution of the missing covariate, which may be inaccurate with a biased, left-truncated sample. Through simulation studies, we explore the performance of these methods in estimating Cox regression parameters under a variety of truncation and missing data scenarios. Furthermore, we propose a novel AIPW estimation procedure that accounts for the effect of left truncation on the missing covariate distribution. By correctly estimating this distribution, our method is more robust to model misspecification than the existing approaches considered. We have developed the asymptotic theory of the proposed method.

In the second part of this dissertation, we consider length-biased sampling, a special case of left truncation. In some study designs, the onset time may be inaccurately measured. At study enrollment, patients may be asked the time of the onset (e.g., disease symptom onset), which may have occurred years prior. Thus, patient-reported times may suffer from recall bias. Error-prone onset timing causes the survival outcome to be similarly measured with error. Under an additive measurement error model, the residual time is an error-free quantity related to the survival time of interest. We develop a maximum likelihood procedure that leverages the residual time distribution to estimate the regression coefficients of a parametric survival model. Notably, this method does not require assumptions regarding the measurement error distribution, which may be difficult to verify in practice. We validate the use of our method under multiple measurement error distributions through simulation studies.

Finally, we apply our proposed methods to a study examining biofluid and genetic biomarkers for risk of cognitive impairment in Parkinson’s disease.
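
The two sampling ideas in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the estimation procedure developed in the dissertation. The function name `simulate_left_truncated`, the exponential survival times, the uniform enrollment times, and the normal recall error are all hypothetical choices made for illustration. The sketch shows (1) that subjects who survive long enough to be enrolled have longer survival times on average than the underlying population, and (2) that if the same additive error shifts both the reported survival time and the reported time to enrollment, the residual time (enrollment to event) is unchanged.

```python
import numpy as np


def simulate_left_truncated(n, tau=50.0, error_sd=0.5, seed=0):
    """Simulate left-truncated survival data with an error-prone onset time.

    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # True survival time from onset to event (population mean = 1).
    t = rng.exponential(scale=1.0, size=n)
    # True time from onset to study enrollment; uniform enrollment times
    # over a long window approximate length-biased sampling.
    a = rng.uniform(0.0, tau, size=n)

    # Left truncation: a subject is sampled only if the event has not
    # yet occurred at enrollment.
    keep = t >= a
    t, a = t[keep], a[keep]

    # Additive recall error in the reported onset shifts both the
    # reported survival time and the reported time to enrollment.
    e = rng.normal(0.0, error_sd, size=t.size)
    t_obs = t + e
    a_obs = a + e

    return {"t_true": t, "a_true": a, "t_obs": t_obs, "a_obs": a_obs}


if __name__ == "__main__":
    sample = simulate_left_truncated(200_000, seed=1)
    residual_true = sample["t_true"] - sample["a_true"]
    residual_obs = sample["t_obs"] - sample["a_obs"]
    print("subjects sampled:", sample["t_true"].size)
    print("mean survival time in truncated sample:", sample["t_true"].mean())
    print("residual time unaffected by error:",
          np.allclose(residual_true, residual_obs))
```

In this setup the truncated-sample mean survival time is well above the population mean of 1, which is the outcome bias the abstract refers to, while the residual time computed from the error-prone quantities matches the one computed from the true quantities.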