Statistical Methods For Censored And Missing Data In Survival And Longitudinal Analysis

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Subject
Biostatistics
Copyright date
2019-10-23T20:19:00-07:00
Abstract

Missing or incomplete data are a nearly ubiquitous problem in biomedical research studies. If not appropriately addressed, incomplete data can lead to biased, inefficient estimation that can affect the conclusions of a study. Many methods for dealing with missing or incomplete data rely on parametric assumptions that can be difficult or impossible to verify. Here we propose semiparametric and nonparametric methods for data in longitudinal studies that are missing or incomplete by study design, and we apply these methods to data from Parkinson's disease dementia studies. First, we propose a quantitative procedure for designing appropriate follow-up schedules in time-to-event studies, addressing the problem of interval-censored data at the study design stage. We propose a method for generating proportional hazards data with an unadjusted survival similar to that of historical data. Using this data generation process, we conduct simulations to evaluate the bias in hazard ratios estimated by Cox regression models under various follow-up schedules, in order to guide the selection of follow-up frequency. Second, we propose a nonparametric method for longitudinal data in which a covariate is measured for only a subset of study subjects, but an informative auxiliary variable is available for everyone. We use empirical and kernel density estimates to obtain nonparametric estimates of the conditional distribution of the missing data given the observed data. We derive the asymptotic distribution of the estimator for time-varying missing covariates and for discrete or continuous auxiliary variables, and show that the estimator is consistent and asymptotically normally distributed. Through simulations, we show that our estimator has good finite-sample properties and is more efficient than the complete-case estimator. Finally, we provide an R package to implement the method.
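
The follow-up schedule simulation described in the abstract can be illustrated with a rough sketch. This is not the dissertation's procedure: it assumes exponential event times instead of data matched to historical survival, and it replaces Cox regression with a simple ratio of event rates (events per unit of person-time). The function name `estimated_hazard_ratio` and all parameter values are hypothetical. Events are only detected at the first scheduled visit after they occur, which mimics interval censoring handled by right-endpoint imputation.

```python
import math
import random


def estimated_hazard_ratio(n_visits, true_hr=2.0, base_rate=0.1,
                           horizon=10.0, n_subjects=20000, seed=0):
    """Estimate a hazard ratio when events are seen only at scheduled visits.

    Assumptions (illustrative only): exponential event times, two equally
    sized groups, evenly spaced visits, and a crude rate-ratio estimator
    standing in for a Cox model.
    """
    rng = random.Random(seed)
    visit_gap = horizon / n_visits
    events = [0, 0]            # detected events per group
    person_time = [0.0, 0.0]   # observed follow-up time per group

    for i in range(n_subjects):
        group = i % 2  # 0 = reference, 1 = exposed
        rate = base_rate * (true_hr if group == 1 else 1.0)
        true_time = rng.expovariate(rate)

        # Index of the first visit at or after the true event time.
        visit_index = max(1, math.ceil(true_time / visit_gap))

        if visit_index <= n_visits:
            # Event recorded at the visit where it is first detected.
            events[group] += 1
            person_time[group] += visit_index * visit_gap
        else:
            # No event detected by the last visit: censored at the horizon.
            person_time[group] += horizon

    rate_exposed = events[1] / person_time[1]
    rate_reference = events[0] / person_time[0]
    return rate_exposed / rate_reference


if __name__ == "__main__":
    for n_visits in (1, 2, 5, 20, 200):
        print(n_visits, round(estimated_hazard_ratio(n_visits), 3))
```

In this simplified setup, the estimate tends to move toward 1 as visits become sparser, which is the kind of bias a follow-up schedule comparison is meant to expose. A fuller version would fit a Cox model to each simulated data set and average over many replications.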

Advisor
Sharon X. Xie
Date of degree
2019-01-01