Statistical Methods For Censored And Missing Data In Survival And Longitudinal Analysis

Suttner, Leah Helene

Files

Suttner_upenngdas_0175C_13816.pdf (1.4 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Epidemiology & Biostatistics

Subject

Biostatistics

Copyright date

2019-10-23T20:19:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/30553

Author

Suttner, Leah Helene

Abstract

Missing or incomplete data are a nearly ubiquitous problem in biomedical research studies. If incomplete data are not appropriately addressed, the result can be biased, inefficient estimation that affects the conclusions of the study. Many methods for dealing with missing or incomplete data rely on parametric assumptions that can be difficult or impossible to verify. Here we propose semiparametric and nonparametric methods for data in longitudinal studies that are missing or incomplete by design of the study. We apply these methods to data from Parkinson's disease dementia studies. First, we propose a quantitative procedure for designing appropriate follow-up schedules in time-to-event studies, addressing the problem of interval-censored data at the study design stage. We propose a method for generating proportional hazards data with an unadjusted survival similar to that of historical data. Using this data generation process, we conduct simulations to evaluate the bias in hazard ratios estimated with Cox regression models under various follow-up schedules, and we use the results to guide the selection of follow-up frequency. Second, we propose a nonparametric method for longitudinal data in which a covariate is measured for only a subset of study subjects, but an informative auxiliary variable is available for everyone. We use empirical and kernel density estimates to obtain nonparametric estimates of the conditional distribution of the missing data given the observed data. We derive the asymptotic distribution of the estimator for time-varying missing covariates and for discrete or continuous auxiliary variables, and show that the estimator is consistent and asymptotically normally distributed. Through simulations, we show that our estimator has good finite-sample properties and is more efficient than the complete case estimator. Finally, we provide an R package to implement the method.
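
To illustrate the kind of follow-up schedule simulation the abstract describes, here is a minimal Python sketch. It is not the dissertation's procedure. It assumes a constant (exponential) baseline hazard rather than a generator matched to historical survival data, and it swaps in a simple exponential-model hazard ratio (events divided by person-time in each arm) for Cox regression so that the example needs only the standard library. All function names, parameters, and default values are hypothetical.

```python
import math
import random


def simulate_arm(n, hazard, rng):
    """Draw n exponential event times with a constant hazard (an assumption)."""
    return [rng.expovariate(hazard) for _ in range(n)]


def observe_at_visits(event_time, visit_gap, study_end):
    """Return (observed_time, event_indicator) when an event can only be
    detected at scheduled visits spaced visit_gap apart."""
    last_visit = math.floor(study_end / visit_gap) * visit_gap
    if event_time > last_visit:
        # Event not seen by the final visit: censored at that visit.
        return (last_visit, 0)
    # The event is recorded at the first visit on or after the true time.
    visit_number = max(1, math.ceil(event_time / visit_gap))
    return (visit_number * visit_gap, 1)


def exp_hazard_ratio(obs_treated, obs_control):
    """Exponential-model hazard ratio: a stand-in for Cox regression."""
    def rate(obs):
        events = sum(d for _, d in obs)
        person_time = sum(t for t, _ in obs)
        return events / person_time
    return rate(obs_treated) / rate(obs_control)


def mean_log_hr_bias(visit_gap, true_hr=2.0, base_hazard=0.2, n=500,
                     study_end=6.0, reps=200, seed=1):
    """Average (estimated log HR - true log HR) over simulated studies."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        control = [observe_at_visits(t, visit_gap, study_end)
                   for t in simulate_arm(n, base_hazard, rng)]
        treated = [observe_at_visits(t, visit_gap, study_end)
                   for t in simulate_arm(n, base_hazard * true_hr, rng)]
        estimate = exp_hazard_ratio(treated, control)
        total += math.log(estimate) - math.log(true_hr)
    return total / reps


if __name__ == "__main__":
    # Compare candidate visit spacings (in the same time units as study_end).
    for gap in (0.01, 0.5, 1.0, 3.0):
        print(gap, round(mean_log_hr_bias(gap), 3))
```

Comparing the printed bias across candidate visit gaps mirrors, in a very simplified form, the idea of using simulation to guide the choice of follow-up frequency. A closer analogue would fit a Cox model to each simulated data set and use a baseline hazard calibrated to historical data.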

Advisor

Sharon X. Xie

Date of degree

2019-01-01

Collection

Dissertations and Theses