Survival Analysis With Uncertain Endpoints Using an Internal Validation Subsample

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Subject
Measurement Error
Missing Data
Study Design
Survival Analysis
Validation Sample
Biostatistics
Copyright date
2015-11-16T20:14:00-08:00
Abstract

When a true survival endpoint cannot be assessed for some subjects, often because it is too invasive or costly to obtain, an alternative endpoint that measures the true endpoint with error may be collected instead. We develop nonparametric and semiparametric estimated likelihood functions that incorporate both uncertain endpoints available for all participants and true endpoints available for only a subset of participants. We propose maximum estimated likelihood estimators of the discrete survival function of time to the true endpoint and of a hazard ratio representing the effect of a binary or continuous covariate under a proportional hazards model. We show that the proposed estimators are consistent and asymptotically normal and develop the analytical forms of the variance estimators. Through extensive simulations, we also show that the proposed estimators have little bias compared to the naïve estimator, which uses only uncertain endpoints, and are more efficient with moderate missingness compared to the complete-case estimator, which uses only available true endpoints. We illustrate the proposed method by estimating the risk of developing Alzheimer's disease using data from the Alzheimer's Disease Neuroimaging Initiative. Using our proposed semiparametric estimator, we develop optimal study design strategies to compare survival across treatment groups for a new trial with these data characteristics. We demonstrate how to calculate, using simulated data, the optimal number of true events in the validation set for a desired power, assuming that the baseline distribution of the true event, the effect size, the correlation between outcomes, and the proportion of true outcomes that are missing can be estimated from pilot studies. We also propose a sample size formula that does not depend on the baseline distribution of the true event and show that power calculated by the formula matches well with simulation-based results.
Using results from a Ginkgo Evaluation of Memory study, we calculate the number of true events in the validation set that would need to be observed for new studies comparing development of Alzheimer's disease among those with and without antihypertensive use, as well as the total number of subjects and number in the validation set to be recruited for these new trials.

Advisor
Sharon X. Xie
Date of degree
2014-01-01