Cancer Absolute Risk Projection with Incomplete Predictor Variables

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Absolute risk prediction
Breast cancer
Predictive accuracy
Semi-parametric maximum likelihood
Stratified case-control study
Two phase design

A popular approach to projecting cancer absolute risk is to integrate a relative hazard function of predictors with hazard rates obtained from different sources, where the relative hazard function is often approximated by an odds ratio function. When assessing the added value of candidate risk predictors, it is very common for data on standard risk predictors to be fully available from a frequency-matched case-control study, while data on candidate predictors are available only for a subset of cases and controls. In the first project, we developed statistical measures for quantifying the predictive accuracy of cancer absolute risk prediction models while accommodating incomplete predictor variables. We focused in particular on a measure that is useful for evaluating the efficiency of model-based cancer screening: the proportion of cases that can be captured by screening only people with high projected risk. In the second project, using a logistic regression model to describe the relationship between cancer status and risk predictors, we developed a novel semiparametric maximum likelihood approach that accommodates incomplete predictor data under the rare disease approximation for estimating the odds ratio parameters and the distribution of candidate predictors. Through theoretical and simulation studies, we showed that our estimator is consistent and asymptotically normal, and that it has improved statistical efficiency. In the third project, we applied the statistical methods developed in the first two projects to evaluate the added value of percent mammographic density and breast cancer risk SNPs in breast cancer absolute risk projection. Our results showed that the two sets of predictors had similar added value and that each can lead to more efficient model-based screening for breast cancer. In the fourth project, we applied the semiparametric maximum likelihood method to a family-supplemented study design that we proposed to address survival bias in case-control genetic association studies.
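
The screening measure described in the abstract can be illustrated with a minimal Python sketch. It assumes every person has a fully observed, well-calibrated projected absolute risk, so it does not reproduce the incomplete-predictor estimators developed in the dissertation. The function names and the risk values are hypothetical and chosen only for illustration.

```python
def proportion_cases_captured(risks, threshold):
    """Expected share of future cases among people with projected risk >= threshold.

    Assumes the risks are well calibrated, so the expected number of cases
    in any group equals the sum of that group's projected risks.
    """
    total = sum(risks)
    if total <= 0:
        raise ValueError("projected risks must sum to a positive value")
    captured = sum(r for r in risks if r >= threshold)
    return captured / total


def proportion_screened(risks, threshold):
    """Share of the population that would be screened at this threshold."""
    if not risks:
        raise ValueError("risks must not be empty")
    return sum(1 for r in risks if r >= threshold) / len(risks)


# Hypothetical projected absolute risks for ten people (made-up numbers).
risks = [0.01, 0.01, 0.01, 0.01, 0.02, 0.02, 0.02, 0.03, 0.03, 0.04]
threshold = 0.03  # hypothetical high-risk cutoff

print(f"screened: {proportion_screened(risks, threshold):.2f}")
print(f"cases captured: {proportion_cases_captured(risks, threshold):.2f}")
```

With these made-up numbers, screening the 30% of people at highest projected risk would be expected to capture about half of the cases. A more discriminating model would capture a larger share of cases at the same screening fraction.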

Jinbo Chen