Semiparametric Approaches To Developing Models For Predicting Binary Outcomes Through Data And Information Integration

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Subject
Biostatistics
Copyright date
2018-02-23T20:17:00-08:00
Abstract

We developed statistical methods for evaluating the added value of biomarkers for predicting binary outcomes when biomarker data are of limited availability. In the first project, we considered a cost-effective design, the two-phase study, in which data on the outcome and established risk predictors were collected for all study subjects in Phase I, while biomarkers were measured only for a judiciously selected subset in Phase II. Using a logistic regression model to describe the relationship between the binary outcome and the risk predictors, we developed three approaches to estimating the risk distribution and summary measures of predictive accuracy. We showed that all three estimators are consistent and asymptotically normal, and we compared their efficiency and robustness through extensive simulation studies and an application to an ongoing biomarker study of gestational diabetes. We also developed a novel sampling strategy for selecting Phase II subjects, aimed at improving efficiency in estimating measures of predictive accuracy.

In the second project, we developed a statistical method to address the lack of independent data for validating biomarkers for prediction, focusing on model calibration. When a well-calibrated model with only standard predictors exists, we proposed calibrating the new model to the existing model at the model development stage. For data collected under a case-control design, we developed a novel constrained maximum likelihood approach to fitting logistic regression models that implements this idea. We developed large-sample theory for the method and performed extensive simulation studies to assess the impact of the constraints on the odds ratio parameter estimates. We applied the method to a case-control study of breast cancer nested within the Breast Cancer Detection Demonstration Project to evaluate the added value of mammographic density for predicting the 5-year risk of breast cancer.

In the third project, we extended the method developed in the second project to accommodate the cross-sectional study design. Through simulation studies and an analysis of gestational diabetes data, we demonstrated that the method ensured that the model was well calibrated.

Advisor
Jinbo Chen
Date of degree
2017-01-01