Statistical methods for the identification, evaluation, and classification of non-monotone biomarkers in case-control studies

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology and Biostatistics
Discipline
Biology
Subject
Biomarker
Feature selection
Likelihood ratio
Non-monotone
ROC curve
Test for trend
Copyright date
2022
Author
Lindner, Hanna
Abstract

Biomarker models in case-control settings typically assume a monotone relationship between the biomarker and the outcome, as do frequently used biomarker analytic methods such as the ROC curve and the AUC. Although biomarkers that violate the monotonicity assumption are receiving increasing attention, methods to identify, analyze, and evaluate them remain scarce. Without such tools, informative non-monotone biomarkers are likely to be misinterpreted or overlooked, which excludes them from applications such as predictive modeling and clinical decision making. Furthermore, the consideration of multiple biomarker types necessitates methods for biomarker classification. We address these limitations of the current literature by 1) introducing a new estimation technique for the diagnostic likelihood ratio (DLR) function, 2) creating hypothesis tests to classify biomarkers according to their relationship with the outcome, and 3) evaluating the performance of different feature selection methods when non-monotone biomarkers are present. Specifically, we recommend the DLR function as an alternative way to interpret and analyze a biomarker, and propose estimating it with the multinomial logistic regression model. Doing so allows for the incorporation of covariates and for model-based hypothesis testing of the DLR function. We also propose two hypothesis tests for biomarker classification: one for categorical data that requires no distributional assumptions, and one for continuous data that assumes the binormal model holds. For categorical biomarkers, we develop a modification of the Cochran-Armitage test for trend. For continuous biomarker data, we define a new area-based measure of the ROC curve that quantifies deviation from the monotone biomarker model and forms the basis of the hypothesis testing procedure.
Finally, we consider the large data setting and evaluate the performance of multiple feature selection methods, with the goal of recommending methods that are both inclusive of non-monotone biomarkers and robust to small amounts of noise in the data. This latter property indicates the ability to reproduce findings in separate data as a validation procedure. The statistical performance of the proposed methods is assessed through simulation studies and illustrated with real data applications.
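
As a rough illustration of why a categorical non-monotone biomarker can be overlooked, the sketch below computes an empirical diagnostic likelihood ratio per category and the standard (unmodified) Cochran-Armitage test for trend. It is not the estimation technique or the modified test proposed in the dissertation. The function names, the equally spaced default scores, and the example counts are assumptions made only for this illustration.

```python
import math


def empirical_dlr(cases, controls):
    """Empirical DLR per category: P(X = x | case) / P(X = x | control).

    Assumes every category has at least one control (no zero denominators).
    """
    n_cases, n_controls = sum(cases), sum(controls)
    return [(r / n_cases) / (c / n_controls) for r, c in zip(cases, controls)]


def cochran_armitage(cases, controls, scores=None):
    """Standard Cochran-Armitage test for a linear trend in case proportions.

    Returns the Z statistic and a two-sided p-value from the normal
    approximation. Scores default to 0, 1, 2, ... (an assumption).
    """
    if scores is None:
        scores = list(range(len(cases)))
    totals = [r + c for r, c in zip(cases, controls)]
    n = sum(totals)
    p_bar = sum(cases) / n  # overall proportion of cases

    # Score-weighted sum of observed minus expected case counts.
    t_stat = sum(s * (r - m * p_bar) for s, r, m in zip(scores, cases, totals))

    # Variance of the statistic under the null hypothesis of no trend.
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value


if __name__ == "__main__":
    # Hypothetical U-shaped biomarker: cases concentrate in the extreme categories.
    u_cases, u_controls = [30, 10, 30], [20, 40, 20]
    print("DLR:", empirical_dlr(u_cases, u_controls))
    print("trend test (Z, p):", cochran_armitage(u_cases, u_controls))
```

With these hypothetical counts the DLR is above 1 in the two outer categories and below 1 in the middle one, so the biomarker is informative, yet the linear trend statistic is essentially zero. This is the kind of case the abstract describes as likely to be overlooked by methods that assume monotonicity.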

Advisor
Bilker, Warren, B
Gimotty, Phyllis, A
Date of degree
2022