Statistical methods for the identification, evaluation, and classification of non-monotone biomarkers in case-control studies

Lindner, Hanna

Files

Lindner_upenngdas_0175C_15594.pdf (14.52 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Epidemiology and Biostatistics

Discipline

Biology

Subject

Biomarker
Feature selection
Likelihood ratio
Non-monotone
ROC curve
Test for trend

Copyright date

2022

Permalink

https://repository.upenn.edu/handle/20.500.14332/59742

Abstract

Biomarker models in case-control settings typically assume a monotone relationship between the biomarker and the outcome, as do frequently used analytic methods such as the ROC curve and the AUC. Although biomarkers that violate the monotonicity assumption are receiving increasing attention, methods to identify, analyze, and evaluate them remain sparse. Without such tools, informative non-monotone biomarkers are likely to be misinterpreted or overlooked, which excludes them from applications such as predictive modeling and clinical decision making. Furthermore, considering multiple biomarker types necessitates methods for biomarker classification. We address these limitations of the current literature by 1) introducing a new estimation technique for the diagnostic likelihood ratio (DLR) function, 2) creating hypothesis tests to classify biomarkers according to their relationship with the outcome, and 3) evaluating the performance of different feature selection methods when non-monotone biomarkers are present. Specifically, we recommend the DLR function as an alternative way to interpret and analyze a biomarker, and propose estimating it with the multinomial logistic regression model. Doing so allows for the incorporation of covariates and for model-based hypothesis testing of the DLR function. We also propose two hypothesis tests for biomarker classification: one for categorical data that requires no distributional assumptions, and one for continuous data that assumes the binormal model holds. For categorical biomarkers, we develop a modification of the Cochran-Armitage test for trend. For continuous biomarkers, we define a new area-based measure of the ROC curve that quantifies deviation from the monotone biomarker model and forms the basis of the hypothesis testing procedure.
Finally, we consider the large data setting and evaluate the performance of multiple feature selection methods, with the goal of recommending methods that are both inclusive of non-monotone biomarkers and robust to small amounts of noise in the data. The latter property indicates the ability to reproduce findings in separate data as a validation procedure. The statistical performance of the proposed methods is assessed through simulation and illustrated with real data applications.
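
To make the motivation concrete, the following minimal sketch computes the standard, unmodified Cochran-Armitage test for trend and an empirical diagnostic likelihood ratio per category for a categorical biomarker. It is not the modified test or the multinomial-logistic DLR estimator proposed in the dissertation; the function names, the equally spaced default scores, and the example counts are all illustrative assumptions.

```python
import math


def cochran_armitage(cases, totals, scores=None):
    """Standard two-sided Cochran-Armitage test for linear trend.

    cases[i]  : number of cases in biomarker category i
    totals[i] : number of subjects (cases + controls) in category i
    scores    : category scores; equally spaced 0, 1, 2, ... by default (assumption)
    """
    if scores is None:
        scores = list(range(len(cases)))
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # pooled proportion of cases

    # Trend statistic: score-weighted deviations of observed from expected cases.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))

    # Variance of the trend statistic under the null of no association.
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    var_t = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_total)

    z = t_stat / math.sqrt(var_t)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


def empirical_dlr(cases, controls):
    """Empirical DLR per category: P(category | case) / P(category | control)."""
    n_cases, n_controls = sum(cases), sum(controls)
    return [(r / n_cases) / (c / n_controls) for r, c in zip(cases, controls)]


if __name__ == "__main__":
    # Hypothetical counts: a monotone biomarker and a symmetric U-shaped one.
    monotone_cases, monotone_controls = [10, 20, 30], [30, 20, 10]
    ushaped_cases, ushaped_controls = [30, 10, 30], [20, 40, 20]

    for label, ca, co in [("monotone", monotone_cases, monotone_controls),
                          ("U-shaped", ushaped_cases, ushaped_controls)]:
        totals = [a + b for a, b in zip(ca, co)]
        z, p = cochran_armitage(ca, totals)
        print(label, "z =", round(z, 3), "p =", round(p, 6),
              "DLR =", [round(d, 3) for d in empirical_dlr(ca, co)])
```

With these hypothetical counts, the trend statistic for the symmetric U-shaped table is zero up to floating-point error, so the standard test finds nothing even though the per-category DLR is well above 1 at both extremes and well below 1 in the middle. This is the kind of informative non-monotone pattern that monotone-oriented methods can overlook.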

Advisor

Bilker, Warren B.
Gimotty, Phyllis A.

Date of degree

2022

Collection

Dissertations and Theses