Preserving Patient Privacy in Modeling Multi-Category Outcomes Across Real-World Data Sources

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology and Biostatistics
Discipline
Biology
Copyright date
2023
Author
Locke Jr., Kenneth
Abstract

Multi-site studies using real-world data sources, such as electronic health records (EHR), have become increasingly common in recent years in the study of rare or complex diseases. One challenge in such studies is how to manage and analyze data in the privacy-preserving setting, where individual patient data (IPD) must remain within their respective sites. Most statistical analyses require data to be centralized at one site, which may not be possible when working with sensitive patient information. Distributed algorithms can overcome this obstacle by using only summary-level information to model the outcome of interest and obtain results similar to those of the pooled data analysis. However, some of these algorithms are limited in real-world applications because they require iterative rounds of communication, ignore different levels of heterogeneity between sites, or both. Multi-category outcomes, such as improvement scales and diagnosis codes, are commonly encountered in clinical studies. To date, no communication-efficient distributed algorithms have been developed for modeling ordered and unordered categorical outcomes. In this dissertation, we develop two robust privacy-preserving distributed algorithms for modeling multi-category outcomes, in both the ordered and unordered category settings, that require only two rounds of communication between sites. The first algorithm assumes that the data distribution is homogeneous across sites. The second addresses the setting where the data distribution is heterogeneous across sites. Through simulations and applied data examples, we demonstrate that both algorithms, in the ordered and unordered categorical outcome settings, are accurate relative to the gold-standard analysis conducted over the combined data. Finally, we apply our distributed algorithms to a pediatric multi-site clinical research network (PEDSnet) by modeling clinical outcomes that occur in the two-year postoperative period following tympanostomy tube insertion surgery.

Advisor
Chen, Yong
Landis, J. Richard
Date of degree
2023