Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Yong Chen


The growing availability and variety of healthcare data sources provide unique opportunities for data integration and evidence synthesis, which can potentially accelerate knowledge discovery and enable better clinical decision making. However, many practical and technical challenges, such as data privacy, high dimensionality, and heterogeneity across datasets, remain to be addressed. In Chapters 1-3, we develop several methods for effective integration of electronic health records (EHRs) and other healthcare datasets. We develop communication-efficient distributed algorithms for joint analyses of multiple datasets without the need to share patient-level data. Our algorithms do not require iterative communication across sites and can account for heterogeneity across datasets. We provide theoretical guarantees for the performance of our algorithms, along with examples of applying them in real-world clinical research networks, including Observational Health Data Sciences and Informatics (OHDSI) and the National Patient-Centered Clinical Research Network (PCORnet). In Chapter 4, we propose a novel bilinear regression model for linking EHRs with genetic or imaging data, which incorporates the low-rank and sparse structure of the association between high-dimensional covariates and outcomes. We develop an iterative algorithm to solve the non-convex optimization problem arising in parameter estimation, and a simultaneous hypothesis testing procedure with theoretical guarantees of false discovery rate control. We apply our method to a multi-view brain network analysis for Parkinson's disease.
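
To make the idea of a non-iterative, summary-level analysis concrete, the sketch below shows one simple way several sites could contribute to a joint estimate in a single round of communication. It is not the algorithm developed in the dissertation: it swaps in a basic inverse-variance weighted combination of site-level least-squares estimates, and every name in it (`local_summary`, `combine`, the simulated sites, their sizes and noise levels) is a hypothetical chosen for illustration. Each site sends only a coefficient vector and a covariance matrix, never patient-level rows.

```python
import numpy as np


def local_summary(X, y):
    """Run at each site: OLS estimate and its estimated covariance.

    Only these two summaries leave the site (illustrative assumption).
    """
    n, p = X.shape
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)  # site-specific noise variance
    cov = sigma2 * np.linalg.inv(XtX)
    return beta, cov


def combine(summaries):
    """Run at the coordinating site: inverse-variance weighted average."""
    precisions = [np.linalg.inv(cov) for _, cov in summaries]
    total_precision = sum(precisions)
    weighted = sum(P @ beta for P, (beta, _) in zip(precisions, summaries))
    beta_hat = np.linalg.solve(total_precision, weighted)
    return beta_hat, np.linalg.inv(total_precision)


# Simulated sites with different sizes and noise levels (made-up values).
rng = np.random.default_rng(0)
true_beta = np.array([1.0, -2.0, 0.5])
summaries = []
for n_k, noise_sd in [(500, 1.0), (800, 1.5), (300, 0.8)]:
    X = rng.normal(size=(n_k, 3))
    y = X @ true_beta + rng.normal(scale=noise_sd, size=n_k)
    summaries.append(local_summary(X, y))

beta_hat, cov_hat = combine(summaries)
print("combined estimate:", beta_hat)
```

Weighting by each site's precision lets noisier or smaller sites contribute less, which is one elementary way to handle differences in data quality across sites. It does not address heterogeneity in the underlying parameters themselves.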


Available to all on Saturday, June 10, 2023

Included in

Biostatistics Commons