Enhancing Electronic Health Record Data For Population Health Studies

Sherrie Xie, University of Pennsylvania


Electronic health records (EHRs) offer convenient and low-cost access to large volumes of longitudinal data for diverse, real-world populations, which has made them an invaluable resource for biomedical research. However, information is collected in EHRs for clinical and administrative uses, and repurposing this data for research comes with unique challenges, including the limited scope of exposure-related variables. The objective of this dissertation was to examine methods to augment the scope of EHR data by integrating it with external data sources, including publicly available data on social and environmental factors, as well as data from personal sensing. First, we studied how linking EHR data with area-based measures of socioeconomic status (SES) can impact the results of epidemiologic studies. We showed that because individual-level SES measures do not always strongly correlate with area-based measures, the use of area-based measures can result in residual confounding by individual SES on the exposure-outcome association under study. Second, we examined whether the integration of geospatial features with EHR data can improve the prediction of asthma and chronic obstructive pulmonary disease exacerbations beyond the use of EHR data alone, and determined that geospatial features have predictive value when linked to patient data. Third, we linked geospatially varying data on neighborhood SES and residential greenness to EHR data for encounters in which an animal-related disease condition was documented, in order to identify risk factors for animal-related illness and injury. We found that residential greenness was associated with an increased risk of Lyme disease and tick bite and a decreased risk of allergic rhinitis due to animal dander.
In addition, we illustrated how spatial regression methods that model spatial autocorrelation explicitly, such as autologistic regression, may be better suited than nonspatial methods for the study of spatially correlated exposure variables. Finally, we determined through qualitative interviews that the use of portable pollution sensors was generally acceptable to adults with asthma, and demonstrated through sensor deployment trials the utility of these sensors for capturing personalized exposure information at high spatiotemporal resolution. Pending improvements to make the devices more amenable to general use, portable sensors could greatly improve the capture of exposure information that can be linked to EHR data.