A Genome-First Approach To Investigating The Biological And Clinical Relevance Of Exome-Wide Rare Coding Variation Using Electronic Health Record Phenotypes

Joseph Park, University of Pennsylvania

Abstract

Genome-wide association studies (GWAS) have successfully described the roles of common genetic variation in human disease by analyzing large populations recruited on the basis of a shared phenotype, but the biological and clinical relevance of numerous genes remains incompletely described through these ‘phenotype-first’ methodologies. Much of the unexplained genetic contribution to disease risk and variability in complex traits may belong to the very rare and private spectrum of alleles, a range traditionally ignored by GWAS. Furthermore, the phenotype-first approach is likely to miss unexpected phenotypic consequences of genetic variants, such as those involving conditions too rare to study feasibly through phenotype-based recruitment. The Penn Medicine BioBank, a healthcare system-based database of genotype, whole-exome sequencing, and electronic health record (EHR) data, allows for an unbiased, ‘genome-first’ approach to describing the relationships between genetic variants and human disease traits captured in the clinical setting. Through ‘gene burden’ tests that interrogate the cumulative effects of multiple rare and private variants in a gene that are predicted to affect gene function, this dissertation aims to characterize the clinical manifestations of diseases and traits caused by rare, predicted loss-of-function and predicted deleterious missense variants on an exome-wide and/or phenome-wide scale. These analyses uncover previously unsuspected medical and biological consequences of loss-of-function variants in multiple genes. In summary, this dissertation investigates the biological and clinical relevance of disease-associated genes by testing the association of rare coding variation found in whole-exome sequencing with phenotypes derived from the EHR.