Statistical Methods for Allele-Specific Expression Analysis by RNA Sequencing

Jiaxin Fan, University of Pennsylvania


Allele-specific expression (ASE) analysis, which quantifies the degree of allelic expression imbalance between the two alleles of a gene in a diploid individual, has become a powerful tool for identifying gene expression variations that underlie phenotypic differences among individuals. ASE is driven by cis-regulatory variants located near a gene. Because the two alleles used to measure ASE are expressed in the same cellular environment and genetic background, they serve as internal controls for each other and eliminate the influence of trans-acting genetic and environmental factors. Existing ASE detection methods analyze one individual at a time, which not only wastes information shared across individuals but also makes results difficult to interpret across individuals. To overcome these limitations, my dissertation focused on developing statistical methods for ASE analysis using RNA sequencing (RNA-seq) and single-cell RNA-seq (scRNA-seq) data. In the first project, I developed ASEP, a mixture model with a subject-specific random effect that detects gene-level ASE across individuals in a population under one condition, as well as differences in ASE between two conditions. Because ASE patterns may vary across cell types, in the second project I developed BSCET to better identify the cellular targets of disease. BSCET characterizes cell-type-specific ASE in bulk RNA-seq data by incorporating cell type composition information inferred from external scRNA-seq data. By modeling covariate effects, BSCET can also detect genes whose cell-type-specific ASE is associated with clinical factors.
Because accurate cell type proportion estimates are critical for BSCET, in the third project I developed MuSiC2, an iterative weighted non-negative least squares regression method that estimates cell type proportions from heterogeneous bulk tissue RNA-seq samples. MuSiC2 deconvolves cell types in multi-condition bulk tissue RNA-seq data using scRNA-seq data from a single condition as reference. With the growing popularity of RNA-seq and scRNA-seq, I believe the methods developed in my dissertation will provide a set of valuable tools for transcriptomics research. Results from analyses using these tools will offer insights into gene regulation and help elucidate its relationship to human diseases.
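To make the deconvolution idea concrete, the following is a minimal sketch of cell type proportion estimation by plain, unweighted non-negative least squares. It is not the MuSiC2 algorithm: the iterative gene weighting and the multi-condition handling described above are omitted. The function name `estimate_proportions`, the toy signature matrix, and the proportions are all hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(signature, bulk):
    """Estimate cell type proportions for one bulk sample.

    signature: (genes x cell types) mean expression per cell type,
               e.g. derived from an scRNA-seq reference (assumed given).
    bulk:      (genes,) expression vector of the bulk sample.
    """
    # Solve min ||signature @ coef - bulk||_2 subject to coef >= 0.
    coef, _residual = nnls(signature, bulk)
    total = coef.sum()
    if total == 0:
        raise ValueError("all coefficients are zero; cannot normalize")
    # Rescale so the estimated proportions sum to one.
    return coef / total


# Toy example (hypothetical numbers): 4 genes, 3 cell types.
signature = np.array([
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 2.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
])
true_props = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_props  # noise-free mixture of the three cell types

estimated = estimate_proportions(signature, bulk)
print(np.round(estimated, 3))
```

In this noise-free toy mixture the estimate should match the proportions used to build the sample. With real data, genes differ in cross-subject variability and informativeness, which is the motivation for weighting genes rather than treating them equally as this sketch does.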