Statistical Methods for Allele-Specific Expression Analysis by RNA Sequencing

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Subject
Biostatistics
Copyright date
2021-08-31T20:20:00-07:00
Author
Fan, Jiaxin
Abstract

Allele-specific expression (ASE) analysis, which quantifies the degree of allelic expression imbalance between the two alleles in a diploid individual, has become a powerful tool for identifying gene expression variations that underlie phenotypic differences among individuals. ASE is driven by cis-regulatory variants located near a gene. Because the two alleles used to measure ASE are expressed in the same cellular environment and genetic background, they serve as internal controls for each other and eliminate the influence of trans-acting genetic and environmental factors. Existing ASE detection methods analyze one individual at a time, which not only wastes information shared across individuals but also makes results difficult to interpret across individuals. To overcome these limitations, my dissertation focused on developing statistical methods for ASE analysis using RNA sequencing (RNA-seq) and single-cell RNA-seq (scRNA-seq) data. In the first project, I developed ASEP, a mixture model with a subject-specific random effect, to detect gene-level ASE across individuals in a population under one condition, as well as differences in ASE between two conditions. Because ASE patterns may vary across cell types, in the second project I developed BSCET to better identify the cellular targets of disease: it characterizes cell-type-specific ASE in bulk RNA-seq data by incorporating cell type composition information inferred from external scRNA-seq data. By modeling covariate effects, BSCET can also detect genes whose cell-type-specific ASE is associated with clinical factors.
Because accurate cell type proportion estimates are critical for BSCET, in the third project I developed MuSiC2, an iterative weighted non-negative least squares regression method that deconvolves cell types in heterogeneous, multi-condition bulk tissue RNA-seq data using scRNA-seq data from a single condition as the reference. With the growing popularity of RNA-seq and scRNA-seq, I believe the methods developed in my dissertation will provide a set of valuable tools for transcriptomics research. Results from analyses using these tools will offer insights into gene regulation and elucidate its relationship to human diseases.
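
As a concrete illustration of the allelic imbalance that the abstract describes, the sketch below tests a single gene in a single individual for unequal expression of its two alleles. It is the kind of one-individual-at-a-time baseline the abstract contrasts with, not ASEP, BSCET, or MuSiC2. The function names, the read counts, and the choice of an exact two-sided binomial test against a null allele fraction of 0.5 are all assumptions made for illustration.

```python
from math import comb


def reference_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-specific reads that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no allele-specific reads at this gene")
    return ref_reads / total


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test of H0: both alleles are equally expressed.

    Under H0 the reference-allele count is Binomial(n, 0.5), which is
    symmetric, so the two-sided p-value is twice the smaller tail (capped at 1).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no allele-specific reads at this gene")
    k = min(ref_reads, alt_reads)
    # P(X <= k) for X ~ Binomial(n, 0.5), computed with exact integers.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2.0 * lower_tail)


# Hypothetical read counts for one gene in one individual.
ref, alt = 30, 10
print(reference_allele_fraction(ref, alt))
print(allelic_imbalance_pvalue(ref, alt))
```

A test of this form treats each individual separately, which is the limitation that motivates pooling information across individuals in the first project.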

Advisor
Mingyao Li
Rui Xiao
Date of degree
2020-01-01