Date of Award
2018
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Graduate Group
Epidemiology & Biostatistics
First Advisor
Mingyao Li
Abstract
The emergence of RNA-seq technology has made it possible to estimate isoform-specific gene
expression and detect differential alternative splicing between conditions, thus providing an effective
way to discover disease susceptibility genes. Analysis of alternative splicing, however, is
challenging because various biases present in RNA-seq data, if not appropriately corrected,
affect gene expression estimation and downstream modeling. Motivated
by these issues, my dissertation focused on statistical problems related to the analysis of
alternative splicing in RNA-seq data. In Part I of my dissertation, I developed PennSeq, a method
that aims to account for non-uniform read distribution in isoform expression estimation. PennSeq
models non-uniformity using the empirical read distribution in RNA-seq data. This is the first time that
non-uniformity has been modeled at the isoform level. Compared to existing approaches, PennSeq allows
bias correction at a much finer scale and achieves higher estimation accuracy. In Part II of my
dissertation, I developed PennDiff, a method that aims to detect differential alternative splicing from
RNA-seq data. This approach avoids multiple testing for exons originating from the same isoform(s) and
is able to detect differential alternative splicing at both the exon and gene levels, with more flexibility
and higher sensitivity than existing methods. In Part III of my dissertation, I focused on problems
arising from single-cell RNA-seq (scRNA-seq), a newly developed technology that allows the measurement
of cell-to-cell heterogeneity in gene expression. Compared to bulk tissue
RNA-seq, analysis of scRNA-seq data is more challenging due to high technical variability across
cells and extremely low sequencing depth. To overcome these challenges, I developed SCATS, a
method that aims to detect differential alternative splicing with scRNA-seq data. SCATS employs
an empirical Bayes approach to model technical noise using external RNA spike-ins and groups
informative reads sharing the same isoform(s) to detect splicing changes. SCATS showed superior
performance in both simulation and real data analyses. In summary, the methods developed in my
dissertation provide biomedical researchers with a set of powerful tools for transcriptomic data analysis
and will aid novel scientific discovery.
Recommended Citation
Hu, Yu, "Statistical Methods For Alternative Splicing Using RNA Sequencing" (2018). Publicly Accessible Penn Dissertations. 3016.
https://repository.upenn.edu/edissertations/3016