Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Mingyao Li


The emergence of RNA-seq technology has made it possible to estimate isoform-specific gene expression and detect differential alternative splicing between conditions, thus providing an effective way to discover disease susceptibility genes. Analysis of alternative splicing, however, is challenging because various biases present in RNA-seq data complicate the analysis and, if not appropriately corrected, will affect gene expression estimation and downstream modeling. Motivated by these issues, my dissertation focused on statistical problems related to the analysis of alternative splicing in RNA-seq data. In Part I of my dissertation, I developed PennSeq, a method that aims to account for non-uniform read distribution in isoform expression estimation. PennSeq models non-uniformity using the empirical read distribution in RNA-seq data. This is the first time that non-uniformity has been modeled at the isoform level. Compared to existing approaches, PennSeq allows bias correction at a much finer scale and achieves higher estimation accuracy. In Part II of my dissertation, I developed PennDiff, a method that aims to detect differential alternative splicing using RNA-seq data. This approach avoids multiple testing for exons originating from the same isoform(s) and is able to detect differential alternative splicing at both the exon and gene levels, with more flexibility and higher sensitivity than existing methods. In Part III of my dissertation, I focused on problems arising from single-cell RNA-seq (scRNA-seq), a newly developed technology that allows the measurement of cellular heterogeneity of gene expression in single cells. Compared to bulk tissue RNA-seq, analysis of scRNA-seq data is more challenging due to high technical variability across cells and extremely low sequencing depth. To overcome these challenges, I developed SCATS, a method that aims to detect differential alternative splicing with scRNA-seq data. SCATS employs an empirical Bayes approach to model technical noise using external RNA spike-ins and groups informative reads sharing the same isoform(s) to detect splicing changes. SCATS showed superior performance in both simulation and real data analyses. In summary, the methods developed in my dissertation provide biomedical researchers with a set of powerful tools for transcriptomic data analysis and will aid novel scientific discovery.

Included in

Biostatistics Commons