Characterizing Local Splicing Variations From Heterogeneous RNA-Sequencing Datasets

Caleb Matthew Radens, University of Pennsylvania

Abstract

Alternative splicing is a ubiquitous gene regulatory mechanism with important roles during normal development and, when dysregulated, in disease. The study of alternative splicing has been greatly facilitated by RNA-Seq. Most RNA-Seq-based splicing quantification methodologies consider at most two mRNA isoforms at a time, but one-third of all splicing variations are complex (involving three or more isoforms). In this dissertation, I assemble pipelines and create tools for dissecting biologically relevant signals from heterogeneous RNA-Seq datasets, and then apply these tools to specific biological systems to elucidate the role of splicing in those systems. First, I develop a data processing pipeline optimized for studying splicing from RNA-Seq and use it to support the discovery that splicing of Esrp1 target genes is required for inner ear development and hearing. I go on to discover genes whose expression and splicing are Gsk3-dependent, which implicates Gsk3-based phosphorylation activity as a regulator of splicing in mouse embryonic stem cells. Next, I lead development of an algorithm to classify simple splicing events from complex splicing variations. I then apply this algorithm to study how splicing patterns vary across 13 brain subregions, providing the first analysis of complex splicing variations comprising non-annotated junctions and introns from the human brain. I also use the knowledge, pipelines, and tools I develop in this dissertation to discover novel transcriptomic markers of CD4+ T cells using 11 distinct RNA-Seq datasets. Finally, I quantitatively describe for the first time how batch effects impact splicing analysis, and I then develop, test, and apply a new tool to remove batch effects from RNA-Seq of 579 pediatric B cell acute lymphoblastic leukemia patients and 238 shRNA knockdown experiments. Together, the work in this dissertation provides tools and approaches that help researchers study splicing using RNA-Seq.