Statistical Methods for Alternative Splicing Using RNA Sequencing

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Subject
Alternative Splicing
Gene expression
Isoform expression
RNA Sequencing
Transcriptomic Variation
Biostatistics
Copyright date
2018-09-28T20:18:00-07:00
Abstract

The emergence of RNA-seq technology has made it possible to estimate isoform-specific gene expression and detect differential alternative splicing between conditions, thus providing an effective way to discover disease susceptibility genes. Analysis of alternative splicing, however, is challenging because various biases present in RNA-seq data complicate the analysis and, if not appropriately corrected, will affect gene expression estimation and downstream modeling. Motivated by these issues, my dissertation focused on statistical problems related to the analysis of alternative splicing in RNA-seq data. In Part I of my dissertation, I developed PennSeq, a method that aims to account for non-uniform read distribution in isoform expression estimation. PennSeq models non-uniformity using the empirical read distribution in RNA-seq data. This is the first time that non-uniformity has been modeled at the isoform level. Compared to existing approaches, PennSeq allows bias correction at a much finer scale and achieves higher estimation accuracy. In Part II of my dissertation, I developed PennDiff, a method that aims to detect differential alternative splicing from RNA-seq data. This approach avoids multiple testing for exons originating from the same isoform(s) and is able to detect differential alternative splicing at both the exon and gene levels, with more flexibility and higher sensitivity than existing methods. In Part III of my dissertation, I focused on problems arising from single-cell RNA-seq (scRNA-seq), a newly developed technology that allows cellular heterogeneity in gene expression to be measured at single-cell resolution. Compared to bulk tissue RNA-seq, analysis of scRNA-seq data is more challenging due to high technical variability across cells and extremely low sequencing depth. To overcome these challenges, I developed SCATS, a method that aims to detect differential alternative splicing with scRNA-seq data. SCATS employs an empirical Bayes approach that uses external RNA spike-ins to model technical noise, and it groups informative reads sharing the same isoform(s) to detect splicing changes. SCATS showed superior performance in both simulation and real data analyses. In summary, the methods developed in my dissertation provide biomedical researchers with a set of powerful tools for transcriptomic data analysis and will aid novel scientific discovery.

Advisor
Mingyao Li
Date of degree
2018-01-01