MACHINE LEARNING-OPTIMIZED TARGETED DETECTION OF ALTERNATIVE SPLICING
Discipline
Biology
Subject
genomics
machine learning
primer design
RNA
targeted sequencing
Abstract
RNA sequencing (RNA-seq) is widely used to analyze alternative splicing. In practice, however, short-read RNA-seq, the current gold standard, has inherent biases that hinder its ability to detect and quantify splicing events in previously acquired large-scale datasets spanning thousands of samples. To address this, in the first major part of this thesis, we present a targeted RNA-seq method that enriches for splicing-informative, junction-spanning reads. Local Splicing Variation sequencing (LSV-seq) uses multiplexed reverse transcription from highly scalable pools of primers anchored near splice junctions of interest. Primers are designed using Optimal Prime, a dedicated machine learning algorithm developed in this work and trained on the performance of thousands of primer sequences. LSV-seq achieves high on-target capture rates and concordance with RNA-seq while requiring several-fold lower sequencing depth. We use LSV-seq to target events with low coverage in Genotype-Tissue Expression (GTEx) RNA-seq data and discover hundreds of previously hidden tissue-specific splicing events. Our results demonstrate the ability of LSV-seq to capture alternative splicing with exceptional sensitivity and highlight its potential to improve the detection of other RNA features of interest. In a related but distinct final part of this thesis, we examine the role of alternative splicing regulation in specific biological contexts. First, we identify splicing changes that occur in a well-established chimeric antigen receptor (CAR) T-cell exhaustion model, including several individual changes that likely have key biological functions and merit further investigation. Second, we reanalyze previously published datasets that demonstrated the existence of non-genetically heritable expression changes. Our reanalysis provides convincing evidence that non-genetically heritable splicing changes also exist in the same data, as we hypothesized. In summary, we develop a new targeted method that will greatly improve the capabilities of alternative splicing sequencing experiments, and we pinpoint specific biological areas where future work can uncover the role of alternative splicing variations.
Advisor
Barash, Yoseph