MACHINE LEARNING-OPTIMIZED TARGETED DETECTION OF ALTERNATIVE SPLICING

Degree type
Doctor of Philosophy (PhD)
Graduate group
Genomics and Computational Biology
Discipline
Bioinformatics
Biology
Subject
alternative splicing
genomics
machine learning
primer design
RNA
targeted sequencing
Copyright date
2023
Author
Yang, Kevin
Abstract

RNA-sequencing (RNA-seq) is widely used to analyze alternative splicing. In practice, however, short-read RNA-seq, the current gold standard, has inherent biases that hinder its ability to detect and quantify splicing events in previously acquired large-scale datasets spanning thousands of samples. To address this, in the first major part of this thesis, we present a targeted RNA-seq method that enriches for splicing-informative, junction-spanning reads. Local Splicing Variation sequencing (LSV-seq) uses multiplexed reverse transcription from highly scalable pools of primers anchored near splice junctions of interest. Primers are designed using Optimal Prime, a novel machine learning algorithm developed in this work and trained on the performance of thousands of primer sequences. LSV-seq achieves high on-target capture rates and concordance with RNA-seq while requiring several-fold lower sequencing depth. We use LSV-seq to target events with low coverage in Genotype-Tissue Expression (GTEx) RNA-seq data and discover hundreds of previously hidden tissue-specific splicing events. Our results demonstrate the ability of LSV-seq to capture alternative splicing with exceptional sensitivity and highlight its potential to improve the detection of other RNA features of interest.

In a related but distinct final part of this thesis, we examine the role of alternative splicing regulation in specific biological contexts. First, we identify splicing changes that occur in a well-established chimeric antigen receptor (CAR) T-cell exhaustion model, including individual changes that likely have key biological functions and merit further investigation. Second, we reanalyze previously published datasets that demonstrated the existence of non-genetically heritable expression changes. Our reanalysis provides convincing evidence that non-genetically heritable splicing changes also exist in the same data, as we hypothesized. In summary, we develop a new targeted method that will greatly improve the capabilities of alternative splicing sequencing experiments, and we pinpoint specific biological areas where future work can uncover the role of alternative splicing variations.

Advisor
Choi, Peter
Barash, Yoseph
Date of degree
2023