Integrative analysis of transcriptomic data to elucidate regulators of pre-mRNA processing
Subject
Alternative splicing
DDX55
RNA binding proteins
RNA-seq
Transcriptomics
Abstract
Alternative pre-mRNA processing events, including alternative splicing (AS) and alternative polyadenylation (APA), are key drivers of transcriptomic and proteomic diversity. This dissertation presents a comprehensive framework for analyzing these events, emphasizing improvements to computational tools and workflows that enhance the accuracy, accessibility, and interpretability of transcriptomic data across diverse high-throughput sequencing modalities, including RNA-seq, targeted 3' end sequencing, and CLIP-seq. Chapters 2 and 3 focus on the development and application of MAJIQ v2, work to which the author contributed as co-first author and which advances splicing analysis by enabling more accurate identification and quantification of both simple and complex AS events. A key innovation is the MAJIQ v2 Modulizer, which facilitates regulatory analysis by decomposing complex splicing patterns into discrete, interpretable modules composed of binary AS event building blocks. This modular representation, combined with MAJIQ’s superior accuracy and ability to handle intron retention, enables more robust splicing quantification across tissues (GTEx) and improves the discovery of splicing QTLs (sQTLs), as demonstrated by validated MAJIQ-sQTLs in CYP11B1 (Chapter 3). Chapter 4 presents benchmarking work from APAeval, highlighting methodological advances for the identification and quantification of APA from RNA-seq data. In Chapter 5, ENCODE RBP knockdown data are uniformly processed with DaPars to systematically profile the impact of RNA-binding proteins (RBPs) on 3'UTR isoform diversity. This integrative analysis uncovers several novel regulators of APA, most notably the RNA helicase DDX55, and provides the community with a standardized analysis resource for investigating APA regulation by RBPs.
Finally, Chapter 6 introduces the Comparative Analysis of Alternative RNA Processing (CAARP), a lightweight framework and repository of reproducible analysis scripts, provided as IPython notebooks, that support flexible and extensible exploration of the regulatory features and RBPs that may play a role in user-defined sets of AS and APA events from RNA-seq data. CAARP has already been applied in multiple published studies and offers a scalable foundation for future regulatory analyses of pre-mRNA processing across varied transcriptomic datasets.
Advisor
Lynch, Kristen W.