Publication
CoRAL: Predicting Non-Coding RNAs from Small RNA-Sequencing Data
(2013-08-01) Leung, Yuk Y; Ryvkin, Paul; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San
The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.

Publication
SAVoR: A Server for Sequencing Annotation and Visualization of RNA Structures
(2012-07-01) Li, Fan; Ryvkin, Paul; Childress, Daniel M; Valladares, Otto; Gregory, Brian D; Wang, Li-San
RNA secondary structure is required for the proper regulation of the cellular transcriptome.
This is because the functionality, processing, localization and stability of RNAs are all dependent on the folding of these molecules into intricate structures through specific base pairing interactions encoded in their primary nucleotide sequences. Thus, as the number of RNA sequencing (RNA-seq) data sets and the variety of protocols for this technology grow rapidly, it is becoming increasingly pertinent to develop tools that can analyze and visualize this sequence data in the context of RNA secondary structure. Here, we present Sequencing Annotation and Visualization of RNA structures (SAVoR), a web server, which seamlessly links RNA structure predictions with sequencing data and genomic annotations to produce highly informative and annotated models of RNA secondary structure. SAVoR accepts read alignment data from RNA-seq experiments and computes a series of per-base values such as read abundance and sequence variant frequency. These values can then be visualized on a customizable secondary structure model. SAVoR is freely available at http://tesla.pcbi.upenn.edu/savor.

Publication
Global Analysis of RNA Secondary Structure in Two Metazoans
(2012-01-26) Li, Fan; Zheng, Qi; Ryvkin, Paul; Valladares, Otto; Murray, John I; Dragomir, Isabelle; Desai, Yaanik; Aiyer, Subhadra; Cherry, Sara; Wang, Li-San; Yang, Jamie; Gregory, Brian D; Bambina, Shelley; Sabin, Leah R; Lamitina, Todd; Rai, Arjun
The secondary structure of RNA is necessary for its maturation, regulation, processing, and function. However, the global influence of RNA folding in eukaryotes is still unclear. Here, we use a high-throughput, sequencing-based, structure-mapping approach to identify the paired (double-stranded RNA [dsRNA]) and unpaired (single-stranded RNA [ssRNA]) components of the Drosophila melanogaster and Caenorhabditis elegans transcriptomes, which allows us to identify conserved features of RNA secondary structure in metazoans.
From this analysis, we find that ssRNAs and dsRNAs are significantly correlated with specific epigenetic modifications. Additionally, we find key structural patterns across protein-coding transcripts that indicate that RNA folding demarcates regions of protein translation and likely affects microRNA-mediated regulation of mRNAs in animals. Finally, we identify and characterize 546 mRNAs whose folding pattern is significantly correlated between these metazoans, suggesting that their structure has some function. Overall, our findings provide a global assessment of RNA folding in animals.

Publication
Transcriptomic Changes Due to Cytoplasmic TDP-43 Expression Reveal Dysregulation of Histone Transcripts and Nuclear Chromatin
(2015-01-01) Amlie-Wolf, Alexandre; Ryvkin, Paul; Tong, Rui; Suh, EunRan; Xu, Yan; Van Deerlin, Vivianna M; Gregory, Brian D; Trojanowski, John Q; Lee, Virginia Man-Yee; Wang, Li-San; Lee, Edward B; Dragomir, Isabelle; Kwong, Linda K
TAR DNA-binding protein 43 (TDP-43) is normally a nuclear RNA-binding protein that exhibits a range of functions including regulation of alternative splicing, RNA trafficking, and RNA stability. However, in amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration with TDP-43 inclusions (FTLD-TDP), TDP-43 is abnormally phosphorylated, ubiquitinated, and cleaved, and is mislocalized to the cytoplasm where it forms distinctive aggregates. We previously developed a mouse model expressing human TDP-43 with a mutation in its nuclear localization signal (ΔNLS-hTDP-43) so that the protein preferentially localizes to the cytoplasm. These mice did not exhibit a significant number of cytoplasmic aggregates, but did display dramatic changes in gene expression as measured by microarray, suggesting that cytoplasmic TDP-43 may be associated with a toxic gain-of-function.
Here, we analyze new RNA-sequencing data from the ΔNLS-hTDP-43 mouse model, together with published RNA-sequencing data obtained previously from TDP-43 antisense oligonucleotide (ASO) knockdown mice to investigate further the dysregulation of gene expression in the ΔNLS model. This analysis reveals that the transcriptomic effects of the overexpression of the ΔNLS-hTDP-43 transgene are likely due to a gain of cytoplasmic function. Moreover, cytoplasmic TDP-43 expression alters transcripts that regulate chromatin assembly, the nucleolus, lysosomal function, and histone 3’ untranslated region (UTR) processing. These transcriptomic alterations correlate with observed histologic abnormalities in heterochromatin structure and nuclear size in transgenic mouse and human brains.

Publication
Methods in and Applications of the Sequencing of Short Non-Coding RNAs
(2013-01-01) Ryvkin, Paul
Short non-coding RNAs are important for all domains of life. With the advent of modern molecular biology their applicability to medicine has become apparent in settings ranging from diagnostic biomarkers to therapeutics and fields ranging from oncology to neurology. In addition, a critical, recent technological development is high-throughput sequencing of nucleic acids. The convergence of modern biotechnology with developments in RNA biology presents opportunities in both basic research and medical settings. Here I present two novel methods for leveraging high-throughput sequencing in the study of short non-coding RNAs, as well as a study in which they are applied to Alzheimer's Disease (AD). The computational methods presented here include High-throughput Annotation of Modified Ribonucleotides (HAMR), which enables researchers to detect post-transcriptional covalent modifications to RNAs in a high-throughput manner.
In addition, I describe Classification of RNAs by Analysis of Length (CoRAL), a computational method that allows researchers to characterize the pathways responsible for short non-coding RNA biogenesis. Lastly, I present an application of the study of non-coding RNAs to Alzheimer's disease. When applied to the study of AD, it is apparent that several classes of non-coding RNAs, particularly tRNAs and tRNA fragments, show striking changes in the dorsolateral prefrontal cortex of affected human brains. Interestingly, the nature of these changes differs between mitochondrial and nuclear tRNAs, implicating an association between Alzheimer's disease and perturbation of mitochondrial function. In addition, by combining known genetic factors of AD with genes that are differentially expressed and targets of regulatory RNAs that are differentially expressed, I construct a network of genes that are potentially relevant to the pathogenesis of the disease. By combining genetics data with novel results from the study of non-coding RNAs, we can further elucidate the molecular mechanisms that underlie Alzheimer's disease pathogenesis.

Publication
Genome-Wide Double-Stranded RNA Sequencing Reveals the Functional Significance of Base-Paired RNAs in Arabidopsis
(2010-09-30) Zheng, Qi; Ryvkin, Paul; Li, Fan; Valladares, Otto; Wang, Li-San; Gregory, Brian D; Dragomir, Isabelle; Yang, Jamie; Cao, Kajia
The functional structure of all biologically active molecules is dependent on intra- and inter-molecular interactions. This is especially evident for RNA molecules whose functionality, maturation, and regulation require formation of correct secondary structure through encoded base-pairing interactions. Unfortunately, intra- and inter-molecular base-pairing information is lacking for most RNAs.
Here, we marry classical nuclease-based structure mapping techniques with high-throughput sequencing technology to interrogate all base-paired RNA in Arabidopsis thaliana and identify ∼200 new small (sm)RNA–producing substrates of RNA–DEPENDENT RNA POLYMERASE6. Our comprehensive analysis of paired RNAs reveals conserved functionality within introns and both 5′ and 3′ untranslated regions (UTRs) of mRNAs, as well as a novel population of functional RNAs, many of which are the precursors of smRNAs. Finally, we identify intra-molecular base-pairing interactions to produce a genome-wide collection of RNA secondary structure models. Although our methodology reveals the pairing status of RNA molecules in the absence of cellular proteins, previous studies have demonstrated that structural information obtained for RNAs in solution accurately reflects their structure in ribonucleoprotein complexes. Furthermore, our identification of RNA–DEPENDENT RNA POLYMERASE6 substrates and conserved functional RNA domains within introns and both 5′ and 3′ untranslated regions (UTRs) of mRNAs using this approach strongly suggests that RNA molecules are correctly folded into their secondary structure in solution. Overall, our findings highlight the importance of base-paired RNAs in eukaryotes and present an approach that should be widely applicable for the analysis of this key structural feature of RNA.

Publication
HAMR: High-Throughput Annotation of Modified Ribonucleotides
(2013-12-01) Ryvkin, Paul; Leung, Yuk Y; Childress, Micah; Valladares, Otto; Gregory, Brian D; Wang, Li-San; Silverman, Ian M; Dragomir, Isabelle
RNA is often altered post-transcriptionally by the covalent modification of particular nucleotides; these modifications are known to modulate the structure and activity of their host RNAs. The recent discovery that an RNA methyl-6 adenosine demethylase (FTO) is a risk gene in obesity has brought to light the significance of RNA modifications to human biology.
These noncanonical nucleotides, when converted to cDNA in the course of RNA sequencing, can produce sequence patterns that are distinguishable from simple base-calling errors. To determine whether these modifications can be detected in RNA sequencing data, we developed a method that can not only locate these modifications transcriptome-wide with single nucleotide resolution, but can also differentiate between different classes of modifications. Using small RNA-seq data we were able to detect 92% of all known human tRNA modification sites that are predicted to affect RT activity. We also found that different modifications produce distinct patterns of cDNA sequence, allowing us to differentiate between two classes of adenosine and two classes of guanine modifications with 98% and 79% accuracy, respectively. To show the robustness of this method to sample preparation and sequencing methods, as well as to organismal diversity, we applied it to a publicly available yeast data set and achieved similar levels of accuracy. We also experimentally validated two novel and one known 3-methylcytosine (3mC) sites predicted by HAMR in human tRNAs. Researchers can now use our method to identify and characterize RNA modifications using only RNA-seq data, both retrospectively and when asking questions specifically about modified RNA.
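
The idea in the HAMR abstract that modified nucleotides leave mismatch patterns "distinguishable from simple base-calling errors" can be illustrated with a small sketch. This is not the published HAMR implementation. It is a simplified one-sided binomial test under assumptions made only for illustration: a fixed per-base error rate (`error_rate`), a significance cutoff (`alpha`), and a toy `pileup` dictionary of per-position base counts. It also makes no attempt to separate modifications from genetic variants or to classify modification types.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_candidate_sites(pileup, error_rate=0.01, alpha=0.001):
    """Flag positions whose mismatch count is unlikely under sequencing error alone.

    pileup maps position -> (reference_base, {observed_base: read_count}).
    error_rate and alpha are illustrative assumptions, not values from the paper.
    """
    flagged = []
    for pos, (ref, counts) in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to test
        mismatches = depth - counts.get(ref, 0)
        p_value = binomial_tail(depth, mismatches, error_rate)
        if p_value < alpha:
            flagged.append((pos, mismatches, depth, p_value))
    return flagged


# Hypothetical per-base counts for two positions of one transcript.
pileup = {
    10: ("A", {"A": 98, "G": 2}),            # mismatches consistent with error
    11: ("A", {"A": 60, "T": 25, "G": 15}),  # far more mismatches than error predicts
}
flagged = flag_candidate_sites(pileup)
for pos, mismatches, depth, p_value in flagged:
    print(f"position {pos}: {mismatches}/{depth} mismatches, p = {p_value:.3g}")
```

In this toy input only position 11 is flagged. A real analysis would start from aligned reads, estimate the error rate from the data, and correct for multiple testing across sites.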