Wang, Li-San

  • Publication
    CoRAL: Predicting Non-Coding RNAs from Small RNA-Sequencing Data
    (2013-08-01) Leung, Yuk Y; Ryvkin, Paul; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San
    The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.
  • Publication
    Global Analysis of RNA Secondary Structure in Two Metazoans
    (2012-01-26) Li, Fan; Zheng, Qi; Ryvkin, Paul; Valladares, Otto; Murray, John I; Dragomir, Isabelle; Desai, Yaanik; Cherry, Sara; Wang, Li-San; Gregory, Brian D; Bambina, Shelley; Sabin, Leah R; Lamitina, Todd; Rai, Arjun
    The secondary structure of RNA is necessary for its maturation, regulation, processing, and function. However, the global influence of RNA folding in eukaryotes is still unclear. Here, we use a high-throughput, sequencing-based, structure-mapping approach to identify the paired (double-stranded RNA [dsRNA]) and unpaired (single-stranded RNA [ssRNA]) components of the Drosophila melanogaster and Caenorhabditis elegans transcriptomes, which allows us to identify conserved features of RNA secondary structure in metazoans. From this analysis, we find that ssRNAs and dsRNAs are significantly correlated with specific epigenetic modifications. Additionally, we find key structural patterns across protein-coding transcripts that indicate that RNA folding demarcates regions of protein translation and likely affects microRNA-mediated regulation of mRNAs in animals. Finally, we identify and characterize 546 mRNAs whose folding pattern is significantly correlated between these metazoans, suggesting that their structure has some function. Overall, our findings provide a global assessment of RNA folding in animals.
  • Publication
    High-Throughput Identification of Long-Range Regulatory Elements and Their Target Promoters in the Human Genome
    (2013-05-01) Hwang, Yih-Chii; Zheng, Qi; Gregory, Brian D; Wang, Li-San
    Enhancer elements are essential for tissue-specific gene regulation during mammalian development. Although these regulatory elements are often distant from their target genes, they affect gene expression by recruiting transcription factors to specific promoter regions. Because of this long-range action, the annotation of enhancer element–target promoter pairs remains elusive. Here, we developed a novel analysis methodology that takes advantage of Hi-C data to comprehensively identify these interactions throughout the human genome. To do this, we used a geometric distribution-based model to identify DNA–DNA interaction hotspots that contact gene promoters with high confidence. We observed that these promoter-interacting hotspots significantly overlap with known enhancer-associated histone modifications and DNase I hypersensitive sites. Thus, we defined thousands of candidate enhancer elements by incorporating these features, and found that they have a significant propensity to be bound by p300, an enhancer binding transcription factor. Furthermore, we revealed that their target genes are significantly bound by RNA Polymerase II and demonstrate tissue-specific expression. Finally, we uncovered that these elements are generally found within 1 Mb of their targets, and often regulate multiple genes. In total, our study presents a novel high-throughput workflow for confident, genome-wide discovery of enhancer–target promoter pairs, which will significantly improve our understanding of these regulatory interactions.
  • Publication
    Chemical Modifications Mark Alternatively Spliced and Uncapped Messenger RNAs in Arabidopsis
    (2015-11-01) Vandivier, Lee E; Kuksa, Pavel P; Wang, Li-San; Gregory, Brian D; Silverman, Ian M
    Posttranscriptional chemical modification of RNA bases is a widespread and physiologically relevant regulator of RNA maturation, stability, and function. While modifications are best characterized in short, noncoding RNAs such as tRNAs, growing evidence indicates that mRNAs and long noncoding RNAs (lncRNAs) are likewise modified. Here, we apply our high-throughput annotation of modified ribonucleotides (HAMR) pipeline to identify and classify modifications that affect Watson-Crick base pairing at three different levels of the Arabidopsis thaliana transcriptome (polyadenylated, small, and degrading RNAs). We find this type of modification primarily within uncapped, degrading mRNAs and lncRNAs, suggesting they are the cause or consequence of RNA turnover. Additionally, modifications within stable mRNAs tend to occur in alternatively spliced introns, suggesting they regulate splicing. Furthermore, these modifications target mRNAs with coherent functions, including stress responses. Thus, our comprehensive analysis across multiple RNA classes yields insights into the functions of covalent RNA modifications in plant transcriptomes.
  • Publication
    Transcriptomic Changes Due to Cytoplasmic TDP-43 Expression Reveal Dysregulation of Histone Transcripts and Nuclear Chromatin
    (2015-01-01) Amlie-Wolf, Alexandre; Ryvkin, Paul; Tong, Rui; Suh, EunRan; Xu, Yan; Van Deerlin, Vivianna M; Gregory, Brian D; Trojanowski, John Q; Lee, Virginia Man-Yee; Wang, Li-San; Lee, Edward B; Dragomir, Isabelle; Kwong, Linda K
    TAR DNA-binding protein 43 (TDP-43) is normally a nuclear RNA-binding protein that exhibits a range of functions including regulation of alternative splicing, RNA trafficking, and RNA stability. However, in amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration with TDP-43 inclusions (FTLD-TDP), TDP-43 is abnormally phosphorylated, ubiquitinated, and cleaved, and is mislocalized to the cytoplasm where it forms distinctive aggregates. We previously developed a mouse model expressing human TDP-43 with a mutation in its nuclear localization signal (ΔNLS-hTDP-43) so that the protein preferentially localizes to the cytoplasm. These mice did not exhibit a significant number of cytoplasmic aggregates, but did display dramatic changes in gene expression as measured by microarray, suggesting that cytoplasmic TDP-43 may be associated with a toxic gain-of-function. Here, we analyze new RNA-sequencing data from the ΔNLS-hTDP-43 mouse model, together with published RNA-sequencing data obtained previously from TDP-43 antisense oligonucleotide (ASO) knockdown mice to investigate further the dysregulation of gene expression in the ΔNLS model. This analysis reveals that the transcriptomic effects of the overexpression of the ΔNLS-hTDP-43 transgene are likely due to a gain of cytoplasmic function. Moreover, cytoplasmic TDP-43 expression alters transcripts that regulate chromatin assembly, the nucleolus, lysosomal function, and histone 3’ untranslated region (UTR) processing. These transcriptomic alterations correlate with observed histologic abnormalities in heterochromatin structure and nuclear size in transgenic mouse and human brains.
  • Publication
    HAMR: High-Throughput Annotation of Modified Ribonucleotides
    (2013-12-01) Ryvkin, Paul; Leung, Yuk Y; Childress, Micah; Valladares, Otto; Gregory, Brian D; Wang, Li-San; Silverman, Ian M; Dragomir, Isabelle
    RNA is often altered post-transcriptionally by the covalent modification of particular nucleotides; these modifications are known to modulate the structure and activity of their host RNAs. The recent discovery that an RNA methyl-6 adenosine demethylase (FTO) is a risk gene in obesity has brought to light the significance of RNA modifications to human biology. These noncanonical nucleotides, when converted to cDNA in the course of RNA sequencing, can produce sequence patterns that are distinguishable from simple base-calling errors. To determine whether these modifications can be detected in RNA sequencing data, we developed a method that can not only locate these modifications transcriptome-wide with single nucleotide resolution, but can also differentiate between different classes of modifications. Using small RNA-seq data we were able to detect 92% of all known human tRNA modification sites that are predicted to affect RT activity. We also found that different modifications produce distinct patterns of cDNA sequence, allowing us to differentiate between two classes of adenosine and two classes of guanine modifications with 98% and 79% accuracy, respectively. To show the robustness of this method to sample preparation and sequencing methods, as well as to organismal diversity, we applied it to a publicly available yeast data set and achieved similar levels of accuracy. We also experimentally validated two novel and one known 3-methylcytosine (3mC) sites predicted by HAMR in human tRNAs. Researchers can now use our method to identify and characterize RNA modifications using only RNA-seq data, both retrospectively and when asking questions specifically about modified RNA.
  • Publication
    Genome-Wide Double-Stranded RNA Sequencing Reveals the Functional Significance of Base-Paired RNAs in Arabidopsis
    (2010-09-30) Zheng, Qi; Ryvkin, Paul; Li, Fan; Valladares, Otto; Wang, Li-San; Gregory, Brian D; Dragomir, Isabelle; Yang, Jamie; Cao, Kajia
    The functional structure of all biologically active molecules is dependent on intra- and inter-molecular interactions. This is especially evident for RNA molecules whose functionality, maturation, and regulation require formation of correct secondary structure through encoded base-pairing interactions. Unfortunately, intra- and inter-molecular base-pairing information is lacking for most RNAs. Here, we marry classical nuclease-based structure mapping techniques with high-throughput sequencing technology to interrogate all base-paired RNA in Arabidopsis thaliana and identify ∼200 new small (sm)RNA–producing substrates of RNA–DEPENDENT RNA POLYMERASE6. Our comprehensive analysis of paired RNAs reveals conserved functionality within introns and both 5′ and 3′ untranslated regions (UTRs) of mRNAs, as well as a novel population of functional RNAs, many of which are the precursors of smRNAs. Finally, we identify intra-molecular base-pairing interactions to produce a genome-wide collection of RNA secondary structure models. Although our methodology reveals the pairing status of RNA molecules in the absence of cellular proteins, previous studies have demonstrated that structural information obtained for RNAs in solution accurately reflects their structure in ribonucleoprotein complexes. Furthermore, our identification of RNA–DEPENDENT RNA POLYMERASE6 substrates and conserved functional RNA domains within introns and both 5′ and 3′ untranslated regions (UTRs) of mRNAs using this approach strongly suggests that RNA molecules are correctly folded into their secondary structure in solution. Overall, our findings highlight the importance of base-paired RNAs in eukaryotes and present an approach that should be widely applicable for the analysis of this key structural feature of RNA.
  • Publication
    SAVoR: A Server for Sequencing Annotation and Visualization of RNA Structures
    (2012-07-01) Li, Fan; Ryvkin, Paul; Childress, Daniel M; Valladares, Otto; Gregory, Brian D; Wang, Li-San
    RNA secondary structure is required for the proper regulation of the cellular transcriptome. This is because the functionality, processing, localization and stability of RNAs are all dependent on the folding of these molecules into intricate structures through specific base pairing interactions encoded in their primary nucleotide sequences. Thus, as the number of RNA sequencing (RNA-seq) data sets and the variety of protocols for this technology grow rapidly, it is becoming increasingly pertinent to develop tools that can analyze and visualize this sequence data in the context of RNA secondary structure. Here, we present Sequencing Annotation and Visualization of RNA structures (SAVoR), a web server, which seamlessly links RNA structure predictions with sequencing data and genomic annotations to produce highly informative and annotated models of RNA secondary structure. SAVoR accepts read alignment data from RNA-seq experiments and computes a series of per-base values such as read abundance and sequence variant frequency. These values can then be visualized on a customizable secondary structure model. SAVoR is freely available at
  • Publication
    A Comprehensive Database of High-Throughput Sequencing-Based RNA Secondary Structure Probing Data (Structure Surfer)
    (2016-05-01) Childress, Daniel M; Berkowitz, Nathan D; Wang, Li-San; Gregory, Brian D; Kazan, Hilal
    Background: RNA molecules fold into complex three-dimensional shapes, guided by the pattern of hydrogen bonding between nucleotides. This pattern of base pairing, known as RNA secondary structure, is critical to their cellular function. Recently, several diverse methods have been developed to assay RNA secondary structure on a transcriptome-wide scale using high-throughput sequencing. Each approach has its own strengths and caveats; however, there is no widely available tool for visualizing and comparing the results from these varied methods. Methods: To address this, we have developed Structure Surfer, a database and visualization tool for inspecting RNA secondary structure in six transcriptome-wide data sets from human and mouse. The data sets were generated using four different high-throughput sequencing-based methods. Each one was analyzed with a scoring pipeline specific to its experimental design. Users of Structure Surfer have the ability to query individual loci as well as detect trends across multiple sites. Results: Here, we describe the included data sets and their differences. We illustrate the database’s function by examining known structural elements and we explore example use cases in which combined data is used to detect structural trends. Conclusions: In total, Structure Surfer provides an easy-to-use database and visualization interface for allowing users to interrogate the currently available transcriptome-wide RNA secondary structure information for mammals.
  • Publication
    DASHR: Database of Small Human Noncoding RNAs
    (2016-01-01) Leung, Yuk Y; Kuksa, Pavel P; Amlie-Wolf, Alexandre; Valladares, Otto; Ungar, Lyle H; Kannan, Sampath; Gregory, Brian D; Wang, Li-San
    Small non-coding RNAs (sncRNAs) are highly abundant RNAs, typically <100 nucleotides long, that act as key regulators of diverse cellular processes. Although thousands of sncRNA genes are known to exist in the human genome, no single database provides searchable, unified annotation, and expression information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. Here, we present the Database of small human noncoding RNAs (DASHR). DASHR contains the most comprehensive information to date on human sncRNA genes and mature sncRNA products. DASHR provides a simple user interface for researchers to view sequence and secondary structure, compare expression levels, and evidence of specific processing across all sncRNA genes and mature sncRNA products in various human tissues. DASHR annotation and expression data covers all major classes of sncRNAs including microRNAs (miRNAs), Piwi-interacting (piRNAs), small nuclear, nucleolar, cytoplasmic (sn-, sno-, scRNAs, respectively), transfer (tRNAs), and ribosomal RNAs (rRNAs). Currently, DASHR (v1.0) integrates 187 smRNA high-throughput sequencing (smRNA-seq) datasets with over 2.5 billion reads and annotation data from multiple public sources. DASHR contains annotations for ~48,000 human sncRNA genes and mature sncRNA products, 82% of which are expressed in one or more of the curated tissues. DASHR is available at