Transcript Diversity in the Protozoan Parasite Toxoplasma gondii
Eukaryotic genome annotation
Technological advances have made it possible to sequence RNA transcripts at unprecedented depth, enabling deep profiling of transcript abundance and diversity under a variety of conditions. Such information permits refinement of draft genome annotations originally generated in the absence of transcript coverage data, and provides new insights into organismal biology and regulatory mechanisms. This dissertation provides an extensive analysis of mRNA-seq data from the obligate intracellular protozoan parasite Toxoplasma gondii, a ubiquitous pathogen of humans and other vertebrates. We produced and sequenced 24 strand-specific RNA libraries from several parasite strains and developmental stages, and examined these in conjunction with 45 additional mRNA-seq libraries produced by other groups. The current reference genome annotation for T. gondii, generated using de novo methods informed by cDNA sequencing that predates mRNA-seq, identifies ~8300 protein-coding genes, fragmented by ~40K introns. Untranslated regions are incompletely defined, few alternatively spliced transcripts are described, and non-coding transcripts remain largely unexplored. The mRNA-seq datasets presented in this dissertation define a total of 2.7M candidate introns, most observed at vanishingly low abundance. Using the current annotation to define parameters that minimize false discovery yields ~60K likely splice junctions. Comparing the frequency of intron-spanning reads to the abundance of the transcripts to which the introns belong provides a reliable metric for estimating intron excision, readily distinguishing introns that are (i) universally used, (ii) alternatively spliced, or (iii) likely insignificant. Genome-wide analysis suggests that ~3000 annotated introns should be deleted from the reference genome annotation, ~1400 introns should be added as alternative isoforms, ~3100 should be added to existing annotation (often within UTRs), and ~3400 are associated with novel transcripts.
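The intron excision metric described above can be illustrated with a minimal sketch. The function name `classify_intron`, the use of mean transcript coverage as the abundance measure, and both threshold values are assumptions chosen for illustration only; they are not the parameters used in this dissertation, which derives its cutoffs from the existing annotation.

```python
def classify_intron(spanning_reads, transcript_coverage,
                    universal_min=0.9, alternative_min=0.1):
    """Classify an intron by its excision ratio.

    spanning_reads      -- number of reads spanning the splice junction
    transcript_coverage -- abundance of the host transcript, expressed in
                           comparable units (assumed here: mean read coverage)
    universal_min, alternative_min -- hypothetical cutoffs, not taken from
                           the dissertation
    """
    if transcript_coverage <= 0:
        # No expression of the host transcript: the ratio is undefined.
        return "undetermined"

    # Fraction of transcripts in which the intron appears to be excised.
    ratio = spanning_reads / transcript_coverage

    if ratio >= universal_min:
        return "universally used"
    if ratio >= alternative_min:
        return "alternatively spliced"
    return "likely insignificant"


# Illustrative inputs only; these are not measured values.
for reads, coverage in [(95, 100), (30, 100), (1, 100)]:
    print(reads, coverage, classify_intron(reads, coverage))
```

Normalizing junction reads by host-transcript abundance, rather than applying a fixed read-count cutoff, keeps rare junctions in highly expressed genes from being mistaken for significant introns.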
Transcriptomic expression is consistent with biological and phenotypic variation across the complex parasite life cycle, including undescribed differences in gene expression during intracellular tachyzoite replication. Strong circumstantial evidence also suggests that lncRNAs may play an important role in regulating stage-specific expression during sexual differentiation and sporogony. These results provide the basis for revising the reference T. gondii genome annotation available at ToxoDB.org and GenBank. Strategies developed in this dissertation also provide the basis for defining annotation criteria for other species, including related parasites responsible for malaria and conceivably other eukaryotes as well.