RNA and DNA Sequence Analysis of the Human Transcriptome

Degree type
Doctor of Philosophy (PhD)
Graduate group
Genomics & Computational Biology
Computational biology
Genome analysis
Next-generation sequencing
RNA Editing
Transcriptome analysis

The manifestation of phenotype at the cellular and organismal level is determined in large part by gene expression, or the transcription of DNA into RNA. The study of the transcriptome, that is, the characterization and quantification of all RNA produced in the cell, is therefore central to understanding phenotype. Recent advances in sequencing technology have allowed for unprecedented interrogation of the transcriptome at single-nucleotide resolution. In the first part of this thesis, we use RNA-Sequencing (RNA-Seq) to study the human B-cell transcriptome and determine the experimental parameters necessary for sequencing-based studies of gene expression. We find that deep sequencing is necessary to fully detect and accurately quantify the complexity of human transcriptomes. Furthermore, we find that at high sequencing depths, the vast majority of transcribed elements in human B-cells are detected. In the second part of this thesis, we use the sequence information provided by RNA-Seq to analyze systematic differences between DNA and RNA sequence. The transmission of information from DNA to RNA is a critical process and is expected to occur in a one-to-one fashion. By comparing the DNA and RNA sequences of the same individuals, we find all 12 types of RNA-DNA sequence differences (RDDs), the majority of which cannot be explained by known mechanisms such as RNA editing or transcriptional infidelity. We develop computational methods to robustly identify RDDs and control for false positives resulting from genotyping, sequencing, and alignment error. Finally, we explore the genetic basis of RDD levels, defined as the proportion of reads at a site bearing the sequence difference allele. In particular, we analyze the levels of RNA editing in unrelated and related individuals and find that a significant portion of the individual variation in A-to-G editing levels has a genetic component.
In summary, our results demonstrate that RNA-Seq is a powerful technique for comprehensive and quantitative analysis of gene expression. In addition, the resolution offered by RNA-Seq enables a detailed view of sequence differences between RNA and DNA. Future work will focus on understanding the mechanisms and factors influencing RDDs. Our results suggest that RDD levels may be considered a quantitative and heritable phenotype; as such, a genetic approach may be a sensible method for finding the determinants and mechanism of RDDs.
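As an illustration of two quantities named in the abstract, the short Python sketch below enumerates the 12 possible RDD types (each ordered pair of distinct DNA and RNA bases) and computes an RDD level as the proportion of reads at a site bearing the sequence difference allele. The function names and the example read counts are hypothetical and chosen only for illustration; this is not the thesis's actual pipeline, which additionally controls for genotyping, sequencing, and alignment error.

```python
from itertools import permutations

BASES = "ACGT"


def rdd_types():
    """Return every ordered (DNA base, RNA base) pair with distinct bases."""
    return [f"{dna}-to-{rna}" for dna, rna in permutations(BASES, 2)]


def rdd_level(read_bases, rna_allele):
    """Proportion of reads covering a site that carry the difference allele.

    read_bases: string of base calls from the RNA-Seq reads at one site
    rna_allele: the base seen in RNA that differs from the DNA genotype
    """
    if not read_bases:
        raise ValueError("no reads cover this site")
    matching = sum(1 for base in read_bases if base == rna_allele)
    return matching / len(read_bases)


# Hypothetical site: DNA is homozygous A, and 20 RNA-Seq reads cover it,
# 6 of which show G (an A-to-G difference).
reads_at_site = "A" * 14 + "G" * 6

print(len(rdd_types()))               # 12
print(rdd_level(reads_at_site, "G"))  # 0.3
```

A per-site level computed this way is a number between 0 and 1 for each individual, which is what allows it to be treated as a quantitative trait when comparing related and unrelated individuals.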

Frederic Bushman
Nancy Zhang