Identification of Long-Range Regulatory Elements in the Human Genome

Degree type
Doctor of Philosophy (PhD)
Graduate group
Genomics & Computational Biology
Subject
Chromatin
Chromosomal structure/function
Genomics
Genomic structure
Regulation of transcription
Bioinformatics
Copyright date
2016-11-29
Abstract

Genome-wide association studies have shown that the majority of disease-associated genetic variants lie within non-coding regions of the human genome. A challenge following these discoveries is to identify how these variants modulate the risk of disease. Enhancers are non-coding regulatory elements that can be bound by proteins to activate the expression of a gene that may be linearly distant. Experimentally probing all possible enhancer–target gene pairs can be laborious. Hi-C, a technique developed by Job Dekker’s group in 2009, combines high-throughput sequencing with chromosome conformation capture to detect DNA interactions genome-wide, thereby revealing the three-dimensional architecture of chromatin in the nucleus. However, the utility of the datasets produced by this technique for discovering long-range regulatory interactions is largely unexplored. In this thesis, we develop novel approaches to identify DNA-interacting units and their interactions in Hi-C datasets, with the goal of uncovering all enhancer–target gene interactions. We began by identifying significantly interacting regions in these datasets and then focused on candidate enhancer–gene pairs. We found that the identified putative enhancers are enriched for p300 binding activity, while their target promoters are likely to be cell-type-specific. Furthermore, we found that enhancers and target genes often interact in many-to-many relationships, and that the majority of enhancer–target gene interactions are intra-chromosomal, with the enhancer and its target gene within 1 Mb of each other. Next, we refined our analytical approach to identify physically interacting DNA regions at ~1 kb resolution and to better define the boundaries of likely enhancer elements. By searching for over-represented sequences (motifs) in these putative promoter-interacting enhancers, we were then able to identify the transcription factors likely bound to them.
This refined, higher-resolution approach has the potential to identify protein complexes involved in enhancer–promoter interactions, which can be verified in future experiments. We implemented both of the approaches described above in a high-throughput identification pipeline for promoter-interacting enhancer elements (HIPPIE). HIPPIE runs efficiently on typical Linux servers and grid computing environments and is available as open-source software. In summary, our findings demonstrate the potential utility of Hi-C technologies for elucidating the mechanisms by which long-range enhancers regulate gene expression and ultimately result in human disease phenotypes.
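
As a rough illustration of what "identifying significantly interacting regions" can mean in practice, the sketch below flags region pairs whose observed Hi-C read-pair count is unexpectedly high relative to a background expectation. The abstract does not state which statistical model the thesis or HIPPIE uses, so the Poisson model, the Bonferroni correction, and all names here (`poisson_upper_tail`, `significant_pairs`, the example region labels and counts) are assumptions made for illustration only.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    Assumption: a Poisson background is used purely for illustration.
    The complement of the CDF loses precision for extremely small
    tails, which is acceptable for a sketch.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def significant_pairs(contacts, expected, alpha=0.05):
    """Return (pair, p_value) for pairs passing a Bonferroni threshold.

    contacts: dict mapping (region_a, region_b) -> observed read-pair count
    expected: dict with the same keys -> expected count under the background
    """
    n_tests = len(contacts)
    hits = []
    for pair, count in contacts.items():
        p_value = poisson_upper_tail(count, expected[pair])
        if p_value * n_tests <= alpha:  # Bonferroni-adjusted comparison
            hits.append((pair, p_value))
    return sorted(hits, key=lambda item: item[1])


# Hypothetical counts for two candidate enhancer-promoter pairs.
contacts = {("enhancer_A", "promoter_B"): 30, ("enhancer_C", "promoter_D"): 3}
expected = {("enhancer_A", "promoter_B"): 2.0, ("enhancer_C", "promoter_D"): 2.5}
hits = significant_pairs(contacts, expected)
```

With these made-up numbers, only the first pair is retained: 30 reads against an expectation of 2 is far in the tail, whereas 3 reads against an expectation of 2.5 is unremarkable. A real analysis would also need a background model that accounts for genomic distance and fragment-level biases.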

Advisor
Li-San Wang
Brian D. Gregory
Date of degree
2015-01-01