Telomere and Proximal Sequence Analysis Using High-Throughput Sequencing Reads
Subject
ChIP-Seq
High Throughput
Next Generation
Sequencing
Telomere Biology
Bioinformatics
Genetics
Molecular Biology
Abstract
The telomere is a specialized simple sequence repeat found at the end of all linear chromosomes. It acts as a substrate for telomere binding factors that, in coordination with other interacting elements, form what is known as the shelterin complex, which protects the end of the chromosome from the DNA damage repair machinery. The telomere shortens with each cell division and, once critically short, can no longer perform this role. Short dysfunctional telomeres result in cellular senescence, apoptosis, or genome instability. Telomere length is regulated by many factors, including cis-acting elements in the proximal sequence, known as the subtelomere. The Riethman lab played a pivotal role in generating the reference sequence of the subtelomere in both the human and mouse genomes, providing an essential resource for this work. Short high-throughput sequencing (HTS) reads generated from the simple-repeat-containing telomere or the segmental-duplication-rich subtelomere cannot be aligned uniquely to a reference genome, so many HTS analysis methods filter them out and exclude them. A ChIP-Seq analysis pipeline was developed to incorporate these multimapping reads in order to study DNA-protein interactions in the subtelomere. This pipeline was employed to search for factors regulating the expression of TERRA, an essential long non-coding RNA, and to better characterize its transcription start sites. ChIP-Seq analysis in the human subtelomere found colocalization of CTCF and cohesin directly adjacent to the telomere and throughout the subtelomere-specific repeats. Follow-up functional studies showed that this binding regulated TERRA transcription at these sites. Extending these analyses to the mouse genome showed very different patterns of CTCF and cohesin binding, with no evidence of binding at apparent sites of TERRA transcription.
Mouse subtelomere sequence analysis showed the co-occurrence of two repeats, MurSatRep1 and MMSAT4, at sites of putative TERRA expression; one of these repeats was previously shown to be expressed in lincRNAs. The Telomere Analysis from SEquencing Reads (TASER) pipeline was developed to capture telomere information from HTS data sets and was used to investigate telomere changes that occur in prostate cancer. TASER analysis of 53 paired prostate tumor and normal samples revealed an overall decrease in telomere length in tumor samples relative to matched normal tissue, especially in sequence containing the exact canonical telomere repeat. Multimapping reads contain important information that, when used properly, helps elucidate telomere biology, cancer biology, and genome regulation and stability.
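
The idea of capturing telomere information directly from sequencing reads can be illustrated with a minimal sketch. This is not the TASER implementation, whose details are not given in the abstract. It assumes the canonical vertebrate telomere repeat TTAGGG (and its reverse complement CCCTAA), and the function names and the `min_repeats` threshold are hypothetical choices made only for illustration. A read is called telomeric when it contains a run of consecutive exact repeat units, and the fraction of such reads is one rough proxy that could be compared between two samples.

```python
import re

# Assumption: canonical vertebrate telomere repeat and its reverse complement.
CANONICAL = "TTAGGG"
REVCOMP = "CCCTAA"


def longest_tandem_run(read: str, unit: str) -> int:
    """Return the largest number of consecutive exact copies of `unit` in `read`."""
    longest = 0
    for match in re.finditer(f"(?:{unit})+", read.upper()):
        longest = max(longest, len(match.group(0)) // len(unit))
    return longest


def is_telomeric(read: str, min_repeats: int = 4) -> bool:
    """Call a read telomeric if either strand shows >= min_repeats tandem units.

    `min_repeats` is a hypothetical threshold, not a value from the source.
    """
    best = max(longest_tandem_run(read, CANONICAL),
               longest_tandem_run(read, REVCOMP))
    return best >= min_repeats


def telomeric_fraction(reads, min_repeats: int = 4) -> float:
    """Fraction of reads called telomeric; 0.0 for an empty input."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(is_telomeric(r, min_repeats) for r in reads) / len(reads)
```

In practice the reads would come from a FASTQ or BAM file rather than an in-memory list, and a real analysis would also need to normalize for sequencing depth and handle variant repeats; both are outside this sketch.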