Statistical Methods for Analysis of Multi-Sample Copy Number Variants and ChIP-seq Data

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Discipline
Subject
ChIP-seq
CNV
Histone modification
Kernel-smoothing
Multi-sample
Nonparametric test
Biostatistics
Copyright date
2014-08-22T20:13:00-07:00
Author
Wu, Qian
Abstract

This dissertation addresses statistical problems in two areas: the analysis of copy number variants (CNVs) across multiple samples, and the analysis of differential enrichment of histone modifications (HMs) between two or more biological conditions based on chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. The first part of the dissertation develops methods for identifying CNVs that are associated with trait values. We develop a novel method, CNVtest, that identifies trait-associated CNVs directly, without first identifying sample-specific CNVs. Asymptotic theory is developed to show that CNVtest controls the Type I error asymptotically and identifies the true trait-associated CNVs with high probability. The performance of the method is demonstrated through simulations and an application to identifying CNVs associated with population differentiation.

The second part of the dissertation develops methods for detecting genes with differential enrichment of histone modifications between two or more experimental conditions based on ChIP-seq data. We apply several nonparametric methods to identify such genes. The methods can be applied to ChIP-seq data on histone modifications even without replicates. They are based on nonparametric hypothesis testing in order to capture spatial differences in protein-enriched profiles. The key to our approach is to use null genes or input ChIP-seq data to choose biologically relevant null values for the tests. We demonstrate the methods using ChIP-seq data from a comparative epigenomic profiling of adipogenesis of murine adipose stromal cells. Our methods detect many genes with differential H3K27ac levels at gene promoter regions between proliferating preadipocytes and mature adipocytes in murine 3T3-L1 cells. The test statistics also correlate well with, and are predictive of, gene expression changes, indicating that the identified differentially enriched regions are biologically meaningful. We further extend these tests to time-course ChIP-seq experiments by evaluating the maximum and mean of the adjacent pairwise statistics for detecting differentially enriched genes across several time points. We compare different nonparametric tests for differential enrichment analysis and observe that the kernel-smoothing methods perform better in controlling Type I errors, although the rankings of genes with differentially enriched regions are comparable across test statistics.
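The time-course extension mentioned in the abstract, which takes the maximum and the mean of adjacent pairwise statistics, can be illustrated with a small sketch. This is not the dissertation's implementation. The function names `adjacent_pairwise_summary` and `l1_distance` are hypothetical, and the L1 distance is only a stand-in for whichever two-condition test statistic is actually used.

```python
from statistics import mean


def l1_distance(a, b):
    """Placeholder two-condition statistic (an assumption for illustration,
    NOT the dissertation's nonparametric test): L1 distance between profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def adjacent_pairwise_summary(profiles, pair_stat=l1_distance):
    """Summarize one gene across ordered time points.

    profiles  : list of enrichment profiles, one per time point, in time order
    pair_stat : function giving a statistic for two profiles

    Returns (max, mean) of the statistics computed on adjacent time points
    (t1 vs t2, t2 vs t3, ...).
    """
    if len(profiles) < 2:
        raise ValueError("need at least two time points")
    stats = [pair_stat(a, b) for a, b in zip(profiles, profiles[1:])]
    return max(stats), mean(stats)


# Toy profiles for a single gene at three time points (made-up numbers).
gene_profiles = [[1, 2, 3], [1, 2, 5], [4, 2, 5]]
max_stat, mean_stat = adjacent_pairwise_summary(gene_profiles)
```

In this sketch the maximum responds to a single sharp change between two consecutive time points, while the mean reflects change accumulated over the whole time course.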

Advisor
Hongzhe Li
Date of degree
2013-01-01