Date of Award: 2013
Doctor of Philosophy (PhD)
Epidemiology & Biostatistics
This dissertation addresses statistical problems in two areas: multiple-sample copy number variant (CNV) analysis, and the analysis of differential enrichment of histone modifications (HMs) between two or more biological conditions based on chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. The first part of the dissertation develops methods for identifying CNVs that are associated with trait values. We develop a novel method, CNVtest, that identifies trait-associated CNVs directly, without first identifying sample-specific CNVs. Asymptotic theory is developed to show that CNVtest controls the Type I error asymptotically and identifies the true trait-associated CNVs with high probability. The performance of the method is demonstrated through simulations and an application to identifying CNVs that are associated with population differentiation.
The second part of the dissertation develops methods for detecting genes with differential enrichment of histone modifications between two or more experimental conditions based on ChIP-seq data. We apply several nonparametric methods to identify these genes. The methods can be applied to histone modification ChIP-seq data even without replicates. They are based on nonparametric hypothesis tests designed to capture spatial differences in protein-enrichment profiles. The key to our approach is to use null genes or input ChIP-seq data to choose biologically relevant null values for the tests. We demonstrate the methods using ChIP-seq data from a comparative epigenomic profiling study of adipogenesis in murine adipose stromal cells. Our methods detect many genes with differential H3K27ac levels at gene promoter regions between proliferating preadipocytes and mature adipocytes in murine 3T3-L1 cells. The test statistics also correlate well with, and are predictive of, gene expression changes, indicating that the identified differentially enriched regions are biologically meaningful.
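The idea of calibrating a test against null genes can be illustrated with a minimal sketch. It is not the dissertation's procedure: the statistic used here (total variation distance between normalized coverage profiles) is a stand-in chosen for brevity, and the names `profile_distance` and `empirical_pvalues` are hypothetical. The sketch assumes each gene has a vector of binned read counts per condition with nonzero total coverage, and that a set of genes believed to be unchanged ("null genes") is available.

```python
import numpy as np

def profile_distance(x, y):
    """Total variation distance between two normalized coverage profiles.

    Stand-in statistic (an assumption, not the dissertation's test):
    it compares the shape of the two profiles, not their total depth.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = x / x.sum()  # assumes nonzero total coverage
    q = y / y.sum()
    return 0.5 * np.abs(p - q).sum()

def empirical_pvalues(gene_stats, null_stats):
    """Empirical p-values: share of null-gene statistics >= each observed one.

    The +1 in numerator and denominator keeps p-values strictly positive.
    """
    null_sorted = np.sort(np.asarray(null_stats, dtype=float))
    n = null_sorted.size
    n_ge = n - np.searchsorted(null_sorted, gene_stats, side="left")
    return (n_ge + 1) / (n + 1)

# Toy example with made-up numbers.
d = profile_distance([1, 1, 2], [2, 1, 1])
null_stats = np.array([0.1, 0.2, 0.3, 0.4])   # statistics of null genes
gene_stats = np.array([0.05, 0.35, 0.5])      # statistics of tested genes
pvals = empirical_pvalues(gene_stats, null_stats)
```

The design point is that the reference distribution comes from the data themselves (genes assumed unchanged) instead of a theoretical null, which is what allows a test to be calibrated when no replicates are available.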
We further extend these tests to time-course ChIP-seq experiments by evaluating the maximum and the mean of the adjacent pairwise statistics to detect differentially enriched genes across several time points. We compare and evaluate different nonparametric tests for differential enrichment analysis and observe that the kernel-smoothing methods control the Type I error better, although the rankings of genes with differentially enriched regions are comparable across the different test statistics.
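The time-course summaries can be sketched in a few lines. This assumes that a two-condition statistic has already been computed for each gene at each pair of adjacent time points (t1 vs. t2, t2 vs. t3, and so on); the function name `adjacent_pair_summaries` and the toy values are hypothetical.

```python
import numpy as np

def adjacent_pair_summaries(pairwise_stats):
    """Summarize adjacent-time-point statistics for each gene.

    pairwise_stats: array of shape (n_genes, n_timepoints - 1), where
    column j holds the statistic comparing time point j with j + 1.
    Returns the per-gene maximum and per-gene mean.
    """
    stats = np.asarray(pairwise_stats, dtype=float)
    return stats.max(axis=1), stats.mean(axis=1)

# Toy example: 2 genes, 4 time points (3 adjacent comparisons each).
stats = np.array([
    [0.1, 0.2, 0.3],   # small, gradual changes
    [2.0, 0.5, 0.5],   # one sharp change between the first two time points
])
max_stat, mean_stat = adjacent_pair_summaries(stats)
```

In this toy example the maximum highlights the second gene's single sharp transition, while the mean reflects change accumulated across all adjacent comparisons.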
Wu, Qian, "Statistical Methods for Analysis of Multi-Sample Copy Number Variants and ChIP-seq Data" (2013). Publicly Accessible Penn Dissertations. 948.