Methods for High Dimensional Inferences With Applications in Genomics

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Discipline
Subject
high-dimensional inferences
genome-wide association studies
gene regulation networks
Applied Statistics
Bioinformatics
Biostatistics
Statistical Models
Statistical Theory
Abstract

In this dissertation, I have developed several high-dimensional inference and computational methods motivated by problems in genomic studies. The dissertation consists of two parts. The first part is motivated by the analysis of data from genome-wide association studies (GWAS), for which I have developed an optimal false discovery rate (FDR) controlling method for high-dimensional dependent data. For short-range dependent data, I have shown that the marginal plug-in procedure is optimal in controlling the FDR while minimizing the false non-discovery rate (FNR). When applied to the neuroblastoma GWAS data, this procedure identified six more disease-associated variants than previous p-value based procedures such as the Benjamini and Hochberg procedure. I have further investigated the statistical problem of sparse signal recovery in the GWAS setting and developed a rigorous procedure for sample size and power analysis in the framework of FDR and FNR. In addition, I have characterized the almost complete discovery boundary in terms of signal strength and non-null proportion, and developed a procedure that achieves almost complete recovery of the signals.

The second part of the dissertation is motivated by the construction of gene regulation networks from genetical genomics (eQTL) data. I have developed a sparse high-dimensional multivariate regression model for studying the conditional independence relationships among a set of genes after adjusting for possible genetic effects, as well as the genetic architecture that influences gene expression. I have developed a covariate-adjusted precision matrix estimation method (CAPME), which can be easily implemented by linear programming. Asymptotic convergence rates and sign consistency are established for the estimators of the regression coefficients and the precision matrix. The numerical performance of the estimator was investigated using both simulated and real data sets. Simulation results show that CAPME yields substantial improvements in both estimation and graph structure selection. I have applied CAPME to a yeast eQTL data set to identify the gene regulatory network among a set of genes in the MAPK signaling pathway. Finally, I have implemented the method in the R software package CAPME, based on my dissertation work.
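
For readers unfamiliar with the p-value based baseline named in the abstract, the standard Benjamini and Hochberg step-up procedure can be sketched as follows. This is a minimal illustration of that baseline only, not the marginal plug-in procedure developed in the dissertation; the function name `benjamini_hochberg` and the sample p-values are assumptions made for the example.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Standard BH step-up procedure: one reject/keep decision per hypothesis."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
decisions = benjamini_hochberg(example_p, alpha=0.05)
```

Because the procedure is step-up, a hypothesis whose p-value exceeds its own threshold is still rejected when a larger-ranked p-value falls below its threshold.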

Advisor
Hongzhe Li
T. Tony Cai
Date of degree
2011-08-12