Date of Award

Summer 2011

Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Mingyao Li

Second Advisor

Hongzhe Li


Genome-wide association studies have become a standard tool for
disease gene discovery over the past few years. These studies have
successfully identified genetic variants associated with complex
diseases, such as cardiovascular disease, diabetes, and cancer.
Various statistical methods have been developed with the goal of
improving power to find disease-causing variants. The major focus of
this dissertation is to develop statistical methods for gene mapping
studies and to apply them to real datasets to identify genetic
markers associated with complex human diseases.

In my first project, I developed a method to detect gene-gene
interactions by incorporating linkage disequilibrium (LD)
information provided by external datasets such as the International
HapMap Project or the 1000 Genomes Project. The next two projects in my
dissertation are related to the analysis of secondary phenotypes in
case-control genetic association studies. In these studies, a set of
correlated secondary phenotypes that may share common genetic
factors with disease status are often collected. However, due to
unequal sampling probabilities between cases and controls, the
standard regression approach for examination of these secondary
phenotypes can yield inflated type I error rates when the test SNPs
are associated with the disease. To solve this issue, I propose a
Gaussian copula approach to jointly model the disease status and the
secondary phenotype. In my second project, I consider only one
marker in the model and perform a test to assess whether the marker
is associated with the secondary phenotype in the Gaussian copula
framework. In my third project, I extend the copula-based approach
to include a large number of candidate SNPs in the model. I propose
a variable selection approach to select markers that are associated
with the secondary phenotype by applying a lasso penalty to the
log-likelihood function.
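
As a rough illustration of the joint model described above, a Gaussian copula linking a binary disease status and a continuous secondary phenotype can be sketched as follows. This is a minimal sketch under stated assumptions, not the specification used in the dissertation: the secondary phenotype is assumed to be marginally normal, the disease status is assumed to arise from thresholding a latent standard normal variable, and the function name `joint_density` and its parameters (`p_disease`, `mu`, `sigma`, `rho`) are hypothetical. In an association analysis, `mu` and `p_disease` would typically depend on the SNP genotype through regression models, which is omitted here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: provides Phi and its inverse


def joint_density(d, y, p_disease, mu, sigma, rho):
    """Joint density of (D = d, Y = y) under a Gaussian copula.

    Illustrative assumptions (not taken from the dissertation):
    Y ~ N(mu, sigma^2) marginally, P(D = 1) = p_disease marginally,
    and the dependence between D and Y is induced by a latent
    bivariate normal with correlation rho (|rho| < 1).
    """
    # Standardized phenotype and its marginal normal density.
    z = (y - mu) / sigma
    f_y = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    # D = 1 when the latent normal variable exceeds this threshold,
    # so that the marginal disease probability equals p_disease.
    threshold = _STD_NORMAL.inv_cdf(1.0 - p_disease)

    # P(D = 1 | Y = y), from the conditional distribution of the
    # latent variable given z: N(rho * z, 1 - rho^2).
    scaled = (threshold - rho * z) / math.sqrt(1.0 - rho * rho)
    p_case_given_y = 1.0 - _STD_NORMAL.cdf(scaled)

    return f_y * (p_case_given_y if d == 1 else 1.0 - p_case_given_y)
```

When `rho` is 0 the joint density factorizes into the product of the two marginals; a nonzero `rho` captures the dependence between disease status and the secondary phenotype that the naive regression approach ignores.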