Inference Of Shared Genetic Architecture With Genome-Wide Association Data

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Discipline
Subject
genetic covariance
high-dimensional statistics
model misspecification
shared genetic architecture
Biostatistics
Genetics
Statistics and Probability
Copyright date
2022-10-05T20:22:00-07:00
Author
Wang, Jianqiao
Contributor
Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits. Many complex traits and diseases share a common genetic architecture, and studying this shared architecture provides valuable insights into the underlying disease mechanisms. In this dissertation, we develop several statistical methods for investigating shared genetic architecture based on GWAS data. We first discuss the quantification and estimation of shared genetic architecture based on genetic covariance, which is defined as the underlying covariance of the genetic effects. We develop a unified approach to robust estimation and inference for the genetic covariance of general outcomes that can be associated with genetic variants nonlinearly. The theoretical analysis shows that the proposed estimator is robust under certain forms of model misspecification. Various numerical experiments are performed to support the theoretical results. Application of this method to an outbred mouse GWAS data set reveals interesting genetic covariance among different mouse developmental traits. We then consider the practical setting in which the raw genotype data are unavailable and only the GWAS summary association statistics are available. We develop a method-of-moments estimator of the genetic correlation between two traits in the framework of high-dimensional linear models. Theoretical properties of the estimator, in terms of consistency and asymptotic normality, are provided. Simulations and real data analysis results show that the proposed estimator is more robust and has better interpretability than the LD score regression method under different genetic architectures. Finally, in Chapter 4 we discuss the problem of genome-wide detection and identification of shared genetic association, which is a global assessment of the existence of shared genetic architecture. The challenge is that linkage disequilibrium (LD) between the SNPs makes the test statistics highly dependent, which complicates detection and identification. To account for this dependency, an eigenvector-projected score statistic is proposed and a max-type test statistic (max-block) is developed for the genome-wide detection of shared associations. The max-block method is easy to calculate and is shown to control the genome-wide error rate. The method is applied to study shared cross-trait associations in 10 pediatric autoimmune diseases, identifying several regions that explain the genetic sharing between diseases.

Advisor
Hongzhe Li
Date of degree
2022-01-01