Inference Of Shared Genetic Architecture With Genome-Wide Association Data

Wang, Jianqiao

Files

Wang_upenngdas_0175C_15267.pdf (1.58 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Epidemiology & Biostatistics

Subject

genetic covariance
high-dimensional statistics
model misspecification
shared genetic architecture
Biostatistics
Genetics
Statistics and Probability

Copyright date

2022-10-05T20:22:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/32197

Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits. Many complex traits and diseases share a common genetic architecture, and studying this shared architecture provides valuable insight into the underlying disease mechanisms. In this dissertation, we develop several statistical methods for investigating shared genetic architecture using GWAS data. We first discuss the quantification and estimation of shared genetic architecture through genetic covariance, defined as the underlying covariance of the genetic effects. We develop a unified approach to robust estimation and inference for the genetic covariance of general outcomes that may be associated with genetic variants nonlinearly. Theoretical analysis shows that the proposed estimator is robust to certain forms of model misspecification, and numerical experiments support the theoretical results. Applying the method to a GWAS data set of outbred mice reveals genetic covariance among different mouse developmental traits. We then consider the practical setting in which raw genotype data are unavailable and only GWAS summary association statistics can be used. We develop a method-of-moments estimator of the genetic correlation between two traits in the framework of high-dimensional linear models and establish its consistency and asymptotic normality. Simulations and real data analyses show that the proposed estimator is more robust and more interpretable than LD score regression under different genetic architectures. Finally, in Chapter 4, we discuss genome-wide detection and identification of shared genetic associations, which provides a global assessment of whether shared genetic architecture exists. The challenge is that linkage disequilibrium (LD) between single-nucleotide polymorphisms (SNPs) makes the test statistics highly dependent, which complicates detection and identification. To account for this dependency, we propose an eigenvector-projected score statistic and develop a max-type test statistic (max-block) for genome-wide detection of shared associations. The max-block method is easy to compute and is shown to control the genome-wide error rate. We apply the method to study shared cross-trait associations in 10 pediatric autoimmune diseases and identify several regions that explain the genetic sharing between diseases.
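
For orientation, genetic covariance and genetic correlation are often formalized under a pair of linear models that share a genotype vector. The sketch below is an assumption made for illustration and is not taken from this record: the symbols x, \beta, \gamma, \Sigma and the linear form are hypothetical here, and the dissertation treats more general, possibly nonlinear, outcomes and may define these quantities differently.

```latex
% Illustrative assumption: traits y and z depend linearly on the same genotype vector x.
\begin{align*}
  y &= x^\top \beta + \varepsilon, \qquad
  z = x^\top \gamma + \delta, \qquad
  \Sigma = \operatorname{Cov}(x), \\
  % Genetic covariance: covariance of the two genetic components.
  \operatorname{Cov}\!\left(x^\top \beta,\; x^\top \gamma\right) &= \beta^\top \Sigma \gamma, \\
  % Genetic correlation: genetic covariance scaled by the two genetic variances.
  \rho_g &= \frac{\beta^\top \Sigma \gamma}
                 {\sqrt{\beta^\top \Sigma \beta}\,\sqrt{\gamma^\top \Sigma \gamma}} .
\end{align*}
```

Under this formalization, the LD structure enters through \Sigma, which is one reason estimates based only on summary statistics need to account for correlation between variants.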

Advisor

Hongzhe Li

Date of degree

2022-01-01

Collection

Dissertations and Theses