Statistical Inference For High Dimensional Models In Genomics And Microbiome

Loading...
Thumbnail Image
Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Discipline
Subject
Causal inference
Genomcis
High dimensional model
Microbiome
Statistical inference
Biostatistics
Funder
Grant number
License
Copyright date
2021-08-31T20:20:00-07:00
Distributor
Related resources
Author
Lu, Jiarui
Contributor
Abstract

Human microbiome consists of all living microorganisms that are in and on human body. Largescale microbiome studies such as the NIH Human Microbiome Project (HMP), have shown that this complex ecosystem has large impact on human health through multiple ways. The analysis of these datasets leads to new statistical challenges that require the development of novel methodologies. Motivated by several microbiome studies, we develop several methods of statistical inference for high dimensional models to address the association between microbiome compositions and certain outcomes. The high-dimensionality and compositional nature of the microbiome data make the naive application of the classical regression models invalid. To study the association between microbiome compositions with a disease’s risk, we develop a generalized linear model with linear constraints on regression coefficients and a related debiased procedure to obtain asymptotically unbiased and normally distributed estimates. Application of this method to an inflammatory bowel disease (IBD) study identifies several gut bacterial species that are associated with the risk of IBD. We also consider the post-selection inference for models with linear equality constraints, where we develop methods for constructing the confidence intervals for the selected non-zero coefficients chosen by a Lasso-type estimator with linear constraints. These confidence intervals are shown to have desired coverage probabilities when conditioned on the selected model. Finally, the last chapter of this dissertation presents a method for inference of high dimensional instrumental variable regression. Gene expression and phenotype association can be affected by potential unmeasured confounders, leading to biased estimates of the associations. Using genetic variants as instruments, we consider the problem of hypothesis testing for sparse IV regression models and present methods for testing both single and multiple regression coefficients. A multiple testing procedure is developed for selecting variables and is shown to control the false discovery rate. These methods are illustrated by an analysis of a yeast dataset in order to identify genes that are associated with growth in the presence of hydrogen peroxide.

Advisor
Hongzhe Li
Date of degree
2020-01-01
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
Recommended citation