Problems In High-Dimensional Statistics And Applications In Genomics, Metabolomics And Microbiomics

Degree type
Doctor of Philosophy (PhD)
Graduate group
Epidemiology & Biostatistics
Subject
Computational biology
Genomics
High-dimensional statistics
Metagenomics
Statistical decision theory
Biostatistics
Genetics
Statistics and Probability
Copyright date
2022-09-09T20:21:00-07:00
Author
Ma, Rong
Abstract

With rapid technological advances in data collection and processing, massive and complex datasets are now widely available in diverse research fields such as genomics, metabolomics and microbiomics. The analysis of large datasets with complex structures poses significant challenges and calls for new theory and methodology. In this dissertation, we address several high-dimensional statistical problems and develop novel statistical theory and methods for analyzing datasets generated by such data-driven interdisciplinary research. In the first part of the dissertation (Chapters 1 and 2), motivated by the ubiquity of high-dimensional datasets with binary outcomes and the need for powerful methods to analyze them, we develop novel bias-correction techniques for inferring low-dimensional components or functionals of high-dimensional objects, and propose computationally efficient procedures for parameter estimation, global and simultaneous hypothesis testing, and confidence interval construction in high-dimensional logistic regression(s). The theoretical properties of the proposed methods, including their minimax optimality, are carefully studied. We show empirically the effectiveness and stability of our methods in extracting useful information from high-dimensional noisy datasets. By applying our methods to a real metabolomic dataset, we unveil associations between fecal metabolites and pediatric Crohn’s disease, as well as the effects of dietary treatment on these associations (Chapter 1); by analyzing a real genetic dataset, we obtain novel insights into the shared genetic architecture of ten pediatric autoimmune diseases (Chapter 2).
In the second part of the dissertation (Chapters 3 and 4), motivated by important questions in large-scale human microbiome and metagenomic research, as well as other applications, we propose a novel permuted monotone matrix model and develop new principles, theory and methods for inferring the underlying model parameters. In particular, we focus on two interrelated problems, namely optimal permutation recovery from noisy observations (Chapter 3) and extreme value estimation in permuted low-rank monotone matrices (Chapter 4), and propose an efficient spectral approach to both. The proposed methods are rigorously justified by statistical theory, including their convergence rates and minimax optimality. Numerical experiments on simulated data and on synthetic microbiome metagenomic data demonstrate the superiority of the proposed methods over the alternatives. The methods are applied to two real datasets to compare the growth rates of gut bacteria among inflammatory bowel disease patients and/or normal controls.
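The abstract does not spell out the spectral approach. As a loose illustration of the general idea of spectral ordering (a generic technique, not the dissertation's estimator), the sketch below assumes a toy rank-one model in which each row of a matrix carries an unknown latent position, the signal increases monotonically with that position, and the rows are observed in a random order. Under these assumptions, sorting the entries of the leading left singular vector recovers the latent order. The functions `spectral_order` and `simulate`, the rank-one assumption and all parameter values are hypothetical choices made only for this illustration.

```python
import numpy as np


def spectral_order(Y):
    """Estimate the latent row order of Y by sorting its leading left singular vector.

    Hypothetical setting: Y = u v^T (rows permuted) + noise, with u strictly
    increasing in the latent order and v entrywise positive.
    """
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    u1 = U[:, 0]
    # Singular vectors are defined only up to a joint sign flip; choose the
    # sign that makes the leading right singular vector mostly positive.
    if Vt[0].sum() < 0:
        u1 = -u1
    # Rows with smaller projections are placed earlier in the recovered order.
    return np.argsort(u1)


def simulate(n=50, p=200, noise=0.02, seed=0):
    """Draw a toy permuted rank-one monotone matrix (illustrative only)."""
    rng = np.random.default_rng(seed)
    u = np.linspace(1.0, 2.0, n)            # monotone latent row profile
    v = rng.uniform(0.5, 1.5, size=p)       # positive column loadings
    perm = rng.permutation(n)               # row i carries latent position perm[i]
    Y = np.outer(u[perm], v) + noise * rng.standard_normal((n, p))
    return Y, perm


Y, perm = simulate()
order = spectral_order(Y)
# perm[order] lists the latent positions in recovered order; ideally 0, 1, ..., n-1.
print(np.array_equal(perm[order], np.arange(len(perm))))
```

Sorting a single singular vector uses only the rank-one part of the signal. The permuted monotone matrix model described above allows more general low-rank structure, so this sketch offers intuition rather than the proposed procedure.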

Advisor
Tony Cai
Hongzhe Li
Date of degree
2021-01-01