Problems In High-Dimensional Statistics And Applications In Genomics, Metabolomics And Microbiomics

dc.contributor.advisor: Tony Cai
dc.contributor.advisor: Hongzhe Li
dc.contributor.author: Ma, Rong
dc.date: 2023-05-18T03:11:39.000
dc.date.accessioned: 2023-05-22T18:10:51Z
dc.date.available: 2001-01-01T00:00:00Z
dc.date.copyright: 2022-09-09T20:21:00-07:00
dc.date.issued: 2021-01-01
dc.date.submitted: 2022-09-09T08:06:03-07:00
dc.description.abstract: With rapid technological advancements in data collection and processing, massive, large-scale and complex datasets are now widely available in diverse research fields such as genomics, metabolomics and microbiomics. The analysis of large datasets with complex structures poses significant challenges and calls for new theory and methodology. In this dissertation, we address several high-dimensional statistical problems and develop novel statistical theory and methods for analyzing datasets generated by such data-driven interdisciplinary research. In the first part of the dissertation (Chapter 1 and Chapter 2), motivated by the ubiquitous availability of high-dimensional datasets with binary outcomes and the need for powerful methods to analyze them, we develop novel bias-correction techniques for inferring low-dimensional components or functionals of high-dimensional objects, and propose computationally efficient procedures for parameter estimation, global and simultaneous hypothesis testing, and confidence intervals in high-dimensional logistic regression(s). The theoretical properties of the proposed methods, including their minimax optimality, are carefully studied. We show empirically the effectiveness and stability of our methods in extracting useful information from high-dimensional noisy datasets. By applying our methods to a real metabolomic dataset, we unveil the associations between fecal metabolites and pediatric Crohn’s disease, as well as the effects of dietary treatment on such associations (Chapter 1); by analyzing a real genetic dataset, we obtain novel insights into the shared genetic architecture of ten pediatric autoimmune diseases (Chapter 2).
In the second part of the dissertation (Chapter 3 and Chapter 4), motivated by important questions in large-scale human microbiome and metagenomic research, as well as other applications, we propose a novel permuted monotone matrix model and develop new principles, theories and methods for inferring the underlying model parameters. In particular, we focus on two interrelated problems, namely optimal permutation recovery from noisy observations (Chapter 3) and extreme value estimation in permuted low-rank monotone matrices (Chapter 4), and propose an efficient spectral approach to address these problems. The proposed methods are rigorously justified by statistical theory, including their convergence rates and minimax optimality. Numerical experiments on simulated and synthetic microbiome metagenomic data demonstrate the superiority of the proposed methods over the alternatives. The methods are applied to two real datasets to compare the growth rates of gut bacteria in inflammatory bowel disease patients and/or normal controls.
dc.description.degree: Doctor of Philosophy (PhD)
dc.format.extent: 141 p.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://repository.upenn.edu/handle/20.500.14332/31450
dc.language: en
dc.legacy.articleid: 6158
dc.legacy.fulltexturl: https://repository.upenn.edu/cgi/viewcontent.cgi?article=6158&context=edissertations&unstamped=1
dc.provenance: Received from ProQuest
dc.rights: Rong Ma
dc.source.issue: 4372
dc.source.journal: Publicly Accessible Penn Dissertations
dc.source.status: published
dc.subject.other: Computational biology
dc.subject.other: Genomics
dc.subject.other: High-dimensional statistics
dc.subject.other: Metagenomics
dc.subject.other: Statistical decision theory
dc.subject.other: Biostatistics
dc.subject.other: Genetics
dc.subject.other: Statistics and Probability
dc.title: Problems In High-Dimensional Statistics And Applications In Genomics, Metabolomics And Microbiomics
dc.type: Dissertation/Thesis
digcom.contributor.author: Ma, Rong
digcom.date.embargo: 2001-01-01T00:00:00-08:00
digcom.identifier: edissertations/4372
digcom.identifier.contextkey: 31199202
digcom.identifier.submissionpath: edissertations/4372
digcom.type: dissertation
dspace.entity.type: Publication
upenn.graduate.group: Epidemiology & Biostatistics
Files
Original bundle
Name: Ma_upenngdas_0175C_14729.pdf
Size: 10.45 MB
Format: Adobe Portable Document Format