Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Hongzhe Lee


In microbiome studies, 16S rRNA sequencing is commonly used to quantify the taxonomic abundance of a microbial community. The resulting data are counts of amplicons. However, the total count is not informative because it depends on the sampling, sample preparation and sequencing processes. The counts are therefore used to estimate the relative abundances of the taxa, which are compositional and subject to a unit-sum constraint. Analysis of compositional data requires special statistical treatment to account for the intrinsic dependence among the components induced by this constraint. A balance, defined as the normalized log ratio of the geometric means of the values in two groups of components, provides an interesting way of studying microbial community structure, where the two groups represent the beneficial and detrimental taxa, respectively. Such a balance can be used to quantify dysbiosis of the microbial community that is associated with a clinical outcome. However, identifying the outcome-associated balance is challenging. We introduce a Bayesian balance regression and a Markov chain Monte Carlo (MCMC) stochastic search algorithm to identify the compositional balance that is associated with the outcome. Specifically, we propose a random walk strategy in MCMC that explores the very large space of all possible balances defined from a high-dimensional compositional vector. Simulation studies suggest that the algorithm can identify the bacterial taxa that define the outcome-associated balance with high probability. The effect of the balance on the outcome can be easily inferred from its predictive posterior distribution. We apply the proposed methods to two human microbiome studies and identify balances of gut microbiome composition that are associated with body mass index and risk of inflammatory bowel disease, respectively.
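To make the definition of a balance concrete, the following is a minimal sketch of how one could be computed for a single sample. It is not taken from the dissertation: the function name `balance`, the example counts, and the choice of normalizing constant sqrt(rs/(r+s)) (the usual isometric log-ratio convention, with r and s the sizes of the two groups) are assumptions made for illustration, and all components are assumed to be strictly positive (zero counts would need a pseudocount or similar treatment first).

```python
import math

def balance(x, plus, minus):
    """Balance between two groups of components of one sample.

    x     : sequence of strictly positive abundances (counts or proportions)
    plus  : indices of the components in the numerator group
    minus : indices of the components in the denominator group

    Assumed normalization (standard ILR convention, not confirmed by the
    abstract): sqrt(r*s/(r+s)) * log(gmean(x[plus]) / gmean(x[minus])).
    """
    if any(x[i] <= 0 for i in list(plus) + list(minus)):
        raise ValueError("all components in both groups must be positive")
    r, s = len(plus), len(minus)
    # The log of a geometric mean equals the arithmetic mean of the logs.
    mean_log_plus = sum(math.log(x[i]) for i in plus) / r
    mean_log_minus = sum(math.log(x[i]) for i in minus) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_plus - mean_log_minus)

# Hypothetical amplicon counts for five taxa in one sample.
counts = [120.0, 30.0, 50.0, 200.0, 100.0]
total = sum(counts)
composition = [c / total for c in counts]

# Taxa 0 and 3 form one group, taxa 1 and 2 the other; taxon 4 is unused.
b_counts = balance(counts, plus=[0, 3], minus=[1, 2])
b_comp = balance(composition, plus=[0, 3], minus=[1, 2])
```

Because the balance is a log ratio, it takes the same value on raw counts and on relative abundances, which is consistent with the point that the total count carries no information. Swapping the two groups only flips the sign.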

A microbial compositional balance can also be used to define a mediator that links a treatment or environmental factor to an outcome. However, for a given study, the balance that mediates the treatment effect on the outcome is unknown. We propose a Bayesian balance mediation model and a Markov chain Monte Carlo (MCMC) method to simultaneously search for such a balance and make inference on the mediation effects based on the predictive posterior distributions. Based on the proposed model, we show that the mediation effect can be defined in terms of the effect of the balance on the outcome, the balance indicator, and the effect of the treatment on the compositional shift. Our simulation results show that the MCMC sampling can effectively identify the balance and provide accurate estimates of the direct and mediation effects. We apply the method to a microbiome study aiming to understand the role of the gut microbiome in linking a vegan diet to several plasma metabolites. Our analysis shows that the vegan diet has strong direct effects and that the identified compositional balance has a weak to moderate effect on these plasma metabolites; however, the mediation effects of the gut microbiome on these metabolites are very small.