Bayesian Nonparametric Models for Causal Inference and Clustering under Dirichlet Process Priors
This body of work develops new Bayesian nonparametric (BNP) models for estimating causal effects with observational data. Though broadly applicable, it is motivated by statistical complexities that frequently arise in health economics. Using potential outcomes, we formulate tailored causal estimands and determine the conditions under which they are identifiable from observed data. Once these estimands are identified, flexible estimation follows from constructing models with high-dimensional sets of parameters that are allowed to grow with the sample size. We employ the Dirichlet Process (DP), and related stochastic processes, as priors over these high-dimensional spaces to perform posterior causal inference. First, motivated by complexities in medical cost distributions, we construct a generative two-part model for zero-inflated outcomes under a DP prior. This model is able to capture structural zeros, skewness, and multimodality. We develop a Bayesian g-computation procedure for causal estimation and use the induced partitioning of the DP to detect latent clusters of patients with similar data distributions. Second, we extend this work to cost-effectiveness analyses, which require jointly modeling a bivariate outcome under right-censoring. Posterior causal inference is performed using a BNP joint model under the Enriched DP and Gamma Process priors. Finally, we tackle the difficulties of estimating causal effects in multiple sparse subgroups. Using an improper Hierarchical DP, we construct a new ``hierarchical Bayesian bootstrap'' prior that partially pools confounder information across subgroups when performing g-computation. This allows for potential efficiency gains without imposing strong parametric assumptions on the confounder distributions. A key contribution throughout is the construction of Markov chain Monte Carlo (MCMC) algorithms for efficient posterior sampling.
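To make the g-computation step concrete, the following is a minimal sketch of how a standard (non-hierarchical) Bayesian bootstrap over the confounder distribution can be combined with an outcome model to produce draws of an average treatment effect. It is an illustration under stated assumptions, not the procedure developed in this work: the function name `bayesian_bootstrap_ate` and the outcome function `mu` are hypothetical, and `mu` is held fixed here, whereas a full analysis would also draw the outcome-model parameters from their posterior at each iteration.

```python
import numpy as np


def bayesian_bootstrap_ate(x, mu, n_draws=1000, seed=0):
    """Draw average treatment effects under a Bayesian bootstrap over confounders.

    x  : (n, p) array of observed confounder vectors.
    mu : hypothetical callable mu(a, x) -> (n,) array giving the expected
         outcome under treatment a. Assumed fixed for this illustration.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]

    # Individual-level contrasts implied by the (fixed) outcome model.
    contrast = mu(1, x) - mu(0, x)

    # Each row is one Dirichlet(1, ..., 1) draw of weights on the n observed
    # confounder vectors, i.e. one draw of the confounder distribution.
    weights = rng.dirichlet(np.ones(n), size=n_draws)

    # Standardize the contrasts over each drawn confounder distribution.
    return weights @ contrast


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(200, 2))

    # Toy outcome model (an assumption): the effect varies with the first confounder.
    def mu(a, x):
        return 1.0 + a * (2.0 + 0.5 * x[:, 0]) + x[:, 1]

    draws = bayesian_bootstrap_ate(x, mu)
    print(draws.mean(), np.quantile(draws, [0.025, 0.975]))
```

Each draw averages the individual-level contrasts over one sampled confounder distribution, so the spread of the draws reflects uncertainty about that distribution alone. A hierarchical version for sparse subgroups would additionally share weight across subgroups, which this sketch does not attempt.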