Bayesian Nonparametric Models For Causal Inference And Clustering Under Dirichlet Process Priors

Oganisian, Arman

Files

Oganisian_upenngdas_0175C_14656.pdf (1.36 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Statistics

Subject

Bayesian nonparametrics
Bayesian Statistics
Causal Inference
Clustering
Dirichlet Process
Nonparametric Modeling
Biostatistics

Copyright date

2021-08-31T20:21:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/30852

Abstract

This body of work develops new Bayesian nonparametric (BNP) models for estimating causal effects with observational data. Though broadly applicable, it is motivated by statistical complexities that frequently arise in health economics. Using potential outcomes, we formulate tailored causal estimands and determine the conditions under which they are identifiable from observed data. Once identified, flexible estimation follows from constructing models with high-dimensional sets of parameters that are allowed to grow with the sample size. We employ the Dirichlet Process (DP), and related stochastic processes, as priors over these high-dimensional spaces to perform posterior causal inference. First, motivated by complexities in medical cost distributions, we construct a generative two-part model for zero-inflated outcomes under a DP prior. This model is able to capture structural zeros, skewness, and multimodality. We develop a Bayesian g-computation procedure for causal estimation and use the induced partitioning of the DP to detect latent clusters of patients with similar data distributions. Second, we extend this work to cost-effectiveness analyses, which require jointly modeling a bivariate outcome under right-censoring. Posterior causal inference is performed using a BNP joint model under the Enriched DP and Gamma Process priors. Finally, we tackle the difficulties of estimating causal effects in multiple sparse subgroups. Using an improper Hierarchical DP, we construct a new "hierarchical Bayesian bootstrap" prior that partially pools confounder information across subgroups when performing g-computation. This allows for potential efficiency gains without imposing strong parametric assumptions on the confounder distributions. A key contribution throughout is the construction of Markov chain Monte Carlo (MCMC) algorithms for efficient posterior sampling.
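As a rough illustration of the g-computation idea mentioned in the abstract, the sketch below uses the ordinary (non-hierarchical) Bayesian bootstrap: each posterior draw reweights the observed confounder values with Dirichlet(1, ..., 1) weights and averages a treated-versus-untreated contrast under those weights. It is not the dissertation's model or its MCMC algorithm. The function names, the toy outcome mean, and the data are all hypothetical, and the outcome regression is treated as a fixed, known function purely to keep the example short.

```python
import random


def dirichlet_uniform_weights(n, rng):
    """Draw one Dirichlet(1, ..., 1) weight vector by normalizing Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def bb_g_computation(confounders, outcome_mean, n_draws=1000, seed=0):
    """Posterior draws of an average causal effect via the Bayesian bootstrap.

    confounders  : observed confounder values, one per subject
    outcome_mean : hypothetical function (a, x) -> E[Y | A = a, X = x];
                   assumed known here, whereas in practice it would itself
                   be estimated and carry posterior uncertainty
    """
    rng = random.Random(seed)
    # Conditional contrast for each observed confounder value.
    contrasts = [outcome_mean(1, x) - outcome_mean(0, x) for x in confounders]
    effects = []
    for _ in range(n_draws):
        # One draw of the confounder distribution: random weights on the data.
        weights = dirichlet_uniform_weights(len(confounders), rng)
        # Standardize the contrast over that draw of the confounder distribution.
        effects.append(sum(w * c for w, c in zip(weights, contrasts)))
    return effects


def toy_outcome_mean(a, x):
    """Hypothetical outcome model: the treatment effect is 2 + x."""
    return 1.0 + a * (2.0 + x)


# Toy binary confounder; every posterior draw must lie between 2 and 3.
toy_confounders = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
effect_draws = bb_g_computation(toy_confounders, toy_outcome_mean, n_draws=500)
posterior_mean = sum(effect_draws) / len(effect_draws)
```

The spread of `effect_draws` here reflects uncertainty in the confounder distribution only. A hierarchical version, as described in the abstract, would additionally share information about that distribution across subgroups.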

Advisor

Nandita Mitra
Jason A. Roy

Date of degree

2021-01-01

Collection

Dissertations and Theses