Balancing Multiple Goals in Observational Study Design

Samuel D Pimentel, University of Pennsylvania


This thesis unites three papers discussing new strategies for matched-pair designs using observational data, developed to balance the demands of disparate design goals. The first chapter introduces a new matching algorithm for large-scale treated-control comparisons when many categorical covariates are present. The algorithm balances covariates and their interactions in a prioritized manner by solving a combinatorial optimization problem, and guarantees computational efficiency through the use of a sparse network representation. The second chapter defines a class of variables called prods which can be ignored when matching in order to strictly attenuate unmeasured bias, if it is present. These variables can be difficult to identify with confidence, so a multiple-control-group strategy is proposed in which investigators match once on all variables, and once ignoring prods; the two treated-control comparisons together give stronger evidence about treatment effects than either one individually. The final paper considers a new version of Fisher's classical lack-of-fit test for regression models, appropriate for data that lack replicated observations. The test uses matched pairs formed by optimal nonbipartite matching as near-replicates, and the model fit is used in constructing the matching distance in order to focus attention on variables that are predictive in the null model.

Recommended Citation

Pimentel, Samuel D, "Balancing Multiple Goals in Observational Study Design" (2017). Dissertations available from ProQuest. AAI10269282.