Removing Strong Data Assumptions In Causal Inference Via Large-Scale Optimization

Heng, Siyu

Files

Heng_upenngdas_0175C_15150.pdf (1.15 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Applied Mathematics

Subject

Applied Mathematics
Biostatistics
Statistics and Probability

Copyright date

2022-10-05T20:22:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/32195

Abstract

Many traditional and newly developed causal inference approaches impose strong assumptions on the data. When those assumptions are violated in practice, these approaches may be inapplicable, suffer from low statistical power, or lead to misleading causal conclusions. In this dissertation, we present three papers that show how large-scale optimization can sometimes help remove strong assumptions about the data-generating process or the data collection procedure that some existing causal inference approaches require. The first and second papers concern assumptions about the data-generating process. In the first paper, a new adaptive approach is proposed to combine two test statistics in matched observational studies. In sensitivity analyses, the proposed adaptive approach asymptotically uniformly dominates both component test statistics, regardless of the underlying data distribution. In the second paper, a model-free and finite-population-exact framework is proposed to analyze randomized experiments subject to outcome misclassification. This framework is based on large-scale integer programming and can help researchers analyze such experiments more comprehensively without imposing any additional assumptions. The third paper illustrates how large-scale optimization can help remove strong assumptions about the data collection procedure. Specifically, to study the effect of reducing malaria burden on the low birth weight rate in sub-Saharan Africa, a pair-of-pairs approach to a difference-in-differences study is proposed, which is built on optimal matching (a large-scale network flow problem) and cardinality matching (a large-scale integer programming problem). Unlike traditional difference-in-differences studies, this pair-of-pairs approach does not require either panel data or repeated cross-sectional data to be collected before the analysis stage.
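
As a rough illustration of the optimal matching idea mentioned in the abstract, the following minimal sketch pairs each treated unit with a distinct control so that the total covariate distance is as small as possible. It is not the dissertation's method: the function name `optimal_pair_matching`, the single scalar covariate, the absolute-difference distance, and the toy data are all assumptions made for illustration, and the brute-force search stands in for the network flow solvers that large-scale problems would need.

```python
from itertools import permutations


def optimal_pair_matching(treated, controls):
    """Pair each treated unit with a distinct control, minimizing total |x_t - x_c|.

    Hypothetical helper for illustration only. It enumerates every assignment,
    so it is practical only for a handful of units; real studies would solve
    the same problem with a min-cost flow or assignment solver.
    """
    if len(treated) > len(controls):
        raise ValueError("need at least as many controls as treated units")

    best_cost, best_pairs = float("inf"), None
    # Each permutation picks one distinct control index per treated unit.
    for chosen in permutations(range(len(controls)), len(treated)):
        cost = sum(abs(treated[i] - controls[j]) for i, j in enumerate(chosen))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(chosen))
    return best_cost, best_pairs


# Toy covariate values (assumed, not from the dissertation).
treated = [1.0, 5.0]
controls = [4.5, 0.0, 1.5, 9.0]

total, pairs = optimal_pair_matching(treated, controls)
print(total, pairs)
```

Each returned pair is `(treated_index, control_index)`. Cardinality matching, also named in the abstract, instead maximizes the number of matched units subject to covariate balance constraints, which this sketch does not attempt.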

Advisor

Dylan S. Small

Date of degree

2022-01-01

Collection

Dissertations and Theses