Causal Inference with Two-Stage Logistic Regression - Accuracy, Precision, and Application


Degree type

Doctor of Philosophy (PhD)

Graduate group

Epidemiology & Biostatistics

Discipline

Subject

Causal Inference
Two-stage logistic regression
bias
variance
Applied Statistics
Biostatistics
Clinical Trials
Epidemiology
Medical Biomathematics and Biometrics
Statistical Methodology

Abstract

Two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI) are two approaches to instrumental variable (IV) analysis. While 2SPS and 2SRI with linear models are well-studied methods of causal inference, their properties for logistic models of binary outcomes have not been thoroughly studied. We study the bias and variance properties of 2SPS and 2SRI for a logistic outcome model so that these IV approaches can be applied to causal inference for binary outcomes. We also propose and implement an extension of the generalized structural mean model (GSMM), originally developed for a randomized trial. First, we present closed-form expressions for the asymptotic bias of the causal odds ratio estimated by the 2SPS and 2SRI approaches. These results show that 2SPS logistic regression generates asymptotically biased estimates of the causal odds ratio even when there is no unmeasured confounding, and that this bias increases with increasing unmeasured confounding. 2SRI logistic regression is asymptotically unbiased when there is no unmeasured confounding, but it is biased when unmeasured confounding is present, and the bias increases with increasing unmeasured confounding. Second, we propose a sandwich variance estimator for the 2SPS and 2SRI logistic regressions that is adjusted for the fact that estimates from the first-stage regression are included as covariates in the second-stage regression. Simulation results show that the adjusted variance estimates are consistent with the observed variance, whereas the naive estimates without the adjustment are biased. The simulations also show that the 2SRI method has a larger variance than the 2SPS method. Lastly, we compare 2SPS and 2SRI logistic regression with the GSMM. Our simulation results show that the GSMM is an unbiased estimator of the complier-average causal effect (CACE) and has the smallest variance of the three approaches. We apply the three methods to an analysis of the antidiabetic effect of bezafibrate using the GPRD database.
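To make the difference between the two estimators concrete, the following is a minimal sketch of 2SPS and 2SRI with a logistic second stage. It is not the thesis's implementation. The simulated data, the linear first stage for a continuous exposure, and all function names (`simulate`, `ols`, `logit_fit`, `two_stage`) are assumptions made for illustration. The sandwich variance adjustment and the GSMM are not shown.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def expit(t):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, t))))

def ols(X, y):
    """Least-squares coefficients from the normal equations."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def logit_fit(X, y, iters=25):
    """Logistic regression coefficients by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [expit(sum(b * v for b, v in zip(beta, r))) for r in X]
        grad = [sum(r[i] * (yi - pi) for r, yi, pi in zip(X, y, p))
                for i in range(k)]
        hess = [[sum(r[i] * r[j] * pi * (1 - pi) for r, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def simulate(n=2000, seed=0):
    """Hypothetical data: binary instrument z, unmeasured confounder u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0
        u = rng.gauss(0, 1)                      # unmeasured confounder
        xi = 0.5 + 1.0 * zi + 0.5 * u + rng.gauss(0, 1)
        yi = 1.0 if rng.random() < expit(-0.5 + 1.0 * xi + 0.5 * u) else 0.0
        z.append(zi); x.append(xi); y.append(yi)
    return z, x, y

def two_stage(z, x, y):
    # Stage 1 (assumed linear): regress the exposure on the instrument.
    a = ols([[1.0, zi] for zi in z], x)
    xhat = [a[0] + a[1] * zi for zi in z]
    resid = [xi - xh for xi, xh in zip(x, xhat)]
    # 2SPS: the predicted exposure replaces the observed exposure.
    b_2sps = logit_fit([[1.0, xh] for xh in xhat], y)
    # 2SRI: keep the observed exposure and add the stage-1 residual.
    b_2sri = logit_fit([[1.0, xi, ri] for xi, ri in zip(x, resid)], y)
    return b_2sps, b_2sri, resid

z, x, y = simulate()
b_2sps, b_2sri, resid = two_stage(z, x, y)
print("2SPS log odds ratio for exposure:", b_2sps[1])
print("2SRI log odds ratio for exposure:", b_2sri[1])
```

In this sketch the second element of each coefficient vector is the estimated log odds ratio for the exposure. Naive standard errors from the second-stage fit would ignore the uncertainty in the first-stage estimates, which is the issue the adjusted sandwich estimator described in the abstract addresses.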

Date of degree

2010-12-22
