Instrumental Variables and Mendelian Randomization With Invalid Instruments

Kang, Hyunseung

Instrumental Variables and Mendelian Randomization With Invalid Instruments

Files

Kang_upenngdas_0175C_11682.pdf (1.14 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Statistics

Subject

Causal Inference
Econometrics
Instrumental Variables
Invalid Instruments
Mendelian Randomization
Biostatistics
Economics
Statistics and Probability

Copyright date

2015-07-20T20:15:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/27846

View all metadata

Author

Kang, Hyunseung

Abstract

Instrumental variables (IV) methods have been widely used to determine the causal effect of a treatment, exposure, policy, or an intervention on an outcome of interest. The IV method relies on having a valid instrument, a variable that is (A1) associated with the exposure, (A2) has no direct effect on the outcome, and (A3) is unrelated to the unmeasured confounders associated with the exposure and the outcome. However, in practice, finding a valid instrument, especially those that satisfy (A2) and (A3), can be challenging. For example, in Mendelian randomization studies where genetic markers are used as instruments, complete knowledge about instruments' validity is equivalent to complete knowledge about the involved genes' functions. The dissertation explores the theory, methods, and application of IV methods when invalid instruments are present. First, when we have multiple candidate instruments, we establish a theoretical bound whereby causal effects are only identified as long as less than 50% of instruments are invalid, without knowing which of the instruments are invalid. We also propose a fast penalized method, called sisVIVE, to estimate the causal effect. We find that sisVIVE outperforms traditional IV methods when invalid instruments are present both in simulation studies as well as in real data analysis. Second, we propose a robust confidence interval under the multiple invalid IV setting. This work is an extension of our work on sisVIVE. However, unlike sisVIVE which is robust to violations of (A2) and (A3), our confidence interval procedure provides honest coverage even if all three assumptions, (A1)-(A3), are violated. Third, we study the single IV setting where the one IV we have may actually be invalid. We propose a nonparametric IV estimation method based on full matching, a technique popular in causal inference for observational data, that leverages observed covariates to make the instrument more valid. We propose an estimator along with inferential results that are robust to mis-specifications of the covariate-outcome model. We also provide a sensitivity analysis should the instrument turn out to be invalid, specifically violate (A3). Fourth, in application work, we study the causal effect of malaria on stunting among children in Ghana. Previous studies on the effect of malaria and stunting were observational and contained various unobserved confounders, most notably nutritional deficiencies. To infer causality, we use the sickle cell genotype, a trait that confers some protection against malaria and was randomly assigned at birth, as an IV and apply our nonparametric IV method. We find that the risk of stunting increases by 0.22 (95% CI: 0.044,1) for every malaria episode and is sensitive to unmeasured confounders.

Advisor

Dylan S. Small
Tony Cai

Date of degree

2015-01-01

Collection

Dissertations and Theses