Statistics Papers

Document Type

Journal Article

Publication Source

Annals of Applied Statistics





Abstract

In an optimal nonbipartite match, a single population is divided into matched pairs to minimize a total distance within matched pairs. Nonbipartite matching has been used to strengthen instrumental variables in observational studies of treatment effects, essentially by forming pairs that are similar in terms of covariates but very different in the strength of encouragement to accept the treatment. Optimal nonbipartite matching is typically done using network optimization techniques that can be quick, running in polynomial time, but these techniques limit the tools available for matching. Instead, we use integer programming techniques, thereby obtaining a wealth of new tools not previously available for nonbipartite matching, including fine and near-fine balance for several nominal variables, forced near balance on means, and optimal subsetting. We illustrate the methods in our ongoing study of outcomes of late-preterm births in California, that is, births of 34 to 36 weeks of gestation. Would lengthening the time in the hospital for such births reduce the frequency of rapid readmissions? A straightforward comparison of babies who stay for a shorter or longer time would be severely biased, because the principal reason for a long stay is some serious health problem. We need an instrument, something inconsequential and haphazard that encourages a shorter or a longer stay in the hospital. It turns out that babies born at certain times of day tend to stay overnight once, with a shorter length of stay, whereas babies born at other times of day tend to stay overnight twice, with a longer length of stay, and there is nothing particularly special about a baby who is born at 11:00 pm. Therefore, we use hour-of-birth as an instrument for a longer hospital stay. Using integer programming, we form 80,600 pairs of babies who are similar in terms of observed covariates but very different in anticipated lengths of stay based on their hours of birth. We ask whether encouragement to stay an extra day reduces readmissions within two days of discharge. A sensitivity analysis addresses the possibility that the instrument is not valid as an instrument, that is, not random but rather biased by an unmeasured covariate associated with the hour of birth. Bias can give the impression of a treatment effect when there is no effect, but it can also mask an actual effect, leaving the impression of no effect, and both possibilities are examined in analyses for effects and for near equivalence.
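The core idea of the abstract, dividing a single population into pairs that are close on covariates but far apart on the instrument, can be illustrated with a toy sketch. This is not the authors' integer program. It is an exhaustive search over all pairings, which is feasible only for a handful of units, and it omits fine balance, mean balance and optimal subsetting. The four units, the `encouragement` score, and the `penalty` and `min_gap` parameters are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data: covariates are (gestational weeks, birth weight in kg);
# "encouragement" is an assumed 0-1 score for how strongly the hour of birth
# pushes toward a second overnight stay.
units = [
    {"covariates": (34, 2.4), "encouragement": 0.1},
    {"covariates": (34, 2.5), "encouragement": 0.2},
    {"covariates": (35, 2.4), "encouragement": 0.9},
    {"covariates": (35, 2.6), "encouragement": 0.8},
]


def pair_distance(a, b, penalty=10.0, min_gap=0.5):
    """Euclidean covariate distance, plus a penalty if the two units
    differ by less than min_gap in encouragement (assumed form)."""
    covariate_distance = math.dist(a["covariates"], b["covariates"])
    gap = abs(a["encouragement"] - b["encouragement"])
    return covariate_distance + (penalty if gap < min_gap else 0.0)


def optimal_nonbipartite_match(units, distance):
    """Return (total_distance, pairs) minimizing the summed within-pair
    distance over every way of splitting the units into pairs."""
    if len(units) % 2 != 0:
        raise ValueError("an even number of units is required")

    def search(remaining):
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best_total, best_pairs = math.inf, []
        # Try every possible partner for the first unpaired unit.
        for k, partner in enumerate(rest):
            others = rest[:k] + rest[k + 1:]
            sub_total, sub_pairs = search(others)
            total = distance(units[first], units[partner]) + sub_total
            if total < best_total:
                best_total, best_pairs = total, [(first, partner)] + sub_pairs
        return best_total, best_pairs

    return search(tuple(range(len(units))))


total, pairs = optimal_nonbipartite_match(units, pair_distance)
print(pairs, round(total, 3))
```

In this toy data the two units closest on covariates (units 0 and 1) have nearly the same encouragement, so the penalty steers the match toward pairs with contrasting encouragement at a modest cost in covariate distance. Setting `penalty=0.0` would instead recover the covariate-only pairing. An integer programming formulation would express the same objective with binary pair-indicator variables and could add balance constraints as further linear constraints.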


Keywords

attributable effect, equivalence test, fine balance, instrumental variable, integer programming, nonbipartite matching, observational study, optimal subset matching, sensitivity analysis



Date Posted: 27 November 2017

This document has been peer reviewed.