The Annals of Statistics
We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n and we exhibit classes of strategies that achieve the latter.
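The setting in the abstract can be explored numerically. The following is a minimal simulation sketch, assuming F is uniform on (0, 1). The rule it simulates is an illustrative assumption and not necessarily one of the strategies analysed in the paper: stay with an arm while it succeeds, switch to a fresh arm on a failure, and commit permanently to any arm that reaches `run_length` consecutive successes. The names `simulate_failure_proportion`, `mean_failure_proportion`, `abstract_bounds` and the parameter `run_length` are hypothetical.

```python
import math
import random


def simulate_failure_proportion(n, run_length, rng):
    """Play n pulls and return the observed proportion of failures.

    Illustrative rule (an assumption, not taken from the paper):
    keep the current arm while it succeeds; on a failure, draw a
    fresh arm, unless the current arm has already produced
    `run_length` successes in a row, in which case keep it for
    all remaining pulls.
    """
    failures = 0
    p = rng.random()      # success probability of the current arm, drawn from Uniform(0, 1)
    streak = 0            # current run of consecutive successes
    committed = False     # True once an arm has passed the run-length test
    for _ in range(n):
        if rng.random() < p:
            streak += 1
            if streak >= run_length:
                committed = True
        else:
            failures += 1
            if not committed:
                p = rng.random()  # switch with a loser: draw a new arm
                streak = 0
    return failures / n


def mean_failure_proportion(n, run_length, reps, seed):
    """Average the failure proportion over `reps` independent replications."""
    rng = random.Random(seed)
    total = sum(simulate_failure_proportion(n, run_length, rng) for _ in range(reps))
    return total / reps


def abstract_bounds(n):
    """Return (sqrt(2)/sqrt(n), 2/sqrt(n)), the two rates quoted in the abstract."""
    return math.sqrt(2.0) / math.sqrt(n), 2.0 / math.sqrt(n)


if __name__ == "__main__":
    n = 10_000
    low, high = abstract_bounds(n)
    estimate = mean_failure_proportion(n, run_length=math.isqrt(n), reps=200, seed=0)
    print(f"sqrt(2)/sqrt(n) = {low:.4f}, 2/sqrt(n) = {high:.4f}")
    print(f"simulated failure proportion = {estimate:.4f}")
```

Setting `run_length` to `math.inf` disables the commitment step and leaves a pure stay-with-a-winner, switch-with-a-loser rule, which gives a rough point of comparison for the two rates from the abstract.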
bandit problems, sequential experimentation, dynamic allocation of Bernoulli processes, staying with a winner, switching with a loser
Berry, D. A., Chen, R. W., Zame, A., Heath, D. C., & Shepp, L. A. (1997). Bandit Problems With Infinitely Many Arms. The Annals of Statistics, 25 (5), 2103-2116. http://dx.doi.org/10.1214/aos/1069362389
Date Posted: 27 November 2017
This document has been peer reviewed.