Statistics Papers

Document Type

Journal Article

Date of this Version

1997

Publication Source

The Annals of Statistics

Volume

25

Issue

5

Start Page

2103

Last Page

2116

DOI

10.1214/aos/1069362389

Abstract

We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n and we exhibit classes of strategies that achieve the latter.
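The setting in the abstract can be explored with a small Monte Carlo experiment. The sketch below is a hypothetical illustration and not one of the strategies from the paper. It takes F uniform on (0, 1) and applies a simple "stay with a winner, switch on a loser" rule: once an arm has produced a run of √n consecutive successes, it is kept for the remaining pulls. The function names `failure_proportion` and `mean_failure_proportion` and the run-length threshold of √n are illustrative assumptions.

```python
import math
import random


def failure_proportion(n, threshold, rng):
    """Fraction of failures in one simulated sequence of n pulls.

    Hypothetical strategy (an illustrative assumption, not the paper's):
    every fresh arm has success probability p ~ Uniform(0, 1). Stay with the
    current arm after a success. After a failure, switch to a fresh arm,
    unless the current arm has already produced `threshold` consecutive
    successes. In that case it is kept for the rest of the sequence.
    """
    failures = 0
    p = rng.random()  # success probability of the current arm
    run = 0           # consecutive successes on the current arm
    locked = False    # True once the current arm has reached the threshold
    for _ in range(n):
        if rng.random() < p:
            run += 1
            if run >= threshold:
                locked = True
        else:
            failures += 1
            if not locked:
                p = rng.random()  # discard the loser and draw a new arm
                run = 0
    return failures / n


def mean_failure_proportion(n, trials=100, seed=0):
    """Monte Carlo average of the failure proportion with threshold isqrt(n)."""
    rng = random.Random(seed)
    threshold = math.isqrt(n)
    total = sum(failure_proportion(n, threshold, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (100, 2500, 10000):
        estimate = mean_failure_proportion(n)
        print(f"n={n:>6}  simulated={estimate:.4f}  2/sqrt(n)={2 / math.sqrt(n):.4f}")
```

The printed table puts the simulated failure proportion next to the 2/√n benchmark quoted in the abstract. The numbers are Monte Carlo estimates for this illustrative rule only. They are not results from the paper.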

Keywords

bandit problems, sequential experimentation, dynamic allocation of Bernoulli processes, staying with a winner, switching with a loser


Date Posted: 27 November 2017

This document has been peer reviewed.