Beating the Adaptive Bandit With High Probability
Discipline
Computer Sciences
Statistics and Probability

Subject
probability
set theory
adaptive bandit
arbitrary convex decision sets
high-probability bound
partial-information problems
sampling scheme
computer science
cost function
entropy
heart
Jacobian matrices
sampling methods
state estimation
statistics
upper bound
Abstract
We provide a principled way of proving Ō(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the full-information problem in terms of "local" norms, both for entropy and self-concordant barrier regularization, unifying these methods. Given one such algorithm as a black box, we can convert a bandit problem into a full-information problem using a sampling scheme. The main result states that a high-probability Ō(√T) bound holds whenever the black box, the sampling scheme, and the estimates of the missing information satisfy a number of conditions, which are relatively easy to check. At the heart of the method is a construction of linear upper bounds on confidence intervals. As applications of the main result, we provide the first known efficient algorithm for the sphere with an Ō(√T) high-probability bound. We also derive the result for the n-simplex, improving the O(√(nT log(nT))) bound of Auer et al. [3] by replacing the log T term with log log T and closing the gap to the lower bound of Ō(√(nT)). While Ō(√T) high-probability bounds should hold for general decision sets through our main result, the construction of linear upper bounds depends on the particular geometry of the set; we believe that the sphere example already exhibits the necessary ingredients. The guarantees we obtain hold for adaptive adversaries (unlike the in-expectation results of [1]), and the algorithms are efficient, provided that the linear upper bounds on confidence can be computed.
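The abstract describes a black-box reduction: a full-information regret algorithm, a sampling scheme, and unbiased estimates of the unobserved losses together yield a bandit algorithm. The sketch below illustrates only the generic shape of that reduction on the n-simplex, using entropy (exponential-weights) regularization and importance-weighted loss estimates. The function names `bandit_via_full_information` and `bandit_loss` are illustrative assumptions, not from the paper, and the paper's key ingredient, the linear upper bounds on confidence intervals that give the high-probability guarantee, is not implemented here.

```python
import numpy as np

def bandit_via_full_information(n, T, eta, bandit_loss, rng=None):
    """Generic sketch of a bandit-to-full-information reduction on the n-simplex.

    A full-information algorithm (entropy-regularized / exponential weights)
    is run on importance-weighted estimates built from the single observed
    loss coordinate. This is only an illustration of the reduction's shape;
    it is not the paper's algorithm and omits the confidence-interval
    construction needed for a high-probability guarantee.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.ones(n)
    total_loss = 0.0
    for t in range(T):
        # Full-information black box: exponential weights over the simplex.
        p = weights / weights.sum()
        # Sampling scheme: play a single arm drawn from p.
        arm = rng.choice(n, p=p)
        # Partial information: only the loss of the played arm is revealed
        # (assumed to lie in [0, 1]).
        loss = bandit_loss(t, arm)
        total_loss += loss
        # Unbiased estimate of the full loss vector via importance weighting.
        est = np.zeros(n)
        est[arm] = loss / p[arm]
        # Feed the estimated loss vector back to the full-information algorithm.
        weights *= np.exp(-eta * est)
    return total_loss

# Hypothetical usage: three arms with fixed losses; arm 2 has the smallest loss.
losses = np.array([0.9, 0.6, 0.1])
total = bandit_via_full_information(
    n=3, T=2000, eta=0.05, bandit_loss=lambda t, arm: losses[arm]
)
```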