Beating the Adaptive Bandit With High Probability

Penn collection
Statistics Papers
Degree type
Discipline
Computer Sciences
Statistics and Probability
Subject
computational complexity
optimisation
probability
set theory
adaptive bandit
arbitrary convex decision sets
high-probability bound
partial-information problems
sampling scheme
computer science
cost function
entropy
heart
Jacobian matrices
sampling methods
state estimation
statistics
upper bound
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Abernethy, Jacob
Rakhlin, Alexander
Contributor
Abstract

We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the full-information problem in terms of "local" norms, both for entropy and self-concordant barrier regularization, unifying these methods. Given such an algorithm as a black box, we can convert a bandit problem into a full-information problem using a sampling scheme. The main result states that a high-probability Õ(√T) bound holds whenever the black box, the sampling scheme, and the estimates of the missing information satisfy a number of conditions, which are relatively easy to check. At the heart of the method is a construction of linear upper bounds on confidence intervals. As applications of the main result, we provide the first known efficient algorithm for the sphere with an Õ(√T) high-probability bound. We also derive the result for the n-simplex, improving the O(√(nT log(nT))) bound of Auer et al. [3] by replacing the log T term with log log T and closing the gap to the lower bound of Ω(√(nT)). While Õ(√T) high-probability bounds should hold for general decision sets through our main result, the construction of linear upper bounds depends on the particular geometry of the set; we believe that the sphere example already exhibits the necessary ingredients. The guarantees we obtain hold for adaptive adversaries (unlike the in-expectation results of [1]), and the algorithms are efficient, provided that the linear upper bounds on confidence can be computed.
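
The abstract outlines a generic pipeline: a full-information algorithm used as a black box, a sampling scheme, and estimates of the missing information fed back to the black box. As a purely illustrative sketch of that pipeline on the n-simplex (an Exp3-style instantiation with entropy regularization; this is not the paper's algorithm and it omits the linear upper bounds on confidence intervals that drive the high-probability result), the Python function below shows how one observed loss per round is turned into an importance-weighted full-information loss vector. The names bandit_reduction_simplex, eta, and gamma are illustrative choices, not taken from the paper.

import numpy as np

def bandit_reduction_simplex(loss_matrix, eta, gamma, rng=None):
    """Sketch: exponential weights (full-information black box) driven by
    importance-weighted estimates built from bandit feedback.
    loss_matrix is a (T, n) array of losses in [0, 1]; only one entry
    per row is ever observed by the learner."""
    rng = np.random.default_rng() if rng is None else rng
    T, n = loss_matrix.shape
    weights = np.ones(n)
    total_observed_loss = 0.0
    for t in range(T):
        # Black-box full-information step: exponential-weights distribution
        # over the estimated losses accumulated so far.
        p = weights / weights.sum()
        # Sampling scheme: mix in uniform exploration and draw one arm.
        q = (1.0 - gamma) * p + gamma / n
        arm = rng.choice(n, p=q)
        loss = float(loss_matrix[t, arm])  # the only value revealed this round
        total_observed_loss += loss
        # Estimate of the missing information: importance weighting makes
        # the estimated loss vector unbiased for the true one.
        est = np.zeros(n)
        est[arm] = loss / q[arm]
        # Feed the estimate back to the black box; renormalize for stability.
        weights *= np.exp(-eta * est)
        weights /= weights.sum()
    return total_observed_loss

A typical (again illustrative) parameter choice is eta = sqrt(log(n) / (n * T)) and gamma = min(1, sqrt(n * log(n) / T)). The point of the sketch is only the division of labor among the three components named in the abstract, not the confidence-interval construction that yields the high-probability guarantee.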

Advisor
Date of presentation
2009-02-01
Conference name
Conference dates
Conference location
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
Recommended citation
Collection