Closing the Gap Between Bandit and Full-Information Online Optimization: High-Probability Regret Bound

Penn collection
Statistics Papers
Subject
Statistics and Probability
Author
Rakhlin, Alexander
Tewari, Ambuj
Bartlett, Peter L
Abstract

We demonstrate a modification of the algorithm of Dani et al. for the online linear optimization problem in the bandit setting. The modification achieves an O(√(T ln T)) regret bound with high probability against an adaptive adversary, as opposed to the in-expectation result against an oblivious adversary of Dani et al. We obtain the same dependence on the dimension, n^{3/2}, as that exhibited by Dani et al. The results of this paper rest firmly on those of Dani et al. and on the remarkable technique of Auer et al. for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings.
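
For readers who want the claim in symbols, here is a sketch in notation that this record does not define and that is assumed only for illustration: the learner plays a point x_t from a decision set K in R^n, the adversary chooses a loss vector \ell_t, and in the bandit setting only the scalar \ell_t \cdot x_t is revealed. The regret after T rounds compares the learner's cumulative loss with that of the best fixed point in hindsight:

\[
R_T \;=\; \sum_{t=1}^{T} \ell_t \cdot x_t \;-\; \min_{x \in K} \sum_{t=1}^{T} \ell_t \cdot x .
\]

Read this way, the abstract's result has the following form, where the constants and the exact dependence on the confidence level are not given in the abstract and are left unspecified here:

\[
R_T \;=\; O\!\left( n^{3/2} \sqrt{T \ln T} \right) \quad \text{with high probability.}
\]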

Publication date
2007-08-26