Conference on Learning Theory
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret. The setting is a natural generalization of the nonstochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by previous approaches are overcome by the use of a self-concordant potential function. Our approach presents a novel connection between online learning and interior point methods.
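The barrier-based idea sketched in the abstract can be illustrated with a minimal SCRiBLe-style loop: play a point sampled on the Dikin ellipsoid of a self-concordant barrier, build a one-point unbiased estimate of the loss vector, and take a follow-the-regularized-leader step against the barrier. The sketch below uses the log-barrier for the unit ball and damped Newton steps; the function names, step size, and domain are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def barrier_grad_hess(x):
    # Log-barrier R(x) = -log(1 - ||x||^2) for the open unit ball (illustrative domain).
    s = float(x @ x)
    g = 2.0 * x / (1.0 - s)
    H = (2.0 / (1.0 - s)) * np.eye(len(x)) + (4.0 / (1.0 - s) ** 2) * np.outer(x, x)
    return g, H

def ftrl_step(x0, G, eta, iters=20):
    # Minimize eta * <G, x> + R(x) with a few damped Newton iterations;
    # damping by 1/(1 + Newton decrement) keeps the iterate strictly interior.
    x = x0.copy()
    for _ in range(iters):
        g, H = barrier_grad_hess(x)
        step = np.linalg.solve(H, eta * G + g)
        lam = np.sqrt(step @ H @ step)  # Newton decrement
        x = x - step / (1.0 + lam)
    return x

def scriblike_bandit(losses, eta=0.05, rng=None):
    # losses: (T, d) array of hidden loss vectors f_t; the learner observes only f_t . y_t.
    rng = np.random.default_rng(rng)
    T, d = losses.shape
    x = np.zeros(d)   # barrier minimizer (center of the ball)
    G = np.zeros(d)   # running sum of gradient estimates
    total = 0.0
    plays = []
    for t in range(T):
        _, H = barrier_grad_hess(x)
        w, V = np.linalg.eigh(H)              # gives H^{1/2} and H^{-1/2}
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                # uniform direction on the sphere
        y = x + V @ ((w ** -0.5) * (V.T @ u)) # point on the Dikin ellipsoid
        loss = float(losses[t] @ y)           # bandit feedback: scalar only
        total += loss
        # One-point estimator: E[ghat] = f_t since E[u u^T] = I/d on the sphere.
        ghat = d * loss * (V @ ((w ** 0.5) * (V.T @ u)))
        G += ghat
        x = ftrl_step(x, G, eta)
        plays.append(y)
    return np.array(plays), total
```

Because the Dikin ellipsoid of a self-concordant barrier is contained in the domain, every played point is feasible without any projection step, which is the structural property the abstract alludes to.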
Abernethy, J. D., Hazan, E., & Rakhlin, A. (2009). Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization. Conference on Learning Theory. Retrieved from https://repository.upenn.edu/statistics_papers/110
Date Posted: 27 November 2017