Statistics Papers

Document Type

Conference Paper

Date of this Version

2009

Publication Source

Conference on Learning Theory

Abstract

We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret. The setting is a natural generalization of the nonstochastic multiarmed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by previous approaches are overcome by the use of a self-concordant potential function. Our approach presents a novel connection between online learning and interior point methods.
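The abstract does not spell out the algorithm, but the role of a self-concordant potential can be illustrated with a minimal Python sketch in the spirit of the approach described above. It is not the authors' implementation. It assumes the simplest possible decision set, the cube [-1, 1]^n, with the logarithmic barrier R(x) = -Σ_i [log(1 - x_i) + log(1 + x_i)] as the potential; all function names and the step size `eta` are hypothetical. Each round the learner computes the regularized leader x_t, plays a random endpoint y_t of the Dikin ellipsoid of R at x_t along one eigen-direction of the Hessian (this ellipsoid lies inside the cube, so y_t is always feasible), observes only the scalar loss ⟨f_t, y_t⟩, and adds an unbiased one-point estimate of f_t to the cumulative loss. The code shows the mechanics only; no regret guarantee or tuning from the paper is reproduced.

```python
import math
import random


def barrier_hessian_diag(x):
    # Diagonal of the Hessian of R(x) = -sum_i [log(1 - x_i) + log(1 + x_i)];
    # the barrier is separable, so the eigenvectors are the coordinate axes e_i.
    return [1.0 / (1.0 - xi) ** 2 + 1.0 / (1.0 + xi) ** 2 for xi in x]


def regularized_leader(cum_loss, eta):
    # argmin_x  eta * <cum_loss, x> + R(x)  over the open cube (-1, 1)^n.
    # Per coordinate the optimality condition a + 2x / (1 - x^2) = 0 with
    # a = eta * g has the root x = -a / (1 + sqrt(1 + a^2)) inside (-1, 1).
    return [-(eta * g) / (1.0 + math.sqrt(1.0 + (eta * g) ** 2)) for g in cum_loss]


def sample_point(x, i, eps):
    # Endpoint of the Dikin ellipsoid at x along axis i: y = x + eps * lambda_i^{-1/2} e_i.
    lam = barrier_hessian_diag(x)[i]
    y = list(x)
    y[i] += eps / math.sqrt(lam)
    return y


def loss_estimate(x, observed_loss, i, eps):
    # One-point estimate n * <f, y> * eps * lambda_i^{1/2} e_i; its expectation over
    # uniform i and uniform eps in {-1, +1} equals the hidden loss vector f.
    n = len(x)
    lam = barrier_hessian_diag(x)[i]
    est = [0.0] * n
    est[i] = n * observed_loss * eps * math.sqrt(lam)
    return est


def run_bandit(losses, eta, seed=0):
    # losses[t] is the adversary's hidden vector f_t; the learner sees only <f_t, y_t>.
    rng = random.Random(seed)
    n = len(losses[0])
    cum = [0.0] * n  # running sum of estimated loss vectors
    played = []
    for f in losses:
        x = regularized_leader(cum, eta)
        i = rng.randrange(n)
        eps = rng.choice((-1.0, 1.0))
        y = sample_point(x, i, eps)
        observed = sum(fj * yj for fj, yj in zip(f, y))  # scalar bandit feedback
        played.append(y)
        est = loss_estimate(x, observed, i, eps)
        cum = [c + e for c, e in zip(cum, est)]
    return played
```

For example, `run_bandit([[0.5, -1.0, 0.25]] * 100, eta=0.01)` returns the 100 points played against a fixed loss vector. On the cube the barrier is separable, so the Hessian is diagonal and the leader has a closed form; for a general convex body both the eigendecomposition and the minimization would need numerical routines such as Newton's method, which is where the connection to interior point methods appears.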

Comments

At the time of publication, author Alexander Rakhlin was affiliated with the University of California, Berkeley. Currently, he is a faculty member at the Statistics Department at the University of Pennsylvania.

Date Posted: 27 November 2017