Stochastic Convex Optimization With Bandit Feedback

Agarwal, Alekh; Foster, Dean P; Hsu, Daniel; Kakade, Sham M; Rakhlin, Alexander

Files

110850827.pdf (890.84 KB)

Penn collection

Statistics Papers

Subject

derivative-free optimization
bandit optimization
ellipsoid method
Statistics and Probability

Permalink

https://repository.upenn.edu/handle/20.500.14332/47799

Author

Agarwal, Alekh
Foster, Dean P
Hsu, Daniel
Kakade, Sham M
Rakhlin, Alexander

Abstract

This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm over T queries: the sum, over the algorithm's query points, of the gap between the function value at the query point and the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs O(poly(d) √T) regret, where d is the dimension of X. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
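The feedback model and the regret described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's ellipsoid-based algorithm. The test function f(x) = |x - 0.3| on X = [0, 1], the Gaussian noise, the fixed query sequence, and the names make_noisy_oracle and regret are all assumptions chosen for illustration.

```python
import random


def make_noisy_oracle(f, sigma, seed=0):
    """Return a zeroth-order oracle: each call yields f(x) plus zero-mean noise.

    Gaussian noise with standard deviation sigma is an assumption made
    for this sketch only.
    """
    rng = random.Random(seed)

    def oracle(x):
        return f(x) + rng.gauss(0.0, sigma)

    return oracle


def regret(f, queries, f_star):
    """Cumulative regret: sum over query points of f(x_t) minus the optimum."""
    return sum(f(x) - f_star for x in queries)


# Hypothetical convex, 1-Lipschitz objective on X = [0, 1], minimized at 0.3.
def f(x):
    return abs(x - 0.3)


oracle = make_noisy_oracle(f, sigma=0.1)

# A fixed, hand-picked query sequence stands in for an algorithm's choices.
queries = [0.0, 0.5, 0.3, 0.3]

# The algorithm only ever sees these noisy values, never f itself.
observations = [oracle(x) for x in queries]

# Regret is measured with the true (noise-free) function values.
total_regret = regret(f, queries, f_star=0.0)
print(observations, total_regret)
```

Note that regret charges every query, including exploratory ones, so an algorithm cannot probe far from the minimizer for free; this is what separates the bandit setting from one that only scores the final point.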

Publication date

2013-01-01

Journal title

SIAM Journal on Optimization

Collection

Articles