SIAM Journal on Optimization
This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm, which is the sum, over the algorithm's T query points, of the gap between the function value at the query point and the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs Õ(poly(d) √T) regret, where d is the dimension of the domain. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
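To make the feedback model and the regret definition in the abstract concrete, the following is a minimal Python sketch. It is not the paper's algorithm. The Gaussian noise model, the one-dimensional test function, the fixed query sequence, and the helper names `noisy_value` and `cumulative_regret` are all assumptions chosen for illustration.

```python
import random


def noisy_value(f, x, sigma, rng):
    """Return f(x) plus zero-mean Gaussian noise (an assumed noise model)."""
    return f(x) + rng.gauss(0.0, sigma)


def cumulative_regret(f, queries, f_min):
    """Sum over all queries of f(x_t) - f_min, computed with the noiseless f."""
    return sum(f(x) - f_min for x in queries)


# Hypothetical example: f(x) = |x - 0.25| on X = [0, 1].
# f is convex and 1-Lipschitz, with minimum value 0 attained at x = 0.25.
def f(x):
    return abs(x - 0.25)


rng = random.Random(0)  # fixed seed so the noisy observations are reproducible
queries = [0.0, 0.5, 0.25, 0.25]  # an arbitrary query sequence, T = 4

# What a bandit algorithm would actually see: one noisy value per query.
observations = [noisy_value(f, x, 0.1, rng) for x in queries]

# What the analysis measures: regret uses the true function values.
regret = cumulative_regret(f, queries, f_min=0.0)
```

Note that the algorithm only has access to `observations`, while the regret is evaluated on the noiseless values of f. Here the four queries contribute gaps of 0.25, 0.25, 0 and 0.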
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
derivative-free optimization, bandit optimization, ellipsoid method
Agarwal, A., Foster, D. P., Hsu, D., Kakade, S. M., & Rakhlin, A. (2013). Stochastic Convex Optimization With Bandit Feedback. SIAM Journal on Optimization, 23(1), 213-240. http://dx.doi.org/10.1137/110850827
Date Posted: 27 November 2017
This document has been peer reviewed.