Stochastic Convex Optimization With Bandit Feedback
Loading...
Penn collection
Statistics Papers
Degree type
Discipline
Subject
derivative-free optimization
bandit optimization
ellipsoid method
Statistics and Probability
bandit optimization
ellipsoid method
Statistics and Probability
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Agarwal, Alekh
Foster, Dean P
Hsu, Daniel
Kakade, Sham M
Rakhlin, Alexander
Contributor
Abstract
This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs O(poly(d) √T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
Advisor
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Publication date
2013-01-01
Journal title
SIAM Journal on Optimization