Stochastic Convex Optimization With Bandit Feedback

Loading...
Thumbnail Image
Penn collection
Statistics Papers
Degree type
Discipline
Subject
derivative-free optimization
bandit optimization
ellipsoid method
Statistics and Probability
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Agarwal, Alekh
Foster, Dean P
Hsu, Daniel
Kakade, Sham M
Rakhlin, Alexander
Contributor
Abstract

This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs O(poly(d) √T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.

Advisor
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Publication date
2013-01-01
Journal title
SIAM Journal on Optimization
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
Recommended citation
Collection