
Statistics Papers
Document Type
Conference Paper
Date of this Version
2002
Publication Source
Proceedings of the Nineteenth International Conference on Machine Learning
Start Page
339
Last Page
346
Abstract
We investigate the explore/exploit trade-off in reinforcement learning using competitive analysis applied to an abstract model. We state and prove lower and upper bounds on the competitive ratio. The essential conclusion of our analysis is that optimizing the explore/exploit trade-off becomes much easier given a few pieces of extra knowledge, such as the stopping time or upper and lower bounds on the value of the optimal exploitation policy.
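As context for the abstract, the competitive ratio can be sketched using the standard competitive-analysis convention for reward maximization; the notation below (V_ALG, V_OPT) is illustrative and not necessarily the exact formulation used in the paper:

```latex
% An algorithm ALG is c-competitive if, for every environment E in a
% class of environments \mathcal{E}, its expected total reward is within
% a factor c of the reward of the optimal policy OPT for that environment:
%   V_{\mathrm{ALG}}(E) \ge \frac{1}{c}\, V_{\mathrm{OPT}}(E)
% The competitive ratio of ALG is the smallest such c:
c(\mathrm{ALG}) \;=\; \sup_{E \in \mathcal{E}} \frac{V_{\mathrm{OPT}}(E)}{V_{\mathrm{ALG}}(E)}
```

Under this convention, the paper's lower and upper bounds constrain how small c can be made by any exploration strategy, with and without side information such as the stopping time.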
Recommended Citation
Langford, J., Zinkevich, M., & Kakade, S. M. (2002). Competitive Analysis of the Explore/Exploit Tradeoff. Proceedings of the Nineteenth International Conference on Machine Learning, 339-346. Retrieved from https://repository.upenn.edu/statistics_papers/113
Date Posted: 27 November 2017