Information Systems and e-Business Management
Repeated play in games by simple adaptive agents is investigated. The agents use Q-learning, a special form of reinforcement learning, to direct learning of behavioral strategies in a number of 2×2 games. The agents are able to maximize the total wealth extracted effectively, which often leads to Pareto optimal outcomes. When the reward signals are sufficiently clear, Pareto optimal outcomes will largely be achieved. The effect can select Pareto optimal outcomes that are not Nash equilibria, and it can select Pareto optimal outcomes among Nash equilibria.
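As a rough illustration of the setup the abstract describes, the sketch below has two independent Q-learners play a repeated 2×2 game. It is not the authors' implementation. The payoff table (a Prisoner's Dilemma), the epsilon-greedy action rule, the use of the previous joint action as the state, and the parameter values `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration.

```python
import random

# Hypothetical Prisoner's Dilemma payoffs (an assumption, not taken from the paper).
# Key: (row action, column action); value: (row payoff, column payoff).
# Actions: 0 = cooperate, 1 = defect.
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}


def q_update(q, state, action, reward, next_state, alpha=0.2, gamma=0.9):
    """Apply one standard Q-learning update in place and return the new value."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def choose(q, state, epsilon, rng):
    """Epsilon-greedy choice between the two actions (assumed exploration rule)."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    values = q[state]
    return 0 if values[0] >= values[1] else 1


def play(rounds=5000, epsilon=0.1, seed=0):
    """Run repeated play and return (total wealth, row Q-table, column Q-table)."""
    rng = random.Random(seed)
    states = list(PAYOFFS)  # assumed state: the previous joint action
    q_row = {s: [0.0, 0.0] for s in states}
    q_col = {s: [0.0, 0.0] for s in states}
    state = (0, 0)
    total = 0
    for _ in range(rounds):
        a = choose(q_row, state, epsilon, rng)
        b = choose(q_col, state, epsilon, rng)
        r_row, r_col = PAYOFFS[(a, b)]
        next_state = (a, b)
        # Each agent learns only from its own reward.
        q_update(q_row, state, a, r_row, next_state)
        q_update(q_col, state, b, r_col, next_state)
        state = next_state
        total += r_row + r_col
    return total, q_row, q_col
```

Calling `play()` returns the summed payoffs of both agents, which corresponds to the "total wealth extracted" in the abstract. Whether a particular run settles on the Pareto optimal outcome depends on the assumed payoffs and parameters.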
The final publication is available at Springer via http://dx.doi.org/10.1007/s10257-003-0024-0
Q-learning, algorithmic game theory, games, learning and games
Kimbrough, S. O., & Lu, M. (2005). Simple Reinforcement Learning Agents: Pareto Beats Nash in an Algorithmic Game Theory Study. Information Systems and e-Business Management, 3 (1), 1-19. http://dx.doi.org/10.1007/s10257-003-0024-0
Date Posted: 27 November 2017
This document has been peer reviewed.