Operations, Information and Decisions Papers

Document Type

Journal Article

Date of this Version

3-2005

Publication Source

Information Systems and e-Business Management

Volume

3

Issue

1

Start Page

1

Last Page

19

DOI

10.1007/s10257-003-0024-0

Abstract

Repeated play in games by simple adaptive agents is investigated. The agents use Q-learning, a special form of reinforcement learning, to direct learning of behavioral strategies in a number of 2×2 games. The agents are able to effectively maximize the total wealth extracted, which often leads to Pareto optimal outcomes. When the reward signals are sufficiently clear, Pareto optimal outcomes will largely be achieved. The effect can select Pareto optimal outcomes that are not Nash equilibria, and it can select Pareto optimal outcomes among Nash equilibria.
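
As a rough illustration of the setting the abstract describes, the sketch below pairs two Q-learning agents in a repeated 2×2 game and separately lists the game's pure Nash equilibria and Pareto optimal outcomes. Everything specific here is an assumption for illustration and is not taken from the paper: the Prisoner's Dilemma payoff values, the epsilon-greedy action choice, the use of the previous joint action as the state, and the names `PAYOFFS`, `q_update`, `pure_nash`, `pareto_optimal`, and `play`.

```python
import random

# Hypothetical Prisoner's Dilemma payoffs as (row player, column player).
# Action 0 = cooperate, action 1 = defect. Values are illustrative only.
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}


def q_update(q_old, reward, best_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update for a single table entry."""
    return q_old + alpha * (reward + gamma * best_next - q_old)


def pure_nash(payoffs):
    """Return action pairs where neither player gains by deviating alone."""
    result = []
    for (a, b), (ra, rb) in payoffs.items():
        if ra >= payoffs[(1 - a, b)][0] and rb >= payoffs[(a, 1 - b)][1]:
            result.append((a, b))
    return sorted(result)


def pareto_optimal(payoffs):
    """Return action pairs whose payoff vector is not Pareto-dominated."""
    def dominated(p):
        return any(q[0] >= p[0] and q[1] >= p[1] and q != p
                   for q in payoffs.values())
    return sorted(k for k, v in payoffs.items() if not dominated(v))


def play(rounds=1000, epsilon=0.1, seed=0):
    """Two Q-learners play repeatedly; the state is the previous joint action.

    Assumed design: epsilon-greedy exploration, one Q-table per agent.
    Returns the total payoff collected by each agent.
    """
    rng = random.Random(seed)
    q = [{s: [0.0, 0.0] for s in PAYOFFS} for _ in range(2)]
    state = (0, 0)  # arbitrary starting state
    totals = [0, 0]
    for _ in range(rounds):
        actions = []
        for i in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))  # explore
            else:
                values = q[i][state]
                actions.append(0 if values[0] >= values[1] else 1)  # exploit
        nxt = tuple(actions)
        rewards = PAYOFFS[nxt]
        for i in range(2):
            old = q[i][state][actions[i]]
            q[i][state][actions[i]] = q_update(old, rewards[i], max(q[i][nxt]))
            totals[i] += rewards[i]
        state = nxt
    return totals


if __name__ == "__main__":
    print("Pure Nash equilibria:", pure_nash(PAYOFFS))
    print("Pareto optimal outcomes:", pareto_optimal(PAYOFFS))
    print("Total payoffs after repeated play:", play())
```

With these assumed payoffs, mutual cooperation is Pareto optimal but not a Nash equilibrium, which is the kind of outcome the abstract says the learning effect can select. Whether this simplified sketch actually converges to it depends on the payoffs and learning parameters chosen.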

Copyright/Permission Statement

The final publication is available at Springer via http://dx.doi.org/10.1007/s10257-003-0024-0

Keywords

Q-learning, algorithmic game theory, games, learning and games

Date Posted: 27 November 2017

This document has been peer reviewed.