A Natural Policy Gradient

Penn collection
Statistics Papers
Discipline
Subject
Biostatistics
Statistical Methodology
Statistics and Probability
Author
Kakade, Sham M
Abstract

We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
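
As a rough illustration of the abstract's central claim, the sketch below compares the ordinary policy gradient with the natural gradient (the ordinary gradient premultiplied by the inverse Fisher information matrix of the policy) on a toy problem. Everything concrete here is an assumption made for illustration and is not taken from the paper's experiments: a single-state MDP with three actions, a tabular softmax policy, and the hypothetical parameter and reward values shown. Because the Fisher matrix of a tabular softmax is singular, the sketch substitutes a pseudo-inverse for the inverse.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over action preferences."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def gradients(theta, rewards):
    """Return (vanilla, natural) gradients of expected reward
    for a one-state MDP with a tabular softmax policy."""
    pi = softmax(theta)
    # Row a holds the score vector grad_theta log pi(a) = e_a - pi.
    scores = np.eye(len(theta)) - pi
    # Vanilla gradient: sum_a pi(a) * r(a) * score_a.
    vanilla = scores.T @ (pi * rewards)
    # Fisher information: sum_a pi(a) * score_a score_a^T.
    fisher = scores.T @ (pi[:, None] * scores)
    # The Fisher matrix is singular along the all-ones direction
    # (shifting every preference leaves the policy unchanged), so this
    # sketch uses a pseudo-inverse (an assumption, not the paper's code).
    natural = np.linalg.pinv(fisher, rcond=1e-10) @ vanilla
    return vanilla, natural


# Hypothetical values: the policy strongly prefers action 0,
# while action 2 carries the highest reward.
theta = np.array([2.0, 1.0, -3.0])
rewards = np.array([1.0, 2.0, 3.0])

vanilla, natural = gradients(theta, rewards)
print("vanilla gradient:", vanilla)
print("natural gradient:", natural)
print("largest vanilla component:", int(np.argmax(vanilla)))
print("largest natural component:", int(np.argmax(natural)))
```

In this toy setting the ordinary gradient weights each action's advantage by its current probability, so its largest component falls on action 1, which is merely better than the current favorite. The natural gradient removes that weighting and its largest component falls on action 2, the greedy choice.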

Publication date
2001-01-01
Journal title
Advances in Neural Information Processing Systems