A Natural Policy Gradient

Kakade, Sham M

Files

2073_a_natural_policy_gradient.pdf (1.45 MB)

Penn collection

Statistics Papers

Subject

Biostatistics
Statistical Methodology
Statistics and Probability

Permalink

https://repository.upenn.edu/handle/20.500.14332/47861

Abstract

We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient moves toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
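The natural gradient described in the abstract rescales the ordinary gradient of the expected return by the inverse Fisher information matrix, F(theta)^-1 grad eta(theta), where F(theta) = E[grad log pi(a) grad log pi(a)^T]. The sketch below illustrates this in the simplest possible setting. It is not the paper's implementation and does not reproduce its experiments. It assumes a single-state problem (a bandit) with a tabular softmax policy and exact, known action values q, and the helper names (softmax, score_matrix, vanilla_and_natural_gradient, compatible_weights) are hypothetical. Because adding the same constant to every theta_a leaves a tabular softmax policy unchanged, F is singular here, so the sketch takes the minimum-norm solution via the Moore-Penrose pseudo-inverse. It also fits the compatible approximator f_w(a) = w . grad log pi(a) to q by pi-weighted least squares. Its weights coincide with the natural gradient, which is the link to compatible value functions that the abstract draws.

```python
import numpy as np


def softmax(theta):
    """Tabular softmax (Gibbs) policy; subtracting the max keeps exp() stable."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def score_matrix(theta):
    """Return (scores, pi), where row a of scores is grad_theta log pi(a) = e_a - pi."""
    pi = softmax(theta)
    return np.eye(len(theta)) - pi, pi


def vanilla_and_natural_gradient(theta, q):
    """Return (grad eta, natural gradient) for eta(theta) = sum_a pi(a) q(a)."""
    scores, pi = score_matrix(theta)
    # Vanilla gradient: E_{a~pi}[grad log pi(a) * q(a)]
    g = scores.T @ (pi * q)
    # Fisher information: E_{a~pi}[grad log pi(a) grad log pi(a)^T]
    fisher = scores.T @ (pi[:, None] * scores)
    # Fisher is singular for a tabular softmax (constant shifts of theta do not
    # change pi), so use the pseudo-inverse to get the minimum-norm solution.
    natural = np.linalg.pinv(fisher) @ g
    return g, natural


def compatible_weights(theta, q):
    """Fit f_w(a) = w . grad log pi(a) to q by pi-weighted least squares."""
    scores, pi = score_matrix(theta)
    sqrt_pi = np.sqrt(pi)
    # lstsq returns the minimum-norm minimizer when the system is rank-deficient
    w, *_ = np.linalg.lstsq(sqrt_pi[:, None] * scores, sqrt_pi * q, rcond=None)
    return w


if __name__ == "__main__":
    q = np.array([1.0, 0.0, 0.0])       # action 0 has the highest value
    theta = np.array([-4.0, 0.0, 0.0])  # ...but the policy rarely picks it
    g, natural = vanilla_and_natural_gradient(theta, q)
    print("pi:              ", softmax(theta))
    print("vanilla gradient:", g)
    print("natural gradient:", natural)
    print("compatible w:    ", compatible_weights(theta, q))
    for step in (1.0, 10.0, 100.0):
        # Larger natural-gradient steps push pi toward the greedy action argmax_a q(a)
        print(step, softmax(theta + step * natural))
```

In this example the policy initially puts under 1% of its probability on the best action, so the vanilla gradient's component for that action is below 0.01. For a tabular softmax with all pi(a) > 0, the minimum-norm natural gradient works out to q minus its unweighted mean, here [2/3, -1/3, -1/3], regardless of the current policy. A large step along it makes the policy nearly greedy with respect to q, a small-scale instance of the abstract's claim that the natural gradient moves toward the greedy action rather than just a better one.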

Publication date

2001-01-01

Journal title

Advances in Neural Information Processing Systems

Collection

Articles