Policy Search by Dynamic Programming
Penn collection
Statistics Papers
Discipline
Subject
Other Statistics and Probability
Statistics and Probability
Author
Bagnell, J. A
Kakade, Sham
Ng, Andrew Y
Schneider, Jeff G
Abstract
We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
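Illustrative sketch (not part of the original record, and not the authors' implementation): the abstract's idea, as we understand it, is to fix a finite horizon T and choose one policy per time step, working backward from the last step, each time picking the policy that maximizes expected return over states drawn from the baseline distribution for that step. The Python below assumes a small tabular MDP and a finite set of candidate deterministic policies. The names psdp_sketch, P, R, mu, and candidates are hypothetical and chosen only for this example.

```python
import numpy as np

def psdp_sketch(P, R, mu, candidates, T):
    """Backward, per-time-step policy search (illustrative assumptions only).

    P:          (A, S, S) array, P[a, s, s2] = transition probability.
    R:          (S, A) array of immediate rewards.
    mu:         (T, S) array, mu[t] = baseline state distribution at step t.
    candidates: (K, S) int array, each row a deterministic policy (state -> action).
    T:          horizon (number of decision steps).
    Returns the chosen policy for each step and the resulting value at step 0.
    """
    S = R.shape[0]
    states = np.arange(S)
    policies = np.zeros((T, S), dtype=int)
    v_next = np.zeros(S)  # value of following the already-chosen later policies

    for t in range(T - 1, -1, -1):
        # q[s, a]: reward now plus value of the later policies from the next state.
        q = R + np.einsum("asn,n->sa", P, v_next)
        # Score each candidate by its expected value under the baseline mu[t].
        scores = [mu[t] @ q[states, pi] for pi in candidates]
        best = candidates[int(np.argmax(scores))]
        policies[t] = best
        v_next = q[states, best]

    return policies, v_next

# Toy problem: two states, action 0 stays, action 1 switches state;
# being in state 1 pays reward 1 regardless of the action taken.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])
T = 3
mu = np.full((T, 2), 0.5)  # uniform baseline at every step (an assumption)
candidates = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

policies, v0 = psdp_sketch(P, R, mu, candidates, T)
print(policies)
print(v0)
```

The loop runs exactly T times and each iteration is a finite maximization, which is the sense in which such a procedure terminates in a finite number of steps. The quality of the result depends on how well mu matches the states a good policy actually visits.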
Date of presentation
2003-01-01
Conference name
Statistics Papers
Conference dates
2023-05-17T15:04:08.000