Policy Search by Dynamic Programming

Bagnell, J. A; Kakade, Sham; Ng, Andrew Y; Schneider, Jeff G

Files

2378_policy_search_by_dynamic_programming.pdf (182.91 KB)

Penn collection

Statistics Papers

Subject

Other Statistics and Probability
Statistics and Probability

Permalink

https://repository.upenn.edu/handle/20.500.14332/47854

Abstract

We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.

Date of presentation

2003-01-01

Conference name

Statistics Papers

Conference dates

2023-05-17T15:04:08.000

Collection

Presentations