Some Upper Bounds on the Running Time of Policy Iteration on Deterministic MDPs

11/28/2022
by Ritesh Goenka, et al.

Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms, and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.
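As a rough illustration of the setting, the sketch below runs Howard's PI (switch every improvable state to a greedy action) on a tiny DMDP. It is not taken from the paper: the discounted-reward criterion, the discount factor `gamma`, the iterative policy evaluation, and the names `evaluate`, `howard_pi`, `next_state`, and `reward` are all assumptions made for this example. The DMDP property shows up in `next_state[s][a]` being a single state rather than a distribution.

```python
from typing import List, Tuple

def evaluate(next_state: List[List[int]], reward: List[List[float]],
             policy: List[int], gamma: float, sweeps: int = 2000) -> List[float]:
    """Approximate the discounted value of a fixed policy by repeated backups."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        # Deterministic transitions: each (state, action) has exactly one successor.
        v = [reward[s][policy[s]] + gamma * v[next_state[s][policy[s]]]
             for s in range(n)]
    return v

def howard_pi(next_state: List[List[int]], reward: List[List[float]],
              gamma: float, tol: float = 1e-9) -> Tuple[List[int], List[float]]:
    """Howard's PI: at each iteration, switch all improvable states greedily."""
    n = len(next_state)
    policy = [0] * n  # assumed starting policy: action 0 everywhere
    while True:
        v = evaluate(next_state, reward, policy, gamma)
        new_policy = list(policy)
        for s in range(n):
            best_a = policy[s]
            best_q = reward[s][best_a] + gamma * v[next_state[s][best_a]]
            for a in range(len(next_state[s])):
                q = reward[s][a] + gamma * v[next_state[s][a]]
                if q > best_q + tol:  # switch only on strict improvement
                    best_a, best_q = a, q
            new_policy[s] = best_a
        if new_policy == policy:  # no improvable state left: policy is optimal
            return policy, v
        policy = new_policy

# Hypothetical 3-state, 2-action DMDP: action 0 is a self-loop,
# action 1 moves to the next state; only the self-loop at state 2 pays reward.
next_state = [[0, 1], [1, 2], [2, 0]]
reward = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
policy, values = howard_pi(next_state, reward, gamma=0.9)
print(policy, [round(x, 3) for x in values])
```

On this toy instance the algorithm needs a few improvement rounds because the reward at state 2 has to propagate backwards one state at a time; the number of such rounds is the quantity that running-time bounds for PI are concerned with.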
