Model-free policy evaluation in Reinforcement Learning via upper solutions

05/05/2021
by D. Belomestny, et al.

In this work we present an approach for building tight model-free confidence intervals for the optimal value function V^⋆ in general infinite-horizon MDPs via upper solutions. We suggest a novel upper value iterative procedure (UVIP) to construct upper solutions for a given agent's policy. UVIP leads to a model-free method of policy evaluation. We analyze convergence properties of the approximate UVIP under rather general assumptions and illustrate its performance on a number of benchmark RL problems.
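The notion of an upper solution can be illustrated with a standard toy construction (this is a generic sketch, not the authors' UVIP, and the MDP below is entirely made up): a function u satisfying u ≥ T u pointwise, where T is the Bellman optimality operator, dominates V^⋆, and applying T preserves this property, yielding a decreasing sequence of upper bounds.

```python
# Toy illustration (hypothetical MDP, not from the paper): a function u with
# u >= T u, where T is the Bellman optimality operator, is an "upper solution"
# and dominates the optimal value V*. Since T is monotone, T u is again an
# upper solution, so iterating T gives a decreasing sequence of upper bounds.

GAMMA = 0.9

# Made-up 2-state, 2-action MDP:
# P[s][a] = list of (probability, next_state), R[s][a] = immediate reward.
P = {0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
     1: {0: [(1.0, 1)], 1: [(0.7, 0), (0.3, 1)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.5, 1: 2.0}}

def bellman(v):
    """One application of the Bellman optimality operator T."""
    return [max(R[s][a] + GAMMA * sum(p * v[sp] for p, sp in P[s][a])
                for a in P[s]) for s in P]

# The constant u = r_max / (1 - gamma) is a trivial upper solution:
# T u <= r_max + gamma * r_max / (1 - gamma) = u.
r_max = max(r for acts in R.values() for r in acts.values())
u = [r_max / (1.0 - GAMMA)] * len(P)

for _ in range(50):
    u_next = bellman(u)
    # Monotonicity of T keeps every iterate an upper bound on V*
    # and makes the sequence pointwise non-increasing.
    assert all(un <= uo + 1e-12 for un, uo in zip(u_next, u))
    u = u_next
```

The point of the paper is to obtain such upper bounds in a model-free way, without the transition probabilities P used explicitly above; this sketch only shows the ordering structure that makes upper solutions useful for bracketing V^⋆.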
