Schedule Based Temporal Difference Algorithms

11/23/2021
by Rohan Deb, et al.

Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD(λ) is a popular class of algorithms for solving this problem. However, the weights assigned to different n-step returns in TD(λ), controlled by the parameter λ, decrease exponentially with increasing n. In this paper, we present a λ-schedule procedure that generalizes the TD(λ) algorithm to the case where the parameter λ can vary with the time step. This allows flexibility in weight assignment, i.e., the user can specify the weights assigned to different n-step returns by choosing a sequence {λ_t}_{t≥1}. Based on this procedure, we propose one on-policy algorithm, TD(λ)-schedule, and two off-policy algorithms, GTD(λ)-schedule and TDC(λ)-schedule. We provide proofs of almost sure convergence for all three algorithms under a general Markov noise framework.
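To make the weight-assignment idea concrete, the sketch below compares the n-step return weights of classical TD(λ) with those induced by a time-varying schedule {λ_t}. The exact weighting used in the paper is not given in the abstract, so the form here, in which the n-th weight is (1 − λ_n) times the product of the preceding λ_i, is an illustrative assumption; the function name `nstep_return_weights` is hypothetical. With a constant schedule it reduces to the familiar (1 − λ)λ^(n−1) weights that decay exponentially in n.

```python
import numpy as np

def nstep_return_weights(lams, horizon):
    """Weights placed on the n-step returns G^(1), ..., G^(horizon).

    Illustrative (assumed) form: the n-th weight is
        (1 - lam_n) * prod_{i < n} lam_i,
    so a constant schedule lam_t = lam recovers the classical
    TD(lambda) weights (1 - lam) * lam**(n - 1).
    """
    lams = np.asarray(lams, dtype=float)
    assert len(lams) >= horizon
    weights = np.empty(horizon)
    running_prod = 1.0  # product of lam_1, ..., lam_{n-1}
    for n in range(1, horizon + 1):
        weights[n - 1] = (1.0 - lams[n - 1]) * running_prod
        running_prod *= lams[n - 1]
    return weights

# Constant schedule: weights decay exponentially with n, as in TD(lambda).
print(nstep_return_weights([0.9] * 10, 10))

# A hand-chosen schedule lets the user shift mass toward mid-range returns.
print(nstep_return_weights([0.2, 0.5, 0.9, 0.9, 0.9, 0.5, 0.2, 0.1, 0.1, 0.1], 10))
```

Printing both weight vectors shows the point of the λ-schedule: the first decays geometrically, while the second concentrates weight on whichever n-step returns the chosen sequence favors.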
