Learning Stochastic Shortest Path with Linear Function Approximation

10/25/2021
by   Yifei Min, et al.
11

We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems the linear mixture SSP. We propose a novel algorithm for learning the linear mixture SSP, which can attain a Õ(d B_⋆^1.5√(K/c_min)) regret. Here K is the number of episodes, d is the dimension of the feature mapping in the mixture model, B_⋆ bounds the expected cumulative cost of the optimal policy, and c_min>0 is the lower bound of the cost function. Our algorithm also applies to the case when c_min = 0, where a Õ(K^2/3) regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. In complement to the regret upper bounds, we also prove a lower bound of Ω(d B_⋆√(K)), which nearly matches our upper bound.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro