On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

11/14/2022
by   Mudit Gaur, et al.

Deep Q-learning based algorithms have been applied successfully to many decision-making problems, but their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parametrization and establish sample complexity guarantees for the algorithm. The approach estimates the Q-function in each iteration by solving a convex optimization problem. We show that this approach achieves a sample complexity of 𝒪̃(1/ϵ^2), which is order-optimal. This result holds for a countable state space and does not require assumptions such as a linear or low-rank structure on the MDP.
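To make the iteration structure concrete, here is a minimal sketch of Fitted Q-Iteration on a batch of transitions. It is not the paper's method. As a simplifying assumption, the hidden ReLU layer is fixed at random and only the output weights are fit, by ridge regression; this keeps each iteration a convex problem but is a stand-in for the convex formulation the abstract refers to. All names (`relu_features`, `fitted_q_iteration`, `hidden`, `ridge`) are hypothetical.

```python
import numpy as np


def relu_features(x, W, b):
    """Hidden-layer activations of a two-layer ReLU network."""
    return np.maximum(x @ W + b, 0.0)


def fitted_q_iteration(transitions, n_actions, gamma=0.9, n_iters=50,
                       hidden=64, ridge=1e-3, seed=0):
    """Sketch of FQI; transitions = (s, a, r, s_next) as NumPy arrays.

    Assumption: the hidden layer (W, b) is random and frozen, so each
    regression step is ridge least squares over the output weights.
    """
    rng = np.random.default_rng(seed)
    s, a, r, s_next = transitions
    W = rng.normal(size=(s.shape[1], hidden))
    b = rng.normal(size=hidden)
    phi = relu_features(s, W, b)            # (n, hidden)
    phi_next = relu_features(s_next, W, b)  # (n, hidden)
    theta = np.zeros((n_actions, hidden))   # one output vector per action

    for _ in range(n_iters):
        # Bellman targets computed from the previous Q estimate.
        targets = r + gamma * (phi_next @ theta.T).max(axis=1)
        new_theta = np.zeros_like(theta)
        for act in range(n_actions):
            mask = a == act
            if not mask.any():
                continue  # no data for this action; keep zero weights
            gram = phi[mask].T @ phi[mask] + ridge * np.eye(hidden)
            new_theta[act] = np.linalg.solve(gram, phi[mask].T @ targets[mask])
        theta = new_theta

    def q_fn(states):
        """Return Q-values of shape (len(states), n_actions)."""
        return relu_features(states, W, b) @ theta.T

    return q_fn
```

Calling `fitted_q_iteration((s, a, r, s_next), n_actions)` returns a function that maps a batch of state vectors to a matrix of Q-values; taking `argmax` over its columns gives the greedy action per state.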
