Model Based Residual Policy Learning with Applications to Antenna Control

Non-differentiable controllers and rule-based policies are widely used for controlling real systems such as robots and telecommunication networks. In this paper, we present a practical reinforcement learning method which improves upon such existing policies with a model-based approach for better sample efficiency. Our method significantly outperforms state-of-the-art model-based methods, in terms of sample efficiency, on several widely used robotic benchmark tasks. We also demonstrate the effectiveness of our approach on a control problem in the telecommunications domain, where model-based methods have not previously been explored. Experimental results indicate that a strong initial performance can be achieved and combined with improved sample efficiency. We further motivate the design of our algorithm with a theoretical lower bound on the performance.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro