Stochastic Multi-armed Bandits in Constant Space

12/25/2017
by   David Liau, et al.
0

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all K arms. We give an algorithm using O(1) words of space with regret ∑_i=1^K1/Δ_iΔ_i/Δ T where Δ_i is the gap between the best arm and arm i and Δ is the gap between the best and the second-best arms. If the rewards are bounded away from 0 and 1, this is within an O( 1/Δ) factor of the optimum regret possible without space constraints.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro