Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

01/28/2022
by Jiatai Huang, et al.

In this paper, we generalize the concept of heavy-tailed multi-armed bandits (MAB) to adversarial environments and develop robust best-of-both-worlds algorithms for this setting, where losses have α-th moments (1 < α ≤ 2) bounded by σ^α, while the variances may not exist. Specifically, we design an algorithm that, when the heavy-tail parameters α and σ are known to the agent, simultaneously achieves the optimal regret in both stochastic and adversarial environments, without knowing the actual environment type a priori. When α and σ are unknown, a log T-style instance-dependent regret is achieved in stochastic cases, along with an o(T) no-regret guarantee in adversarial cases. We further develop an algorithm achieving the minimax-optimal regret 𝒪(σ K^{1-1/α} T^{1/α}) even in adversarial settings, without prior knowledge of α and σ. This result matches the known regret lower bound (Bubeck et al., 2013), which assumes a stochastic environment in which α and σ are both known. To our knowledge, the proposed algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and the first that can adapt to both α and σ to achieve the optimal gap-independent regret bound in both the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.
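
For readers who prefer the two central quantities written out, the following is a sketch of the moment condition and the minimax rate stated above. The notation is assumed for illustration and is not taken from the abstract: ℓ_{t,i} denotes the loss of arm i at round t, K the number of arms, T the time horizon, and R_T the regret.

```latex
% Heavy-tail assumption: bounded alpha-th moment, variance possibly infinite.
% (\ell_{t,i}, K, T and R_T are assumed notation, not defined in the abstract.)
\mathbb{E}\bigl[\,|\ell_{t,i}|^{\alpha}\bigr] \le \sigma^{\alpha},
\qquad 1 < \alpha \le 2 .

% Minimax-optimal regret rate claimed without prior knowledge of alpha and sigma.
R_T = \mathcal{O}\!\left(\sigma\, K^{1 - 1/\alpha}\, T^{1/\alpha}\right).

% Sanity check: at alpha = 2 (finite variance) the exponents are both 1/2,
% so the rate reduces to the familiar form below.
\alpha = 2 \;\Longrightarrow\; R_T = \mathcal{O}\!\left(\sigma \sqrt{K T}\right).
```

As α decreases toward 1, the exponent 1/α on T approaches 1, so the bound degrades toward linear in T, which reflects how heavier tails make the problem harder.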
