Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

01/28/2022

In this paper, we generalize the concept of heavy-tailed multi-armed bandits (MAB) to adversarial environments and develop robust best-of-both-worlds algorithms for this setting, where losses have α-th (1 < α ≤ 2) moments bounded by σ^α while their variances may not exist. Specifically, we design an algorithm that, when the heavy-tail parameters α and σ are known to the agent, simultaneously achieves the optimal regret in both stochastic and adversarial environments without knowing the actual environment type a priori. When α and σ are unknown, it achieves a log T-style instance-dependent regret in stochastic cases and an o(T) no-regret guarantee in adversarial cases. We further develop a second algorithm that achieves the minimax-optimal regret 𝒪(σ K^{1-1/α} T^{1/α}) even in adversarial settings, without prior knowledge of α and σ. This result matches the known regret lower bound (Bubeck et al., 2013), which was derived for a stochastic environment with α and σ both known. To our knowledge, the first of these algorithms is the first to enjoy a best-of-both-worlds regret guarantee, and the second is the first that can adapt to both α and σ to achieve the optimal gap-independent regret bound in both the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.
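To make the moment condition and the regret rate concrete, the following is a minimal sketch and not code from the paper. It assumes, purely for illustration, a Pareto loss distribution with scale 1, whose moments have a closed form; the function names `pareto_moment` and `minimax_rate` are hypothetical, and the rate is evaluated with all constants omitted.

```python
import math


def pareto_moment(shape: float, p: float) -> float:
    """E[X^p] for a Pareto variable with scale 1 and tail index `shape`.

    The moment is finite only when p < shape; otherwise it diverges.
    (Pareto losses are an illustrative assumption, not the paper's model.)
    """
    if shape <= 0 or p <= 0:
        raise ValueError("shape and p must be positive")
    return shape / (shape - p) if p < shape else math.inf


def minimax_rate(sigma: float, K: int, T: int, alpha: float) -> float:
    """Evaluate sigma * K^(1 - 1/alpha) * T^(1/alpha), constants omitted."""
    if not (1.0 < alpha <= 2.0):
        raise ValueError("alpha must lie in (1, 2]")
    return sigma * K ** (1.0 - 1.0 / alpha) * T ** (1.0 / alpha)


if __name__ == "__main__":
    shape, alpha = 1.8, 1.5
    # The alpha-th moment is finite, so sigma^alpha = E[X^alpha] is a valid bound.
    sigma = pareto_moment(shape, alpha) ** (1.0 / alpha)
    # The second moment diverges: the variance does not exist.
    print("second moment:", pareto_moment(shape, 2.0))
    print("rate for K=10, T=10000:", minimax_rate(sigma, 10, 10_000, alpha))
```

With α = 2 the rate reduces to σ√(KT), the familiar bounded-variance scaling; as α approaches 1 the dependence on T approaches linear, which reflects how heavier tails make the problem harder.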
