Optimal No-Regret Learning in Strongly Monotone Games with Bandit Feedback

12/06/2021
by   Tianyi Lin, et al.
0

We consider online no-regret learning in unknown games with bandit feedback, where each agent only observes its reward at each time – determined by all players' current joint action – rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct an online bandit convex optimization algorithm and show that it achieves the single-agent optimal regret of Θ̃(√(T)) under smooth and strongly-concave payoff functions. We then show that if each agent applies this no-regret learning algorithm in strongly monotone games, the joint action converges in last iterate to the unique Nash equilibrium at a rate of Θ̃(1/√(T)). Prior to our work, the best-know convergence rate in the same class of games is O(1/T^1/3) (achieved by a different algorithm), thus leaving open the problem of optimal no-regret learning algorithms (since the known lower bound is Ω(1/√(T))). Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning by identifying the first doubly optimal bandit learning algorithm, in that it achieves (up to log factors) both optimal regret in the single-agent learning and optimal last-iterate convergence rate in the multi-agent learning. We also present results on several simulation studies – Cournot competition, Kelly auctions, and distributed regularized logistic regression – to demonstrate the efficacy of our algorithm.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro