Adversarial Multi-Player Bandits (Nicolò Cesa-Bianchi and Ohad Shamir)

Setting

Consider the following `multi-player’ version of adversarial multi-armed bandits, on three arms: An adversary chooses a sequence of T reward vectors x_t\in [0,1]^3. Then, two randomized decision makers iteratively choose i_{t},j_{t}\in \{1,2,3\}, and receive a reward g_t(i_t,j_t) equal to x_{t,i_t}+x_{t,j_t} if i_t\neq j_t, and 0 if i_t=j_t. This models situations where two users utilize the same set of resources, resulting in zero reward if they collide. Crucially, the decision makers cannot communicate.

Goal

Design an algorithm for the two decision makers which attains sublinear regret: Namely, that regardless of the adversary’s strategy, \max_{i,j}\sum_{t=1}^{T}g_t(i,j)-\sum_{t=1}^{T}g_t(i_t,j_t)= o(T).

Existing Results

Multi-player bandits have been studied in the context of cognitive radio networks, where users need to transmit on channels of varying quality and may interfere with each other (e.g. [1,2,3]). However, all results we are aware of are for a stochastic setting, where the rewards are sampled i.i.d., and most of them require a (possibly elaborate) communication scheme between the players, which is not always realistic.

Bibliography

[1] Animashree Anandkumar, Nithin Michael, Ao Kevin Tang, and Ananthram Swami. Distributed algorithms for learning and cognitive medium access with logarithmic regret. IEEE Journal on Selected Areas in Communications, 29(4):731–745, 2011.

[2] Orly Avner and Shie Mannor. Concurrent bandits and cognitive radio networks. In Machine Learning and Knowledge Discovery in Databases, pages 66–81. Springer, 2014.

[3] Dileep Kalathil, Naumaan Nayyar, and Rahul Jain. Decentralized learning for multiplayer multiarmed bandits. IEEE Transactions on Information Theory, 60(4):2331–2345, 2014.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s