In http://dl.acm.org/citation.cfm?id=2745868 we considered the following

bandit problem: an object with unknown type is to be

disseminated to a subset of consumers ; the benefit that gets

from being exposed is with probability and

otherwise. The question is then to sequentially determine who to expose,

so that the set of users eventually exposed minimizes , where

is the spamming cost, and

is the missed opportunity

cost. We identified lower bounds on regrets in terms of the parameters

, , and in

several scenarios of interest and found that the so-called greedy Bayes

strategy (a variant of Thompson sampling) would match these lower bounds

in each of these scenarios.

This begs two questions:

(i) Beyond these specific scenarios, in general what function of the

matrix conditions the best possible

regret?

(ii) Do generic methods such as Thompson sampling or Greedy Bayes approach

the best possible regret more generally than in the specific scenarios

considered in http://dl.acm.org/citation.cfm?id=2745868 ?