Optimal regret in a bandit problem with combinatorial structure (Laurent Massoulie)

In http://dl.acm.org/citation.cfm?id=2745868 we considered the following
bandit problem: an object with unknown type $\tau\in T$ is to be
disseminated to a subset of consumers $u\in U$; the benefit that  $u$ gets
from being exposed is $+1$ with probability $p_{u\tau}$ and $-1$
otherwise. The question is then to sequentially determine who to expose,
so that the set $V$ of users eventually exposed minimizes $C_0+C_1$, where
$C_0:=\sum_{v\in V}(1-2 p_{v\tau})^+$ is the spamming cost, and
$C_1:=\sum_{v\in U\setminus V}(2p_{v\tau}-1)^+$ is the missed opportunity
cost. We identified lower bounds on regrets in terms of the parameters
$n=|U|$, $t=|T|$, and $\delta=\inf_{u\in U,\tau\in T}|1/2-p_{u\tau}|$ in
several scenarios of interest and found that the so-called greedy Bayes
strategy (a variant of Thompson sampling) would match these lower bounds
in each of these scenarios.

This begs two questions:

(i) Beyond these specific scenarios, in general what function of the
matrix $(p_{u\tau})_{u\in U,\tau\in T}$ conditions the best possible
regret?

(ii) Do generic methods such as Thompson sampling or Greedy Bayes approach
the best possible regret more generally than in the specific scenarios
considered in http://dl.acm.org/citation.cfm?id=2745868 ?