Hyperlinks lead to the arXiv versions of the mentioned papers.
Let $\nu$ be a distribution over length-$T$ sequences of functions from some compact set $\mathcal{A} \subset \mathbb{R}^d$ to $[0,1]$. Let $(\ell_1, \dots, \ell_T) \sim \nu$, and let $U_1, \dots, U_T$ be uniformly distributed in $[0,1]$ (and independent of $(\ell_1, \dots, \ell_T)$). A bandit strategy is a sequence of random variables $(a_t)_{t \in [T]}$ taking values in $\mathcal{A}$, where $a_t$ is measurable with respect to the bandit filtration $\mathcal{F}_t = \sigma(U_1, \ell_1(a_1), \dots, \ell_{t-1}(a_{t-1}), U_t)$. We are interested in the following quantity (where the expectation is taken with respect to both $\nu$ and the $U_t$'s): $\overline{R}_T = \mathbb{E}\left[\sum_{t=1}^T \ell_t(a_t) - \min_{a \in \mathcal{A}} \sum_{t=1}^T \ell_t(a)\right]$.
Thompson Sampling (TS) is the strategy where, conditionally on $\mathcal{F}_{t-1}$, $a_t$ has the same distribution as an optimal action $a^* \in \operatorname{argmin}_{a \in \mathcal{A}} \sum_{t=1}^T \ell_t(a)$. Here are some simple questions about TS.
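To make the definition concrete, here is a minimal simulation of TS in the classical Bernoulli case with independent Beta(1,1) priors. All choices below (Bernoulli rewards rather than the losses of this post, the uniform prior, the function names) are illustrative assumptions for the sketch, not part of the setting above.

```python
import random

def thompson_sampling_bernoulli(means, horizon, seed=0):
    """Sketch of Thompson Sampling for K Bernoulli arms with independent
    Beta(1, 1) priors.  `means` are the hidden success probabilities.
    Note: this example uses rewards (to be maximized) rather than the
    losses of the text; the two conventions are equivalent up to a sign."""
    rng = random.Random(seed)
    K = len(means)
    alpha = [1] * K  # Beta posterior parameter (successes + 1) per arm
    beta = [1] * K   # Beta posterior parameter (failures + 1) per arm
    reward = 0.0
    for _ in range(horizon):
        # Sample one mean per arm from its posterior and play the argmax:
        # this makes a_t distributed as the optimal arm given the history.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        a = max(range(K), key=lambda i: samples[i])
        r = 1 if rng.random() < means[a] else 0
        reward += r
        alpha[a] += r
        beta[a] += 1 - r
    # Regret of the realized rewards against always playing the best arm.
    return horizon * max(means) - reward
```

With two close arms such as `[0.5, 0.6]` and a few thousand rounds, the returned regret is a small fraction of the horizon, in line with the $\sqrt{KT}$-type guarantees known for TS.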
Approximation properties of TS
Consider the special case where $\mathcal{A} = \{1, \dots, K\}$, and one can simulate $\nu$ by first sampling a parameter $\theta$ from some distribution $\mu_0$ and then sampling independently each value $\ell_t(i)$ from a distribution $\nu_{\theta}^i$ on $[0,1]$. Let $\overline{R}_T^*$ be the infimum of $\overline{R}_T$ over all bandit strategies for this setting. Guha and Munagala ask whether TS satisfies $\overline{R}_T^{\mathrm{TS}} \leq 2\, \overline{R}_T^*$ (more precisely they ask this question for a somewhat stronger notion of regret where one counts the number of times that a suboptimal action was selected). Note that in this setting we know that there exists a numerical constant $C$ such that $\overline{R}_T^{\mathrm{TS}} \leq C\, \overline{R}_T^*$ (see this paper). Furthermore Guha and Munagala essentially prove the factor-$2$ bound in a special case.
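The two-stage model above can be sketched with a small Monte Carlo estimate of the Bayesian regret of TS. The prior here (arm means drawn i.i.d. uniformly on $[0,1]$, Bernoulli losses given the means) is an illustrative assumption for the example, not the class of priors Guha and Munagala consider, and all names are hypothetical.

```python
import random

def run_ts(means, horizon, rng):
    """One run of Bernoulli Thompson Sampling; returns the pseudo-regret
    (sum of the suboptimality gaps of the arms played)."""
    K = len(means)
    alpha = [1] * K  # Beta posterior parameters per arm
    beta = [1] * K
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        arm = max(range(K), key=lambda i: samples[i])
        r = 1 if rng.random() < means[arm] else 0
        alpha[arm] += r
        beta[arm] += 1 - r
        regret += best - means[arm]
    return regret

def bayes_regret(horizon, n_runs=100, K=2, seed=1):
    # Two-stage simulation of the prior: theta (the vector of arm means)
    # is drawn from mu_0 -- here i.i.d. Uniform[0,1], an illustrative
    # choice -- and the observations are then i.i.d. given theta.
    # Averaging the per-run regret over draws of theta estimates the
    # Bayesian regret of TS under this prior.
    rng = random.Random(seed)
    return sum(run_ts([rng.random() for _ in range(K)], horizon, rng)
               for _ in range(n_runs)) / n_runs
```

Comparing such an estimate against (an estimate of) the Bayesian-optimal regret $\overline{R}_T^*$ is exactly the kind of experiment the approximation question is about, though computing $\overline{R}_T^*$ itself requires solving the underlying dynamic program.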
TS and the geometry of the action set
Russo and van Roy showed that if $\nu$ is supported on linear functions, then TS satisfies $\overline{R}_T \leq C \sqrt{d\, T \log |\mathcal{A}|}$ for a numerical constant $C$. This essentially implies that on $\mathcal{A} = B_2^d$ (the unit Euclidean ball in $\mathbb{R}^d$) one has $\overline{R}_T \leq C\, d \sqrt{T \log T}$. However it is known that for this particular convex body there exist strategies with $\overline{R}_T \leq C \sqrt{d\, T}$ (see this paper). Is this latter bound true for TS? More generally, understanding the interplay between TS and the geometry of the underlying action set is entirely open.
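For intuition, here is a sketch of TS for a linear bandit on the unit Euclidean ball under a Gaussian model. The model ($\theta \sim N(0, I_d)$, Gaussian observation noise) is an illustrative assumption chosen so that the posterior is conjugate; it is not the general setting of the question, and all names and parameters are hypothetical.

```python
import numpy as np

def linear_ts(d=5, horizon=300, noise=0.5, seed=0):
    """Sketch of Thompson Sampling for a linear bandit on the unit ball.
    Assumed model: theta ~ N(0, I_d); the observed loss of action a is
    <theta, a> plus N(0, noise^2) noise.  The Gaussian posterior is
    conjugate, so each step is a rank-one precision update.  Returns the
    pseudo-regret against the optimal action -theta/|theta|."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(d)      # hidden parameter
    prec = np.eye(d)                    # posterior precision (prior: I)
    b = np.zeros(d)                     # sum of a_s * y_s / noise^2
    best = -np.linalg.norm(theta)       # expected loss of the best action
    regret = 0.0
    for _ in range(horizon):
        cov = np.linalg.inv(prec)
        mu = cov @ b
        theta_s = rng.multivariate_normal(mu, cov)  # posterior sample
        norm = np.linalg.norm(theta_s)
        # TS plays the action that is optimal for the sampled parameter.
        a = -theta_s / norm if norm > 0 else np.zeros(d)
        y = theta @ a + noise * rng.standard_normal()
        prec += np.outer(a, a) / noise**2
        b += a * y / noise**2
        regret += theta @ a - best      # per-step gap (always >= 0)
    return regret
```

Whether the per-step gaps of this vanilla strategy shrink fast enough to match the $\sqrt{dT}$ rate on the ball is precisely the open question above; the simulation only illustrates the mechanics of the posterior update.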
TS and convexity
In our recent work we showed that if $\nu$ is supported on convex functions, and if $d = 1$ (that is, $\mathcal{A} \subset \mathbb{R}$), then a small modification of TS leads to $\overline{R}_T \leq C \sqrt{T} \log T$. Is this bound true for the basic TS? What about higher dimensions?
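To see what basic TS looks like in one dimension, here is a toy convex bandit: the loss is $(x - x^*)^2$ with Gaussian observation noise and $x^*$ uniform on a grid in $[0,1]$. This quadratic model and the discretized posterior are illustrative assumptions for the sketch (they are far simpler than an adversarial prior over convex functions), and all names are hypothetical.

```python
import math
import random

def convex_ts_1d(horizon=400, grid_size=50, noise=0.1, seed=0):
    """Sketch of basic TS for a toy 1-d convex bandit.  Assumed model:
    loss f(x) = (x - x*)^2 observed with N(0, noise^2) noise, where the
    minimizer x* is uniform on a grid in [0, 1].  The posterior over x*
    is tracked in log-space; TS plays a posterior sample of the
    minimizer.  Returns the pseudo-regret (sum of suboptimality gaps)."""
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    x_star = rng.choice(grid)           # hidden minimizer
    log_post = [0.0] * grid_size        # unnormalized log posterior
    regret = 0.0
    for _ in range(horizon):
        # Sample a candidate minimizer from the posterior (max-shifted
        # for numerical stability) and play it -- this is basic TS.
        m = max(log_post)
        w = [math.exp(lp - m) for lp in log_post]
        x = rng.choices(grid, weights=w)[0]
        y = (x - x_star) ** 2 + rng.gauss(0.0, noise)
        # Bayes update: Gaussian log-likelihood of the observed loss
        # under each candidate minimizer g.
        for i, g in enumerate(grid):
            log_post[i] -= (y - (x - g) ** 2) ** 2 / (2 * noise ** 2)
        regret += (x - x_star) ** 2     # suboptimality gap
    return regret
```

In this fully specified parametric model the posterior concentrates quickly and plain TS does fine; the difficulty in the actual open problem is that $\nu$ ranges over arbitrary priors on convex functions, where no such clean likelihood is available.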