Find an Optimal Algorithm for Univariate Bandit Convex Optimization (Ofer Dekel)

Setting

Univariate bandit convex optimization is the following sequential decision making problem. An adversary privately chooses a sequence of functions f_1,\ldots,f_T, where each f_t:[0,1] \mapsto [0,1] is convex. Then, a randomized decision maker iteratively chooses a sequence of numbers x_1,\ldots,x_T, where each x_t \in [0,1]. On iteration t, the decision maker chooses x_t \in [0,1], observes the value f_t(x_t), but does not receive any additional information about the function f_t. The decision maker’s regret is defined as R(T) = \sum_{t=1}^T f_t(x_t) - \min_{x \in [0,1]} \sum_{t=1}^T f_t(x).

Open Problem

Define a computationally efficient randomized strategy (algorithm) for the decision maker that guarantees R(T) = O(\sqrt{T}) for any sequence of functions chosen by the adversary. Alternatively, prove that such a strategy does not exist.

The State of the Art

The algorithm due to [3] guarantees R(T) = O(T^{2/3}) against any sequence of functions. [3] also gives a non-constructive proof that there exists an algorithm with R(T) = O(\sqrt{T}) against any sequence of functions, but this algorithm may be inefficient.

If the functions are all linear, we know that \arg\min_{x \in [0,1]} \sum_{t=1}^T f_t(x) is either 0 or 1, so the problem reduces to a 2 armed adversarial bandit problem, for which R(T) = O(\sqrt{T}) (see [1]).

If each of the convex functions is strongly convex and differentiable with a Lipschitz derivative, then [2] gives an algorithm that guarantees R(T) = O(\sqrt{T}) against any sequence of functions.

Easier Open Problems

  • Assume that each of the convex functions is 1-Lipschitz.
  • Assume that each of the convex functions is differentiable with a Lipschitz derivative.

Bibliography

[1] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.

[2] E. Hazan and K. Levy. Bandit convex optimization: Towards tight bounds. In Advances in Neural Information Processing Systems (NIPS), 2014.

[3] S. Bubeck, O. Dekel, T. Koren, and Y. Peres. Bandit Convex Optimization: \sqrt{T} Regret in One Dimension

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s