# Find an Optimal Algorithm for Univariate Bandit Convex Optimization (Ofer Dekel)

## Setting

Univariate bandit convex optimization is the following sequential decision-making problem. An adversary privately chooses a sequence of convex functions $f_1,\ldots,f_T$, where each $f_t:[0,1] \to [0,1]$. Then, a randomized decision maker iteratively chooses points $x_1,\ldots,x_T$ in $[0,1]$: on iteration $t$, the decision maker chooses $x_t$, observes the value $f_t(x_t)$, and receives no additional information about the function $f_t$. The decision maker's regret is defined as $R(T) = \sum_{t=1}^T f_t(x_t) - \min_{x \in [0,1]} \sum_{t=1}^T f_t(x)$.
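To make the protocol concrete, here is a minimal simulation harness (an illustrative sketch, not part of the problem statement); the comparator is approximated on a uniform grid, and the helper name `regret` is my own:

```python
def regret(xs, fs, grid_size=1001):
    """Regret of the plays xs against the loss sequence fs on [0, 1].

    The comparator min_x sum_t f_t(x) is approximated on a uniform
    grid, which suffices for illustration (and is exact whenever the
    minimizer lands on a grid point).
    """
    total = sum(f(x) for f, x in zip(fs, xs))
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    best = min(sum(f(g) for f in fs) for g in grid)
    return total - best
```

For example, playing $x_t = 0$ for ten rounds against $f_t(x) = (x - 1/2)^2$ incurs regret $10 \cdot \tfrac14 = 2.5$, since the best fixed point is $x = 1/2$.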

## Open Problem

Define a computationally efficient randomized strategy (algorithm) for the decision maker that guarantees $R(T) = O(\sqrt{T})$ for any sequence of functions chosen by the adversary. Alternatively, prove that such a strategy does not exist.

## The State of the Art

The algorithm due to [3] guarantees $R(T) = O(T^{2/3})$ against any sequence of functions. [3] also gives a non-constructive proof that there exists a strategy with $R(T) = O(\sqrt{T})$ against any sequence of functions, but the proof does not yield an efficient algorithm.
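For intuition, here is a folklore discretization argument for Lipschitz losses (not necessarily the algorithm of [3]): running a $K$-armed adversarial bandit algorithm on a uniform grid of $K$ points splits the regret into the bandit regret on the grid plus the discretization error, which for 1-Lipschitz losses gives

```latex
R(T) \;\le\; \underbrace{O\big(\sqrt{TK\log K}\big)}_{\text{bandit regret on the grid}}
\;+\; \underbrace{\frac{T}{K}}_{\text{discretization error}},
```

and choosing $K = \Theta(T^{1/3})$ yields $R(T) = \tilde{O}(T^{2/3})$.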

If the functions are all linear, then their sum is linear, so $\arg\min_{x \in [0,1]} \sum_{t=1}^T f_t(x)$ contains either $0$ or $1$. The problem therefore reduces to a two-armed adversarial bandit problem, for which $R(T) = O(\sqrt{T})$ is achievable (see [1]).
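A minimal sketch of this reduction, using EXP3-style importance-weighted updates over the two endpoints (an illustrative implementation, not the exact algorithm of [1]; the function name and the tuning of `eta` are my own):

```python
import math
import random

def exp3_two_arms(fs, eta=None):
    """Play the two arms x in {0, 1} against a sequence of loss
    functions fs, observing only the loss of the chosen arm.
    Returns the list of points played."""
    T = len(fs)
    if eta is None:
        # A standard learning-rate choice for K = 2 arms.
        eta = math.sqrt(2 * math.log(2) / (2 * T))
    weights = [1.0, 1.0]
    plays = []
    for f in fs:
        total = weights[0] + weights[1]
        probs = [w / total for w in weights]
        arm = 0 if random.random() < probs[0] else 1
        loss = f(arm)                    # bandit feedback: only f_t(x_t)
        est = loss / probs[arm]          # importance-weighted loss estimate
        weights[arm] *= math.exp(-eta * est)
        plays.append(arm)
    return plays
```

Run against a fixed linear loss such as $f_t(x) = 0.1 + 0.8x$, the strategy concentrates its plays on the better endpoint $x = 0$.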

If each of the convex functions is strongly convex and smooth (differentiable with a Lipschitz derivative), then [2] gives an algorithm that guarantees $R(T) = O(\sqrt{T})$ against any sequence of functions.

## Easier Open Problems

• Assume that each of the convex functions is 1-Lipschitz.
• Assume that each of the convex functions is differentiable with a Lipschitz derivative.

## Bibliography

[1] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.

[2] E. Hazan and K. Levy. Bandit convex optimization: Towards tight bounds. In Advances in Neural Information Processing Systems (NIPS), 2014.

[3] S. Bubeck, O. Dekel, T. Koren, and Y. Peres. Bandit convex optimization: $\sqrt{T}$ regret in one dimension. In Proceedings of the 28th Conference on Learning Theory (COLT), 2015.