**Setting**

Univariate bandit convex optimization is the following sequential decision making problem. An adversary privately chooses a sequence of functions , where each is convex. Then, a randomized decision maker iteratively chooses a sequence of numbers , where each . On iteration , the decision maker chooses , observes the value , but does not receive any additional information about the function . The decision maker’s regret is defined as .

**Open Problem**

Define a computationally efficient randomized strategy (algorithm) for the decision maker that guarantees for any sequence of functions chosen by the adversary. Alternatively, prove that such a strategy does not exist.

**The State of the Art**

The algorithm due to [3] guarantees against any sequence of functions. [3] also gives a non-constructive proof that there exists an algorithm with against any sequence of functions, but this algorithm may be inefficient.

If the functions are all linear, we know that is either 0 or 1, so the problem reduces to a 2 armed adversarial bandit problem, for which (see [1]).

If each of the convex functions is strongly convex and differentiable with a Lipschitz derivative, then [2] gives an algorithm that guarantees against any sequence of functions.

**Easier Open Problems**

- Assume that each of the convex functions is 1-Lipschitz.
- Assume that each of the convex functions is differentiable with a Lipschitz derivative.

**Bibliography**

[1] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.

[2] E. Hazan and K. Levy. Bandit convex optimization: Towards tight bounds. In Advances in Neural Information Processing Systems (NIPS), 2014.

[3] S. Bubeck, O. Dekel, T. Koren, and Y. Peres. Bandit Convex Optimization: Regret in One Dimension