Univariate bandit convex optimization is the following sequential decision making problem. An adversary privately chooses a sequence of functions , where each is convex. Then, a randomized decision maker iteratively chooses a sequence of numbers , where each . On iteration , the decision maker chooses , observes the value , but does not receive any additional information about the function . The decision maker’s regret is defined as .
Define a computationally efficient randomized strategy (algorithm) for the decision maker that guarantees for any sequence of functions chosen by the adversary. Alternatively, prove that such a strategy does not exist.
The State of the Art
The algorithm due to  guarantees against any sequence of functions.  also gives a non-constructive proof that there exists an algorithm with against any sequence of functions, but this algorithm may be inefficient.
If the functions are all linear, we know that is either 0 or 1, so the problem reduces to a 2 armed adversarial bandit problem, for which (see ).
If each of the convex functions is strongly convex and differentiable with a Lipschitz derivative, then  gives an algorithm that guarantees against any sequence of functions.
Easier Open Problems
- Assume that each of the convex functions is 1-Lipschitz.
- Assume that each of the convex functions is differentiable with a Lipschitz derivative.
 P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
 E. Hazan and K. Levy. Bandit convex optimization: Towards tight bounds. In Advances in Neural Information Processing Systems (NIPS), 2014.
 S. Bubeck, O. Dekel, T. Koren, and Y. Peres. Bandit Convex Optimization: Regret in One Dimension