# Optimal regret in a bandit problem with combinatorial structure (Laurent Massoulie)

In http://dl.acm.org/citation.cfm?id=2745868 we considered the following bandit problem: an object of unknown type is to be disseminated to a subset of consumers; the benefit a consumer gets from being exposed is positive with some type-dependent probability and zero otherwise. The question is then to sequentially determine whom to expose, so that the set of users eventually exposed minimizes the sum of the spamming cost (incurred by exposing users who derive no benefit) and the missed-opportunity cost (incurred by failing to expose users who would have benefited). We identified lower bounds on regret in terms of the problem parameters in several scenarios of interest, and found that the so-called greedy Bayes strategy (a variant of Thompson sampling) matches these lower bounds in each of these scenarios.

This raises two questions:

(i) Beyond these specific scenarios, what function of the matrix of benefit probabilities determines the best possible regret in general?

(ii) Do generic methods such as Thompson sampling or greedy Bayes approach the best possible regret more generally than in the specific scenarios considered in http://dl.acm.org/citation.cfm?id=2745868?
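
As a concrete illustration of the kind of strategy involved, here is a minimal Thompson-sampling sketch for a plain Bernoulli bandit. This is not the paper's greedy Bayes strategy and ignores the combinatorial exposure structure; all names and parameters are illustrative:

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Bernoulli Thompson sampling with Beta(1,1) priors on each arm.

    Returns the total reward collected over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k  # Beta posterior alpha parameters
    failures = [1] * k   # Beta posterior beta parameters
    total = 0
    for _ in range(horizon):
        # Sample a mean estimate for each arm from its Beta posterior,
        # then play the arm whose sample is largest.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total += reward
    return total
```

Posterior sampling concentrates play on the best arm, which is why such strategies can hope to match instance-dependent regret lower bounds.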

# Clustering cheaply under local differential privacy (Laurent Massoulie)

The trade-off between privacy preservation and statistical inference takes interesting forms in the context of recommendation engines. In http://arxiv.org/abs/1207.3269 we considered the problem of inferring clusters of similar items from users' inputs under an extreme form of differential privacy, namely local differential privacy. Local DP mandates that each user only provide input distorted by noise, so that the underlying ratings of items by that user (as well as which items they rated) remain ambiguous. We assessed the performance of clustering schemes under these constraints by their query complexity: the number of users probed before a cluster structure underlying users' ratings can be reliably identified. Assuming users rate a small number of items chosen uniformly at random, the best query complexity we could achieve grows impracticably fast with the number of items, well above the best lower bounds we know.

This raises the question of whether, in this setup, quasi-linear query complexity suffices to successfully cluster, and if so, what scheme achieves such complexity.
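
For intuition on the local-DP constraint, here is a minimal randomized-response sketch in which each user releases a single binary rating through a noisy channel. The mechanism and parameter names are illustrative, not the scheme analyzed in the paper:

```python
import math
import random

def randomized_response(bit, epsilon, rng):
    """epsilon-locally-differentially-private release of one bit:
    report the true bit with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep else 1 - bit

def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true mean rating from noisy reports."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    noisy_mean = sum(reports) / len(reports)
    # E[report] = keep * p + (1 - keep) * (1 - p); invert this affine map.
    return (noisy_mean - (1.0 - keep)) / (2.0 * keep - 1.0)
```

The aggregator can still estimate population statistics, but the per-user noise inflates the number of users that must be probed, which is exactly the query-complexity cost at stake in the open question.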

# Better Guarantees for Sparsest Cut Clustering (Nina Balcan)

The problem description can be found here.

# Minimax off-policy value estimation (Lihong Li)

In stochastic contextual bandits, one is often interested in estimating the average per-step reward obtained by running a given policy, using data collected by a different policy. The problem is also known as off-policy learning in reinforcement learning and as covariate shift in statistics.

Precisely, a contextual K-armed bandit is specified by a distribution D over a context set X and a reward distribution R(· | x, a) for each context x and arm a. Given a stochastic policy pi mapping contexts to distributions over arms, its average reward is given by

v(pi) = E[r], where x ~ D, a ~ pi(· | x), r ~ R(· | x, a).

The data available to an estimator is of the form {(x_i, a_i, r_i)} for i = 1, ..., n, where x_i ~ D, a_i ~ pi_D(· | x_i), r_i ~ R(· | x_i, a_i), and pi_D is a data-collection policy different from pi. The problem is to estimate v(pi) from these data.

A natural solution is to use importance sampling to get an unbiased estimate of v(pi), whose variance depends on n as well as on how "close" pi and pi_D are. While this approach has been successful empirically, it is not necessarily optimal, as shown by an asymptotic result [1]. More recently, it was shown that in the special case where the context set is a singleton (i.e., the classic K-armed bandit), a less popular approach called the regression estimator is near-optimal (and asymptotically optimal) in a minimax sense, whereas the importance-sampling estimator can be arbitrarily sub-optimal [2].
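
To make the two estimators concrete in the singleton-context (classic K-armed) case, here is a minimal sketch of both; the function names and the simulated setup are illustrative:

```python
import random

def importance_sampling_estimate(data, target, behavior):
    """IS estimate of the target policy's value from (arm, reward) pairs
    logged under the behavior policy: average of r * target(a)/behavior(a)."""
    return sum(r * target[a] / behavior[a] for a, r in data) / len(data)

def regression_estimate(data, target):
    """Plug-in ('regression') estimate: fit each arm's mean reward from the
    logged data, then average the fitted means under the target policy.
    Arms never observed in the data are skipped (a source of bias)."""
    sums, counts = {}, {}
    for a, r in data:
        sums[a] = sums.get(a, 0.0) + r
        counts[a] = counts.get(a, 0) + 1
    return sum(p * (sums[a] / counts[a])
               for a, p in target.items() if counts.get(a, 0) > 0)
```

Both are consistent here, but their variances differ: the IS estimate pays for the mismatch between the two policies through the importance weights, while the regression estimate pools all observations of each arm.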

The open question is to propose a (near-)optimal estimator for general contextual bandits.

[1] Keisuke Hirano, Guido W. Imbens, and Geert Ridder. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4):1161–1189, 2003.

[2] Lihong Li, Rémi Munos, and Csaba Szepesvári. Toward minimax off-policy value estimation. In AISTATS, 2015.

# Active Learning for Maximum Likelihood Estimation with Model Mismatch (Kamalika Chaudhuri)

An active learner is given a model class, a large number of unlabeled samples, and the ability to interactively query the labels of a subset of these samples. The goal of the learner is to provide a model in the model class that fits the data best, making as few interactive label queries as possible. An example is active regression: here the model class is the set of linear models x -> w · x, and the corresponding labels are generated as y = w · x plus zero-mean noise, for some unknown w.

When there is no model mismatch — that is, when the labels are indeed generated from a model in the input model class — the active learning problem is relatively easy, and has been addressed in a fairly general context in this paper:

http://arxiv.org/abs/1506.02348

What happens when the labels are not necessarily generated by a model in the input model class? This paper gives an approach for linear regression:

http://arxiv.org/abs/1410.5920

by using a piecewise partitioning of the data domain and learning a set of importance weights that are constant over each piece of the partition. The weights are adjusted so that the weighted MLE estimate has an optimal (upper bound on) risk.
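
A minimal sketch of the importance-weighting idea, assuming a 1-D linear model and a piecewise-constant query probability (the names and parameters are illustrative, not the paper's actual scheme):

```python
import random

def weighted_linear_fit(points):
    """Importance-weighted least squares for a 1-D linear model y ~ w * x.

    `points` holds (x, y, weight) triples; each weight is the inverse of the
    probability with which that label was queried, so the weighted loss is an
    unbiased estimate of the squared loss over the full data distribution.
    """
    num = sum(wt * x * y for x, y, wt in points)
    den = sum(wt * x * x for x, y, wt in points)
    return num / den

# Demo: query labels with a piecewise-constant probability q(x) and
# reweight each queried point by 1 / q(x).
rng = random.Random(0)
points = []
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    q = 0.2 if abs(x) < 0.5 else 1.0       # query probability on this piece
    if rng.random() < q:                   # label queried with probability q(x)
        y = 2.0 * x + rng.gauss(0.0, 0.1)  # true slope is 2
        points.append((x, y, 1.0 / q))
w = weighted_linear_fit(points)
```

Because the weights undo the non-uniform sampling, the fit targets the risk under the full data distribution even though labels were queried unevenly.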

Can we find a better and more efficient approach to active learning for maximum-likelihood estimation under model mismatch?

# Differentially Private Model Selection (Ben Recht)

Recent work in differential privacy has shown how to build provably reusable holdout sets and tamper-proof machine learning competitions:

http://arxiv.org/abs/1411.2664

http://arxiv.org/abs/1502.04585

These papers look at the problem of preventing a malicious adversary from overfitting to data. But the question remains: if I know that the machine learning competition that I am entering is using one of these mechanisms, what is the optimal strategy for me as a machine learning practitioner to yield the lowest possible error on a test set? Such a model would be guaranteed to generalize, and would have nearly optimal performance subject to generalizability. Concretely: what are the best optimization schemes to use for parameter tuning and model selection, given that one is faced with such a differentially private adversary?
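
For concreteness, here is a minimal Thresholdout-style sketch in the spirit of the mechanisms above; the threshold and noise parameters are illustrative, and this omits the budget accounting of the actual mechanism:

```python
import random

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01, seed=0):
    """Minimal Thresholdout-style mechanism (after Dwork et al.).

    For each query, `train_vals[i]` and `holdout_vals[i]` are the query's
    empirical means on the training and holdout sets. The mechanism answers
    with the training value when the two agree up to a noisy threshold, and
    with a noised holdout value otherwise, limiting how much information
    about the holdout set leaks per query.
    """
    rng = random.Random(seed)
    answers = []
    for t, h in zip(train_vals, holdout_vals):
        if abs(t - h) > threshold + rng.gauss(0.0, sigma):
            answers.append(h + rng.gauss(0.0, sigma))  # noisy holdout answer
        else:
            answers.append(t)  # training value already agrees with holdout
    return answers
```

A competitor facing such a mechanism only learns a noised, thresholded signal per submission, which is what makes the optimal tuning and model-selection strategy a non-trivial question.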