An active learner is given a model class, a large number of unlabeled samples, and the ability to interactively query the labels of a subset of these samples. The learner's goal is to output the model in the class that best fits the data while making as few label queries as possible. An example is active regression: here the model class is the set of linear models, and in the well-specified case each label is a linear function of its sample plus noise.

When there is no model mismatch — that is, when the labels are indeed generated from a model in the input model class — the active learning problem is relatively easy, and has been addressed in a fairly general context in this paper:

http://arxiv.org/abs/1506.02348

What happens when the labels are not necessarily generated by a model in the input model class? This paper gives an approach for linear regression:

http://arxiv.org/abs/1410.5920

The approach partitions the data domain into pieces and learns a set of importance weights that are constant on each piece of the partition. The weights are adjusted so that the weighted MLE achieves the best available upper bound on the risk.
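To make the idea of piecewise-constant importance weights concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm from the paper above: the weights here are simply inverse query probabilities, which is a swapped-in simplification, whereas the paper optimizes the weights against a risk bound. The names `piecewise_active_fit`, `piece_of`, `query_prob`, and `label_oracle` are hypothetical and chosen only for illustration.

```python
import numpy as np


def weighted_least_squares(X, y, weights):
    """Return w minimizing sum_i weights[i] * (y[i] - X[i] @ w) ** 2."""
    # Scaling rows by sqrt(weight) turns the weighted problem into an
    # ordinary least-squares problem.
    sw = np.sqrt(weights)
    w, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return w


def piecewise_active_fit(X, label_oracle, piece_of, query_prob, rng):
    """Query labels with a per-piece probability, then fit a weighted model.

    X            : (n, d) array of unlabeled samples
    label_oracle : callable mapping an index array to the queried labels
    piece_of     : (n,) integer array, piece_of[i] is the piece containing X[i]
    query_prob   : (k,) array, probability of querying a label in each piece
                   (assumed strictly positive)
    rng          : numpy.random.Generator
    """
    p = query_prob[piece_of]                  # per-sample query probability
    idx = np.flatnonzero(rng.random(len(X)) < p)
    y = label_oracle(idx)                     # the only place labels are used
    # Assumption of this sketch: inverse-probability weights, which are
    # constant on each piece because query_prob is.
    weights = 1.0 / p[idx]
    return weighted_least_squares(X[idx], y, weights), idx
```

With a partition into two pieces, one might pass `piece_of = (X[:, 1] > 0).astype(int)` and `query_prob = np.array([0.1, 0.5])` to spend more label queries on the second piece. Inverse-probability weighting keeps the weighted loss an unbiased estimate of the full-sample loss, which is the property that matters when the linear model is misspecified.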

Can we find a better and more efficient approach to active learning for MLE under model mismatch?