The trade-off between privacy preservation and statistical inference takes
interesting forms in the context of recommendation engines. In
http://arxiv.org/abs/1207.3269 we considered the problem of inferring
clusters of similar items from inputs of users under an extreme form of
differential privacy, namely local differential privacy. Local DP mandates
each user to only provide input that is distorted by noise so that the
underlying ratings of items by that user (as well as which items it rated)
remain ambiguous. We assessed performance of clustering schemes under
these constraints by query complexity. This counts how many users are
probed before a cluster structure underlying users’s ratings can be
reliably identified. Assuming users rate a small number of items chosen
uniformly at random, the best query complexity we could achieve is an
impracticably large where is the number of items.
However the best lower bounds we know are in .
This begs the question of determining whether in this setup quasi-linear
query complexity suffices to successfully cluster, and if so, what scheme
achieves such complexity?