Differentially Private Model Selection (Ben Recht)

Recent work in differential privacy has shown how to build provably reusable holdout sets and tamper-proof machine learning competitions.

This line of work addresses the problem of preventing a malicious adversary from overfitting to data. But a question remains: if I know that the machine learning competition I am entering uses one of these mechanisms, what is my optimal strategy as a machine learning practitioner for achieving the lowest possible error on the test set? Such a model would be guaranteed to generalize, and would have nearly optimal performance subject to that guarantee. Concretely: what are the best optimization schemes for parameter tuning and model selection, given that one is faced with such a differentially private adversary?
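To make the setting concrete, the reusable-holdout mechanisms in question work roughly like Thresholdout (Dwork et al., Science 2015): the analyst's query is answered from the training set unless it disagrees with the holdout by more than a noisy threshold. The sketch below is illustrative, not any paper's reference implementation; the function name and parameter defaults are my own choices.

```python
import numpy as np

def thresholdout(train_scores, holdout_scores, threshold=0.04, sigma=0.01,
                 rng=None):
    """One query of a Thresholdout-style reusable holdout (sketch).

    train_scores / holdout_scores: per-example losses of a candidate
    model on the training and holdout sets. Returns an estimate of the
    holdout loss that only draws on the holdout when the training and
    holdout averages disagree by more than a noisy threshold.
    """
    rng = np.random.default_rng() if rng is None else rng
    train_mean = np.mean(train_scores)
    holdout_mean = np.mean(holdout_scores)
    # Noisy threshold test: touch the holdout only when the gap is large,
    # so typical (non-overfit) queries reveal nothing about the holdout.
    if abs(train_mean - holdout_mean) > threshold + rng.laplace(0.0, 2 * sigma):
        return holdout_mean + rng.laplace(0.0, sigma)
    return train_mean
```

The open problem, in these terms: as the analyst submitting queries to such a mechanism, which sequence of models (and which tuning strategy) extracts the best final test error from a fixed noise budget?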
