Parameter Tuning (Ben Recht)

What is the “right” way to think about parameter tuning in machine learning?  Contemporary machine learning models may have thousands of parameters that have to be hand-tuned by the engineer.  Current techniques for automating this tuning come in two flavors:

(a) use derivative-free optimization to set the parameters.  This reduces the problem to black-box optimization, which in general requires time exponential in the number of parameters.  The current “best practice” of Bayesian optimization with Gaussian processes is often not much better than pure random search (a minimal sketch of this flavor appears after item (b)).

(b) combine multiple models using statistical aggregation techniques.  This reduces the problem to model selection.  The resulting models are often large, unwieldy, and uninterpretable (like random forests); a small ensembling sketch is given below.
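To make flavor (a) concrete, here is a minimal sketch of black-box tuning by pure random search, the baseline that GP-based methods are measured against. The `evaluate` routine and the toy objective are hypothetical stand-ins for a real train-and-validate run, not anything from the original post.

```python
# Pure random search over a parameter space: the simplest black-box tuner.
import random

def random_search(param_space, evaluate, n_trials=50, seed=0):
    """Sample configurations uniformly and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one candidate value per parameter.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = evaluate(params)  # black-box call: train a model, return validation score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for an actual training run.
space = {"learning_rate": [1e-3, 1e-2, 1e-1], "depth": [2, 4, 8, 16]}
toy_objective = lambda p: -abs(p["learning_rate"] - 1e-2) - abs(p["depth"] - 8) / 16
print(random_search(space, toy_objective, n_trials=20))
```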
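Flavor (b) sidesteps tuning a single model by aggregating many of them. A minimal sketch, assuming scikit-learn is available and using a synthetic dataset purely for illustration: a random forest (many randomized trees averaged together) combined with a second model in a voting ensemble. This is an illustrative example, not the post's prescribed method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random forest: many differently configured trees, aggregated by averaging.
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Voting ensemble: aggregate heterogeneous models instead of tuning one of them.
ensemble = VotingClassifier(
    estimators=[("forest", forest), ("logreg", LogisticRegression(max_iter=1000))],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print("held-out accuracy:", ensemble.score(X_te, y_te))
```

The result is exactly the kind of large, hard-to-interpret model the post warns about: accuracy comes from aggregation, not from understanding any single configuration.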

Both methods ignore the structure of machine learning design, where pipelines are built stage-wise as a DAG, and they ignore concerns about the stability of the end-to-end pipeline. Are we just thinking about this problem incorrectly? What other structures and modeling can we take advantage of when optimizing machine learning models end-to-end?
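The stage-wise structure being ignored is easy to see in code. A minimal sketch, assuming scikit-learn: the pipeline below is a small DAG (here just a chain) whose stages each carry their own parameters, yet the tuner sees only a flat dictionary of `<stage>__<parameter>` names and searches it as one black box. The stages and grid values are illustrative choices, not from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=30, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),   # stage 1: preprocessing
    ("reduce", PCA()),             # stage 2: dimensionality reduction
    ("clf", SVC()),                # stage 3: classifier
])

# The search space flattens the pipeline: the DAG structure itself is not
# exploited, and nothing accounts for end-to-end stability.
grid = {
    "reduce__n_components": [5, 10, 20],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```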
