In machine learning theory, we study problems with very simple descriptions. We analyze how to fit a mixture of Gaussians, or train an SVM, or compute principal components. But in the wild, machine learning is a much more involved process. A data scientist (heheh) is never just presented with data and told to fit an SVM. One must first decide how to represent the data as a vector or discrete code, then how to normalize features, then how to prune features, and only then, maybe, begin fitting an SVM.
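To make the gap concrete, here is a minimal sketch of such a pipeline using scikit-learn (the library choice, the toy dataset, and the particular preprocessing steps are my own illustrative assumptions, not something prescribed above): normalize, prune features, then fit an SVM, all chained into one estimator whose end-to-end behavior is what theory would actually need to analyze.

```python
# Hypothetical end-to-end pipeline: each step is simple in isolation,
# but the composed object is what gets deployed and what risk bounds
# would need to cover.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

# Toy data standing in for "the wild".
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

pipe = make_pipeline(
    StandardScaler(),              # normalize features
    SelectKBest(f_classif, k=10),  # prune features
    SVC(kernel="rbf"),             # and then, maybe, fit an SVM
)
pipe.fit(X, y)
train_acc = pipe.score(X, y)
```

Classical theory gives guarantees for the `SVC` step alone; the question is what can be said about `pipe` as a whole, since the scaler and the feature selector are themselves fit to the same data.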

What can we say about the theoretical properties of such “end-to-end” machine learning? What do the risk bounds look like? Is this simply a matter of studying the “stability” properties of the end-to-end pipeline? If so, how does one evaluate the stability of a chain of components, each of which has its own well-understood stability properties? Can we derive generalization bounds for general DAGs of heterogeneous machine learning components?
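For reference, the kind of guarantee stability buys for a single component is the classical uniform-stability bound of Bousquet and Elisseeff (2002): if a learning algorithm $A$ is $\beta$-uniformly stable and the loss is bounded by $M$, then with probability at least $1-\delta$ over a sample $S$ of size $m$,

```latex
R(A_S) \;\le\; \widehat{R}(A_S) \;+\; 2\beta \;+\; \bigl(4m\beta + M\bigr)\sqrt{\frac{\ln(1/\delta)}{2m}}.
```

The open question above is, in effect, how a $\beta$ for the composed pipeline relates to the individual $\beta$'s of its components, and whether that relation survives arbitrary DAG wiring.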