Statistics Papers

Document Type

Technical Report

Date of this Version


Publication Source

Toyota Technological Institute at Chicago


We analyze the multi-view regression problem, in which we have two views X = (X(1), X(2)) of the input data and a target variable Y of interest. We provide sufficient conditions under which the dimensionality of X can be reduced (via a projection) without losing predictive power for Y. Crucially, this projection can be computed via a Canonical Correlation Analysis (CCA) on the unlabeled data alone. The algorithmic template is as follows: with the unlabeled data, perform CCA and construct a certain projection; with the labeled data, perform least squares regression in this lower dimensional space. We show how, under certain natural assumptions, the number of labeled samples required can be significantly reduced in comparison to the single-view setting; in particular, we show that this dimensionality reduction does not lose predictive power for Y (thus it introduces little bias but can drastically reduce the variance). We explore two separate assumptions under which this is possible and show how, under either assumption alone, dimensionality reduction can reduce the labeled sample complexity. The two assumptions we consider are a conditional independence assumption and a redundancy assumption. The typical conditional independence assumption is that the views X(1) and X(2) are independent conditioned on Y; we relax this to require only that they are independent conditioned on some hidden state H. Under the redundancy assumption, the best predictor from each view is roughly as good as the best predictor using both views.
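The algorithmic template in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: CCA is computed here via an SVD of the whitened cross-covariance, and the small ridge term `reg` (for numerical stability) and the helper names are assumptions of this sketch.

```python
import numpy as np

def cca_projection(X1, X2, k, reg=1e-3):
    """Top-k CCA directions for view 1, computed from unlabeled data only.

    Maximally correlated directions between the two views are found via
    the SVD of C11^{-1/2} C12 C22^{-1/2}. `reg` is a small ridge term
    added for numerical stability (an assumption of this sketch).
    """
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    C11 = X1.T @ X1 / n + reg * np.eye(X1.shape[1])
    C22 = X2.T @ X2 / n + reg * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / n
    # Whiten each view via the inverse Cholesky factor, then take the
    # top-k left singular vectors of the whitened cross-covariance.
    W1 = np.linalg.inv(np.linalg.cholesky(C11))
    W2 = np.linalg.inv(np.linalg.cholesky(C22))
    U, s, Vt = np.linalg.svd(W1 @ C12 @ W2.T)
    return W1.T @ U[:, :k]  # d1 x k projection for view 1

def multiview_regression(X1_unlab, X2_unlab, X1_lab, y, k):
    """Template from the abstract: CCA on unlabeled data to build a
    projection, then least squares on labeled data in the k-dim space."""
    P = cca_projection(X1_unlab, X2_unlab, k)
    Z = (X1_lab - X1_unlab.mean(axis=0)) @ P
    # Least squares with an intercept in the projected space.
    design = np.column_stack([Z, np.ones(len(Z))])
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    return P, w
```

On data generated so that both views are noisy linear functions of a shared hidden state H (the relaxed conditional independence setting), regression in the k-dimensional CCA space recovers a target that is linear in H, even though the least squares step sees only a small labeled subset.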


Toyota Technological Institute at Chicago, Technical Report TTI-TR-2008-4

At the time of publication, author Sham M. Kakade was affiliated with Toyota Technological Institute at Chicago. Currently (August 2016), he is a faculty member at the Statistics Department at the University of Pennsylvania.



Date Posted: 27 November 2017