Toyota Technical Institute-Chicago
We analyze the multi-view regression problem where we have two views X = (X(1), X(2)) of the input data and a target variable Y of interest. We provide sufficient conditions under which we can reduce the dimensionality of X (via a projection) without losing predictive power for Y. Crucially, this projection can be computed via a Canonical Correlation Analysis (CCA) on the unlabeled data alone. The algorithmic template is as follows: with unlabeled data, perform CCA and construct a certain projection; with the labeled data, perform least squares regression in this lower-dimensional space. We show how, under certain natural assumptions, the number of labeled samples can be significantly reduced (in comparison to the single-view setting). In particular, we show how this dimensionality reduction does not lose predictive power for Y: it introduces only little bias but can drastically reduce the variance. We explore two separate assumptions under which this is possible and show how, under either assumption alone, dimensionality reduction can reduce the labeled sample complexity. The two assumptions we consider are a conditional independence assumption and a redundancy assumption. The typical conditional independence assumption is that the views X(1) and X(2) are independent conditioned on Y; we relax this so that the views need only be independent conditioned on some hidden state H. Under the redundancy assumption, the best predictor from each view is roughly as good as the best predictor using both views.
Foster, D. P., Kakade, S. M., & Zhang, T. (2008). Multi-View Dimensionality Reduction via Canonical Correlation Analysis. Toyota Technical Institute-Chicago. Retrieved from https://repository.upenn.edu/statistics_papers/150
Date Posted: 27 November 2017