Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Computer and Information Science

First Advisor

Lyle H. Ungar

Second Advisor

James C. Gee


Spectral learning algorithms are becoming increasingly popular in data-rich domains, driven in part by recent advances in large-scale randomized SVD and in spectral estimation of Hidden Markov Models. Extensions of these methods lead to statistical estimation algorithms that are not only fast, scalable, and useful on real data sets, but also provably correct.
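As background for the scalability claim, the large-scale randomized SVD mentioned above can be sketched as follows. This is a minimal illustration of the standard randomized range-finder approach (in the style of Halko, Martinsson, and Tropp), not an algorithm from the thesis itself; all names and parameters are illustrative.

```python
import numpy as np

def randomized_svd(A, k, n_oversamples=10, n_iter=2, seed=0):
    """Approximate the top-k singular triplets of A by projecting onto a
    random low-dimensional subspace; the cost is a few passes over A
    rather than a full decomposition."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random test matrix; oversampling improves the subspace estimate.
    Omega = rng.standard_normal((n, k + n_oversamples))
    Y = A @ Omega
    # A few power iterations sharpen the estimate when singular values decay slowly.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)            # Orthonormal basis for the (approximate) range of A
    B = Q.T @ A                       # Small (k + oversamples) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub[:, :k], s[:k], Vt[:k]
```

On a matrix of exact rank k, this recovers the true singular values essentially to machine precision; for data matrices with decaying spectra it gives a controlled approximation at a fraction of the cost of a dense SVD.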

Following this line of research, we make two contributions. First, we propose a set of spectral algorithms for text analysis and natural language processing. In particular, we propose fast and scalable spectral algorithms for learning word embeddings: low-dimensional real vectors (called Eigenwords) that capture the “meaning” of words from their context. Second, we show how similar spectral methods can be applied to analyzing brain images.

State-of-the-art approaches to learning word embeddings are slow to train or lack theoretical grounding; we propose three spectral algorithms that overcome these limitations. All three algorithms harness the multi-view nature of text data, i.e., the left and right context of each word, and share three characteristics:

1. They are fast to train and scalable.

2. They have strong theoretical properties.

3. They can induce context-specific embeddings, i.e., different embeddings for “river bank” versus “Bank of America”.


They also have lower sample complexity, and hence higher statistical power, for rare words. We provide theory that establishes relationships between these algorithms and optimality criteria for the estimates they provide. We also perform a thorough qualitative and quantitative evaluation of Eigenwords and demonstrate their superior performance over state-of-the-art approaches.
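To make the multi-view idea concrete: one simple way to exploit a word's left and right context is a CCA-style SVD of a count-normalized word-by-context co-occurrence matrix. The sketch below is an assumed, heavily simplified toy version of this idea, not the thesis's actual Eigenword algorithms; the normalization and the one-token context window are illustrative choices.

```python
import numpy as np

def eigenwords_sketch(tokens, dim=2):
    """Toy eigenword embeddings: SVD of a whitened word-by-context
    co-occurrence matrix, where each word's context view is its
    immediate left and right neighbors."""
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Context view: left-neighbor counts stacked with right-neighbor counts.
    C = np.zeros((V, 2 * V))
    for t, tok in enumerate(tokens):
        w = idx[tok]
        if t > 0:
            C[w, idx[tokens[t - 1]]] += 1.0
        if t + 1 < len(tokens):
            C[w, V + idx[tokens[t + 1]]] += 1.0
    # CCA-style whitening: scale by inverse sqrt of marginal counts.
    row = np.maximum(C.sum(axis=1), 1.0)
    col = np.maximum(C.sum(axis=0), 1.0)
    M = C / np.sqrt(row)[:, None] / np.sqrt(col)[None, :]
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return vocab, U[:, :dim] * s[:dim]   # dim-dimensional word vectors
```

Because the decomposition reduces to a (randomizable) SVD of a sparse count matrix, training scales to large corpora, which is the property the abstract emphasizes.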

Next, we turn to the task of using spectral learning methods for brain imaging data.

Methods like Sparse Principal Component Analysis (SPCA), Non-negative Matrix Factorization (NMF), and Independent Component Analysis (ICA) have been used to obtain state-of-the-art accuracies in a variety of machine learning problems. However, their use in brain imaging, though increasing, is limited by the fact that they are applied as out-of-the-box techniques and are seldom tailored to the domain-specific constraints and knowledge of medical imaging, which makes their results difficult to interpret.

In order to address the above shortcomings, we propose Eigenanatomy (EANAT), a general framework for sparse matrix factorization. Its goal is to statistically learn the boundaries of and connections between brain regions by weighing both the data and prior neuroanatomical knowledge.

Although EANAT incorporates some neuroanatomical prior knowledge in the form of connectedness and smoothness constraints, it can still be difficult for clinicians to interpret the results in specific domains where network-specific hypotheses exist. We thus extend EANAT and present a novel framework for prior-constrained sparse decomposition of matrices derived from brain imaging data, called Prior Based Eigenanatomy (p-Eigen). We formulate our solution in terms of a prior-constrained l1-penalized (sparse) principal component analysis. Experimental evaluation confirms that p-Eigen extracts biologically relevant, patient-specific functional parcels and that it significantly aids classification of Mild Cognitive Impairment when compared to state-of-the-art competing approaches.
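For intuition, the l1-penalized sparse principal component at the core of this formulation can be sketched as a truncated power iteration with soft-thresholding. This is a minimal illustration of sparse PCA in general, not p-Eigen's exact prior-constrained objective; in particular, the `prior` argument (used here only to initialize the component) is a hypothetical stand-in for how prior knowledge might enter.

```python
import numpy as np

def sparse_pc(X, lam=0.1, prior=None, n_iter=200, seed=0):
    """Leading sparse principal component of data matrix X via power
    iteration with soft-thresholding (l1 penalty). `prior`, if given,
    initializes the iterate; this is an illustrative simplification,
    not the thesis's p-Eigen formulation."""
    S = X.T @ X                       # Unnormalized sample covariance
    p = S.shape[0]
    rng = np.random.default_rng(seed)
    v = prior.astype(float).copy() if prior is not None else rng.standard_normal(p)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = S @ v
        # Soft-thresholding induces sparsity in the loading vector.
        u = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
        nrm = np.linalg.norm(u)
        if nrm == 0.0:                # Penalty too aggressive: all loadings zeroed
            break
        v = u / nrm
    return v
```

On data with one dominant sparse direction, the recovered loading vector concentrates on the true support while off-support entries are thresholded to exactly zero, which is what makes such components interpretable as spatial parcels.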
