Saul, Lawrence K
Search results (showing 10 of 17 publications)
Publication: Visualization of Low Dimensional Structure in Tonal Pitch Space (2005-09-06)
Burgoyne, J. Ashley; Saul, Lawrence K
In his 2001 monograph Tonal Pitch Space, Fred Lerdahl defined a distance function over tonal and post-tonal harmonies distilled from years of research on music cognition. Although this work references the toroidal structure commonly associated with harmonic space, it stops short of presenting an explicit embedding of this torus. It is possible to use statistical techniques to recreate such an embedding from the distance function, yielding a more complex structure than the standard toroidal model has heretofore assumed. Nonlinear techniques can reduce the dimensionality of this structure and be tuned to emphasize global or local anatomy. The resulting manifolds highlight the relationships inherent in the tonal system and offer a basis for future work in machine-assisted analysis and music theory.

Publication: Learning a kernel matrix for nonlinear dimensionality reduction (2004-07-04)
Weinberger, Kilian Q; Sha, Fei; Saul, Lawrence K
We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. Noting that the kernel matrix implicitly maps the data into a nonlinear feature space, we show how to discover a mapping that unfolds the underlying manifold from which the data was sampled. The kernel matrix is constructed by maximizing the variance in feature space subject to local constraints that preserve the angles and distances between nearest neighbors. The main optimization involves an instance of semidefinite programming---a fundamentally different computation than previous algorithms for manifold learning, such as Isomap and locally linear embedding. The optimized kernels perform better than polynomial and Gaussian kernels for problems in manifold learning, but worse for problems in large margin classification. We explain these results in terms of the geometric properties of different kernels and comment on various interpretations of other manifold learning algorithms as kernel methods.

Publication: Hierarchical Distributed Representations for Statistical Language Modeling (2004-12-13)
Blitzer, John; Saul, Lawrence K; Weinberger, Kilian Q; Pereira, Fernando C.N.
Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution over predicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty thousand word vocabulary and a training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.
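As a rough illustration of the mixture-of-experts prediction described in the language modeling entry above, here is a minimal numpy sketch of a flat (non-hierarchical) mixture of experts over next words. The toy sizes, random placeholder parameters, and the single softmax gate are illustrative assumptions; the paper's hierarchical architecture, learned context representations, and training procedure are not shown.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: vocabulary, context-representation dimension, number of experts.
V, D, K = 1000, 30, 8

# In the paper the context representation comes from dimensionality reduction;
# here it is just a random placeholder vector.
x = rng.normal(size=D)

gate_W = rng.normal(size=(K, D))         # gating parameters (assumed flat softmax gate)
expert_logits = rng.normal(size=(K, V))  # each row defines one expert's multinomial

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def next_word_probs(x):
    # P(w | context) = sum_k P(k | x) P(w | k): the gate weights the experts,
    # each expert contributes a multinomial over predicted words.
    gate = softmax(gate_W @ x)                # shape (K,)
    experts = softmax(expert_logits, axis=1)  # shape (K, V), rows sum to one
    return gate @ experts                     # shape (V,), sums to one

p = next_word_probs(x)
print(p.shape, round(p.sum(), 6))  # (1000,) 1.0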
Publication: Multiplicative Updates for Large Margin Classifiers (2003-08-24)
Saul, Lawrence K; Sha, Fei; Lee, Daniel D
Various problems in nonnegative quadratic programming arise in the training of large margin classifiers. We derive multiplicative updates for these problems that converge monotonically to the desired solutions for hard and soft margin classifiers. The updates differ strikingly in form from other multiplicative updates used in machine learning. In this paper, we provide complete proofs of convergence for these updates and extend previous work to incorporate sum and box constraints in addition to nonnegativity.

Publication: Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds (2003-06-01)
Saul, Lawrence K; Roweis, Sam T
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to be sampled from an underlying manifold, are mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE--though capable of generating highly nonlinear embeddings--are simple to implement, and they do not involve local minima. In this paper, we describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. We present results of the algorithm applied to data sampled from known manifolds, as well as to collections of images of faces, lips, and handwritten digits. These examples are used to provide extensive illustrations of the algorithm’s performance--both successes and failures--and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.
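As a sketch of the LLE procedure summarized in the entry above, the following numpy code computes reconstruction weights from each point's nearest neighbors and then reads the embedding off the bottom eigenvectors of (I - W)^T (I - W). The neighbor count, regularization constant, brute-force neighbor search, and the helper name lle are illustrative choices, not the paper's.

import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    # X: (n_samples, n_features) array assumed to lie near a low dimensional manifold.
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # brute-force squared distances
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]    # indices of nearest neighbors

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                               # neighbors centered on x_i
        C = Z @ Z.T                                        # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)       # regularize for stability
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, idx[i]] = w / w.sum()                         # reconstruction weights sum to one

    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)                         # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]                     # skip the constant bottom eigenvector

# Example: a noisy circle embedded in 3-D, mapped to 2 coordinates.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
X = np.c_[np.cos(t), np.sin(t), 0.05 * np.random.default_rng(1).normal(size=t.size)]
print(lle(X).shape)   # (400, 2)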
Publication: Exploratory analysis and visualization of speech and music by locally linear embedding (2004-05-17)
Jain, Viren; Saul, Lawrence K
Many problems in voice recognition and audio processing involve feature extraction from raw waveforms. The goal of feature extraction is to reduce the dimensionality of the audio signal while preserving the informative signatures that, for example, distinguish different phonemes in speech or identify particular instruments in music. If the acoustic variability of a data set is described by a small number of continuous features, then we can imagine the data as lying on a low dimensional manifold in the high dimensional space of all possible waveforms. Locally linear embedding (LLE) is an unsupervised learning algorithm for feature extraction in this setting. In this paper, we present results from the exploratory analysis and visualization of speech and music by LLE.

Publication: Unsupervised learning of image manifolds by semidefinite programming (2004-06-27)
Weinberger, Kilian Q; Saul, Lawrence K
Can we detect low dimensional structure in high dimensional data sets of images and video? The problem of dimensionality reduction arises often in computer vision and pattern recognition. In this paper, we propose a new solution to this problem based on semidefinite programming. Our algorithm can be used to analyze high dimensional data that lies on or near a low dimensional manifold. It overcomes certain limitations of previous work in manifold learning, such as Isomap and locally linear embedding. We illustrate the algorithm on easily visualized examples of curves and surfaces, as well as on actual images of faces, handwritten digits, and solid objects.
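The semidefinite program behind this entry and the earlier kernel-learning entry can be sketched with cvxpy roughly as follows. This simplified variant only pins down squared distances between nearest neighbors (the papers also preserve angles through constraints among neighbors of common neighbors), and the helper name unfold, the neighbor count, and reliance on cvxpy's default solver are assumptions for illustration.

import numpy as np
import cvxpy as cp

def unfold(X, n_neighbors=4, n_components=2):
    # Learn a centered kernel (Gram) matrix K that preserves local distances
    # while maximizing total variance, then embed via its top eigenvectors.
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]

    K = cp.Variable((n, n), PSD=True)
    constraints = [cp.sum(K) == 0]                     # center the embedding at the origin
    for i in range(n):
        for j in map(int, idx[i]):
            # Preserve the squared distance between neighboring points.
            constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == d2[i, j])

    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

    vals, vecs = np.linalg.eigh(K.value)               # eigenvalues in ascending order
    top = vecs[:, -n_components:] * np.sqrt(np.maximum(vals[-n_components:], 0))
    return top[:, ::-1]                                # leading coordinate first

# Example: a small set of points sampled from an arc in 3-D.
t = np.linspace(0, np.pi, 30)
X = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]
print(unfold(X).shape)   # (30, 2)

Maximizing the trace pulls non-neighboring points apart while the constraints keep neighbors in place, so the learned kernel effectively flattens the arc before the final eigenvector step recovers low dimensional coordinates.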
Publication: Nonnegative deconvolution for time of arrival estimation (2004-05-17)
Lin, Yuanqing; Lee, Daniel D; Saul, Lawrence K
The interaural time difference (ITD) of arrival is a primary cue for acoustic sound source localization. Traditional estimation techniques for ITD based upon cross-correlation are related to maximum-likelihood estimation of a simple generative model. We generalize the time difference estimation into a deconvolution problem with nonnegativity constraints. The resulting nonnegative least squares optimization can be efficiently solved using a novel iterative algorithm with guaranteed global convergence properties. We illustrate the utility of this algorithm using simulations and experimental results from a robot platform.

Publication: Global Coordination of Local Linear Models (2001-12-03)
Roweis, Sam; Saul, Lawrence K; Hinton, Geoffrey E
High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold—arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the “global coordination” of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model’s parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold—even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones.

Publication: Multiplicative Updates for Classification by Mixture Models (2001-12-03)
Saul, Lawrence K; Lee, Daniel D
We investigate a learning algorithm for the classification of nonnegative data by mixture models. Multiplicative update rules are derived that directly optimize the performance of these models as classifiers. The update rules have a simple closed form and an intuitive appeal. Our algorithm retains the main virtues of the Expectation-Maximization (EM) algorithm—its guarantee of monotonic improvement, and its absence of tuning parameters—with the added advantage of optimizing a discriminative objective function. The algorithm reduces as a special case to the method of generalized iterative scaling for log-linear models. The learning rate of the algorithm is controlled by the sparseness of the training data. We use the method of nonnegative matrix factorization (NMF) to discover sparse distributed representations of the data. This form of feature selection greatly accelerates learning and makes the algorithm practical on large problems. Experiments show that discriminatively trained mixture models lead to much better classification than comparably sized models trained by EM.
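The NMF feature-extraction step mentioned in the last entry can be sketched with the standard multiplicative updates for squared reconstruction error; this is the generic Lee and Seung style routine, not the paper's discriminative mixture training, and the rank, iteration count, and epsilon are illustrative choices.

import numpy as np

def nmf(V, rank=10, n_iter=200, eps=1e-9, seed=0):
    # Factor a nonnegative matrix V (features x examples) as V ~ W H with W, H >= 0,
    # using the classic multiplicative updates for squared reconstruction error.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H; stays in the nonnegative orthant
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W likewise
    return W, H

# Example: sparse nonnegative data reconstructed from a low-rank nonnegative basis.
rng = np.random.default_rng(1)
V = rng.random((100, 50)) * (rng.random((100, 50)) < 0.2)   # mostly zeros, nonnegative
W, H = nmf(V, rank=5)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))        # relative reconstruction error

Because both factors stay nonnegative, the columns of W tend toward sparse, parts-based features, which is the property the last entry exploits to speed up discriminative training of the mixture models.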