IRCS Technical Reports Series

Using Syntactic Information in Document Filtering: A Comparative Study of Part-of-Speech Tagging and Supertagging (1996-12-01)
Chandrasekar, R.; Srinivas, B.
Any coherent text contains significant latent information, such as syntactic structure and patterns of language use. This information can be exploited to overcome the inadequacies of keyword-based retrieval and make information retrieval more efficient. In this paper, we demonstrate quantitatively how syntactic information is useful in filtering out irrelevant documents. We also compare two different syntactic labelings, simple Part-of-Speech (POS) labeling and Supertag labeling, and show how the richer (more fine-grained) representation of supertags leads to more efficient and effective document filtering. We have implemented a system which exploits syntactic information in a flexible manner to filter documents. The system has been tested on a large collection of news sentences, and achieves an F-score of 89 for filtering out irrelevant sentences. Its performance and modularity make it a promising postprocessing addition to any Information Retrieval system.

On Relational Completeness of Multi-Modal Categorial Logics (1998-09-01)
Jäger, Gerhard
Several recent results show that the Lambek Calculus L and its close relative L1 are sound and complete under (possibly relativized) relational interpretation. The paper transfers these results to L◊, the multi-modal extension of the Lambek Calculus that was proposed in Moortgat 1996. Two natural relational interpretations of L◊ are proposed and shown to be sound and complete. The completeness proofs make heavy use of the method of relational labeling from Kurtonina 1995. Finally, it is demonstrated that relational interpretation provides a semantic justification for the translation from L◊ to L from Versmissen 1996.

Modal Logic Over Finite Structures (1995-10-01)
Rosen, Eric
In this paper, we develop various aspects of the finite model theory of propositional modal logic. In particular, we show that certain results about the expressive power of modal logic over the class of all structures, due to van Benthem and his collaborators, remain true over the class of finite structures. We establish that a first-order definable class of finite models is closed under bisimulations if it is definable by a ‘modal first-order sentence’. We show that a class of finite models that is defined by a modal sentence is closed under extensions if it is defined by a diamond-modal sentence. In sharp contrast, it is well known that many classical results for first-order logic, including various preservation theorems, fail for the class of finite models.

The Anaphoric Parallel between Modality and Tense (1997-05-01)
Stone, Matthew
In modal subordination, a modal sentence is interpreted relative to a hypothetical scenario introduced in an earlier sentence. In this paper, I argue that this phenomenon reflects the fact that the interpretation of modals is an ANAPHORIC process, precisely analogous to the anaphoric interpretation of tense. Modal morphemes introduce alternative scenarios as entities into the discourse model; their interpretation depends on evoking scenarios for described, reference and speech points, and relating them to one another. Although this account formalizes anaphoric connections using dynamic semantics, it invokes a novel and direct encoding of scenarios as ordinary, static objects (competing analyses take modal referents to be inherently dynamic objects, unlike the referents of pronouns and tenses). The result is a simpler proposal with better empirical coverage.

Some Novel Applications of Explanation-Based Learning to Parsing Lexicalized Tree-Adjoining Grammars (1995-05-01)
Joshi, Aravind K
In this paper we present some novel applications of the Explanation-Based Learning (EBL) technique to parsing Lexicalized Tree-Adjoining Grammars. The novel aspects are (a) immediate generalization of parses in the training set, (b) generalization over recursive structures and (c) representation of generalized parses as Finite State Transducers. A highly impoverished parser called a “stapler” has also been introduced. We present experimental results using EBL for different corpora and architectures to show the effectiveness of our approach.

Probabilistic Matching of Brain Images (1995-04-01)
Gee, James C; LeBriquer, L.; Barillot, C.; Haynor, D. R.
Image matching has emerged as an important area of investigation in medical image analysis. In particular, much attention has been focused on the atlas problem, in which a template representing the structural anatomy of the human brain is deformed to match anatomic brain images from a given individual. The problem is made difficult because there are important differences in both the gross and local morphology of the brain among normal individuals. We have formulated the image matching problem under a Bayesian framework. The Bayesian methodology facilitates a principled approach to the development of a matching model. Of special interest is its capacity to deal with uncertainty in the estimates, a potentially important but generally ignored aspect of the solution. In the construction of a reference system for the human brain, the Bayesian approach is well suited to the task of modeling variation in morphology. Statistical information about morphological variability, accumulated over past samples, can be formally introduced into the problem formulation to guide the matching or normalization of future data sets.

What Does a Grammar Formalism Say About a Language? (1996-05-14)
Rogers, James
Over the last ten or fifteen years there has been a shift in generative linguistics away from formalisms based on a procedural interpretation of grammars towards constraint-based formalisms: formalisms that define languages by specifying a set of constraints that characterize the set of well-formed structures analyzing the strings in the language. A natural extension of this trend is to define this set of structures model-theoretically, that is, to define it as the set of mathematical structures that satisfy some set of logical axioms. This approach raises a number of questions about the nature of linguistic theories and the role of grammar formalisms in expressing them. We argue here that the crux of what theories of syntax have to say about language lies in the abstract properties of the sets of structures they license. This is the level that is most directly connected to the empirical basis of these theories and it is the level at which it is possible to make meaningful comparisons between the approaches. From this point of view, grammar formalisms (or formal frameworks) are primarily means of presenting these properties. Many of the apparent distinctions between formalisms, then, may well be artifacts of their presentation rather than substantive distinctions between the properties of the structures they license. The model-theoretic approach offers a way in which to abstract away from the idiosyncrasies of these presentations. Having said that, we must distinguish between the class of sets of structures licensed by a linguistic theory and the set of structures licensed by a specific instance of the theory, that is, by a grammar expressing that theory. Theories of syntax are not simply accounts of the structure of individual languages in isolation, but rather include assertions about the organization of the structure of human languages in general.
These universal aspects of the theories present two challenges for the model-theoretic approach. First, they frequently are not properties of individual structures, but are rather properties of sets of structures. Thus, in capturing these model-theoretically one is not defining sets of structures but is rather defining classes of sets of structures; these are not first-order properties. Secondly, the universal aspects of linguistic theories are frequently not explicit, but are consequences of the nature of the formalism that embodies the theory. In capturing these one must develop an explicit axiomatic treatment of the formalism. This is both a challenge and a powerful benefit of the approach. Such re-interpretations tend to raise a variety of issues that are often overlooked in the original formalization. In this report we examine these issues within the context of a model-theoretic reinterpretation of Generalized Phrase-Structure Grammar. While there is little current active research on GPSG, it provides an ideal laboratory for exploring these issues. First, the formalism of GPSG is expressly intended to embody a great deal of the accompanying linguistic theory. Thus it provides a variety of opportunities for examining principles expressed as restrictions on the formalism from a model-theoretic point of view. At the same time, the fact that these restrictions embody universal grammar principles provides us with a variety of opportunities to explore the way in which the linguistic theory expressed by a grammar can transcend the mathematical theory of the structures it licenses. Finally, GPSG, although defined declaratively, is a formalism with restricted generative capacity, a characteristic more typical of the earlier procedural formalisms. As such, one component of the theory it embodies is a claim about the language-theoretic complexity of natural languages. Such claims are difficult to establish for any of the constraint-based approaches to grammar.
We can show, however, that the class of sets of trees that are definable within the logical language we employ in reformalizing GPSG is nearly exactly the class of sets of trees definable within the basic GPSG formalism. Thus we are able to capture the language-theoretic consequences of GPSG's restricted formalism by employing a restricted logical language.

Bayesian Approach to the Brain Image Matching Problem (1995-04-01)
Gee, James C; LeBriquer, L.; Barillot, C.; Haynor, D. R.; Bajcsy, Ruzena
The application of image matching to the problem of localizing structural anatomy in images of the human brain forms the specific aim of our work. The interpretation of such images is a difficult task for human observers because of the many ways in which the identity of a given structure can be obscured. Our approach is based on the assumption that a common topology underlies the anatomy of normal individuals. To the degree that this assumption holds, the localization problem can be solved by determining the mapping from the anatomy of a given individual to some referential atlas of cerebral anatomy. Previous such approaches have in many cases relied on a physical interpretation of this mapping. In this paper, we examine a more general Bayesian formulation of the image matching problem and demonstrate the approach on two-dimensional magnetic resonance images.

Bracketing Guidelines for Penn Korean TreeBank (2001-05-01)
Han, Na-Rae; Ko, Eon-Suk
This document describes the syntactic bracketing guidelines for the Penn Korean Treebank, which is an online corpus of Korean texts annotated with morphological and syntactic information. The corpus consists of around 54,000 words and 5,000 sentences. The Treebank uses a phrase structure style of annotation, making head/phrasal node distinctions, argument/adjunct distinctions, and identifying empty arguments and traces for moved constituents. This document is organized as follows.
In section 2, the basic syntactic ingredients of a clause structure are presented. Some notational conventions are introduced in section 3, including different types of syntactic tags, such as head level tags, phrase level tags and function tags used in the Treebank. In section 4, the bracketing guidelines for various types of clauses are discussed, including simple clauses, subordinate clauses, and clauses with coordination. Several types of subcategorization frames found in the Treebank are then presented in section 5, followed by bracketing guidelines for various linguistic phenomena in sections 6 to 21, including guidelines for annotating punctuation. The document ends with guidelines for handling some bracketing ambiguities and for handling some confusing examples.

Hierarchical regression modeling for language research (2009-11-01)
Gorman, Kyle
I demonstrate the application of hierarchical regression modeling, a state-of-the-art technique for statistical inference, to language research. First, a stable sociolinguistic variable in Philadelphia (Labov, 2001) is reconsidered, with attention paid to the treatment of collinearities among socioeconomic predictors. I then demonstrate the use of hierarchical models to account for the random sampling of subjects and items in an experimental setting, using data from a study of word-learning in the face of tonal variation (Quam and Swingley, forthcoming). The results from these case studies demonstrate that modeling sampling from the population has empirical consequences.