IRCS Technical Reports Series
Search results
Learning First Order Quantifier Denotations: An Essay in Semantic Learnability (1996-09-01) Clark, Robin
This paper addresses the problem of how a learner would associate a denotation with a determiner on the basis of the pairing of a simple sentence with some perceptual input. Following work by van Benthem (1985), first order quantification is simulated with (a subclass of) finite state automata. Results from Angluin (1988) are used to demonstrate the learnability of this class.

The Question of Root Infinitives in Early Child Greek (1996-06-01) Varlokosta, Spyridoula; Vainikka, Anne; Rohrbacher, Bernhard
It is well known that children acquiring Germanic and Romance languages go through an early stage at which they produce declarative sentences with a Root Infinitive (cf. (1)) which would be ungrammatical in the adult language (Stern & Stern 1928, Weverink 1989, Pierce 1992, Wexler 1994). For languages outside of these language families, little work has been done on such Root Infinitives. In the present paper we investigate the status of Root Infinitives in Modern Greek, a language which lacks an infinitive form altogether.

Graphical Communicating Shared Resources: A Language for the Specification, Refinement, and Analysis of Real-Time Systems (1996-10-01) Ben-Abdallah, Hanene
The Communicating Shared Resources (CSR) paradigm is an ongoing project at the University of Pennsylvania to build a framework for the development of real-time systems. This project has been motivated by a demand for a rigorous framework in which various design alternatives for a real-time system can be formally specified and rigorously analyzed and tested before implementation. This is an effort to reduce the potentially high cost associated with incorrect operation of real-time systems, which are often embedded in safety-critical applications.
The work presented in this thesis is a first step towards incorporating software engineering practices into the CSR paradigm. This is achieved, on one hand, by developing a formal, graphical CSR formalism, the Graphical Communicating Shared Resources (GCSR); the GCSR language adopts the intuitive concepts of nodes and edges in state diagrams, an informal specification language that is popular within the software engineering community. In addition, defining a refinement theory for GCSR allows the development of real-time systems within this formalism in a top-down and modular fashion, also a popular design methodology within the software engineering community. The GCSR language adopts a syntax that allows a modular and hierarchical, and thus scalable, description of a real-time system. It supports notions of communication through events, interrupt, concurrency, and time to describe the functional and temporal requirements of a real-time system. In addition, GCSR allows the explicit representation of resources and priorities to resolve resource contention, in a way that produces specifications that are easy to understand and modify. The semantics of GCSR is defined operationally, either through a direct translation of a GCSR description to a labeled transition system, or indirectly through a sound translation to the Algebra of Communicating Shared Resources (ACSR) [LBGG94], a timed process algebra that also has an operational semantics. The GCSR-ACSR correspondence lets GCSR benefit from process algebraic analysis techniques such as equivalence checking, state space exploration, testing, and simulation. In addition, the tight correspondence between GCSR and ACSR makes it possible to use the graphical and textual notations interchangeably and to have a sound theory for graphical transformation operations, e.g., to minimize the number of edges and nodes in a GCSR specification without affecting the behavioral description.
To support the top-down and modular development of a real-time specification in GCSR, we have augmented ACSR and thus GCSR with a refinement theory. The refinement theory allows relabeling of events, addition of implementation events, and substitution of a time- and resource-consuming action with a process that may use fewer or more resources than the refined action. Consistency between an abstract specification and a refined specification is defined in terms of an ordering relation over traces that is extended to sets of traces according to the Hoare ordering or Egli-Milner ordering. The trace ordering relation relates traces that share timing properties such as equal duration and preservation of timed occurrences of communication events of the abstract specification. To facilitate the practical use of the refinement theory, we have characterized the extended trace ordering relations by a set of transformation rules that syntactically derive a refined process from an abstract one. The transformation rules define basic graphical operations that represent GCSR refinements. To experiment with the GCSR language and its refinement theory, we have developed a tool set that allows the specification, refinement, and analysis of real-time systems modeled in GCSR. We report our evaluation in the case of the Production Cell case study [LL95].

An Empirical Comparison of Probability Models for Dependency Grammar (1997-05-01) Eisner, Jason M
This technical report is an appendix to Eisner (1996): it gives superior experimental results that were reported only in the talk version of that paper, with details of how the results were obtained. Eisner (1996) trained three probability models on a small set of about 4,000 conjunction-free, dependency grammar parses derived from the Wall Street Journal section of the Penn Treebank, and then evaluated the models on a held-out test set, using a novel O(n^3) parsing algorithm.
The present paper describes some details of the experiments and repeats them with a larger training set of 25,000 sentences. As reported at the talk, the more extensive training yields greatly improved performance, cutting in half the error rate of Eisner (1996). Nearly half the sentences are parsed with no misattachments; two-thirds of sentences are parsed with at most one misattachment. Of the models described in the original paper, the best score is obtained with the generative "model C", which attaches 87-88% of all words to the correct parent. However, better models are also explored, in particular, two simple variants on the comprehension "model B." The better of these has an attachment accuracy of 93% and (unlike model C) tags words more accurately than the comparable trigram tagger. If tags are roughly known in advance, search error is all but eliminated and the new model attains an attachment accuracy of 93%. We find that the parser of Collins (1996), when combined with a highly trained tagger, also achieves 93% when trained and tested on the same sentences. We briefly discuss the similarities and differences between Collins's model and ours, pointing out the strengths of each and noting that these strengths could be combined for either dependency parsing or phrase-structure parsing.

Complexity and the Induction of Tree Adjoining Grammars (1996-05-01) Clark, Robin
In this paper, I will develop the formal foundations of a theory of complexity that underlies a theory of grammatical induction. The initial concern will be the learning theoretic foundations of linguistic locality. That is, I will develop a theory that will place bounds on the amount a learner can draw from an input text. These bounds will limit the amount of variation that could potentially be encoded within a parameter space. A fully developed form of the theory will place a tangible upper limit on what the learner can induce from the input text.
The formal theory developed establishes a relationship between the complexity of descriptions and their likelihood; that is, the more complex a structure is, the less likely it is to occur. I will use this result to develop a theory of linguistic complexity. I will rely on this relationship to show that the results developed in the first part of the paper for the parameter setting model also hold for the inductive theory. The final sections of the paper turn to the formal specification of the learning model and a description of the linguistic theory that supports it. These sections also describe a pair of heuristic constraints on the learner’s search for viable hypotheses. In general, the learner faces a computationally intractable problem in that there are exponentially many grammatical hypotheses for any input text. These constraints, the Adjunction Constraint and the Substitution Constraint, greatly reduce the number of hypotheses that the learner must consider. Furthermore, metrics on the complexity of the learner’s descriptions guarantee that the hypothesis space can be tractably searched for the adult grammar.

Null vs. Overt Subjects in Turkish Discourse: A Centering Analysis (1996-05-01) Turan, Ümit Deniz
The purpose of this study is to explore an aspect of discourse coherence which involves anaphoric relations between utterances, with special emphasis on subjects in Turkish. Based on an analysis of published narratives, three complementary and interrelated questions are addressed concerning discourse anaphora: 1. Which expressions are available for subsequent definite reference? 2. What factors determine the most salient entity in Turkish among a set of potential antecedents for subsequent definite reference? 3. What are the functions of a particular referential expression (null vs. overt pronouns vs. full NPs), depending on appropriate discourse conditions? An exploration regarding question 1 indicates that, while some NPs evoke discourse entities, other NPs do not. These two types of NPs represent referential and nonreferential expressions and they can function as antecedents for definite and indefinite nonspecific anaphora, respectively. The distinction between null and overt pronouns in Turkish is that only the former can be in an anaphoric relationship with a nonreferential antecedent. Overt pronouns, on the other hand, are sensitive to referent identity: they must have the same referent as their antecedents. In other words, overt pronouns are strictly coreferential, while null pronouns are not constrained in this way. The rest of the study investigates answers to questions 2 and 3 in instances where null and definite subjects alternate as definite anaphors. Centering Theory provides a cognitively plausible and computationally tractable framework for such an analysis with its precise rules which rely on linguistic knowledge constraining inferencing. As formulated in Centering Theory, each utterance contains a set of potential antecedents for reference in the subsequent utterance, i.e. a set of forward-looking centers (Cfs), that are ranked on the basis of their salience. The most salient entity in the Cf-list, the preferred center (Cp), is the entity that is predicted to become the backward-looking center in the subsequent utterance. The singleton backward-looking center (Cb) is taken to be the topic of the current utterance, i.e. the entity at the center of attention. Centering transitions, which model the dynamic attentional state in a discourse segment, are obtained by analyzing each adjacent pair of utterances. The functions of referential expressions in subject position are determined on the basis of Centering transitions. The results show that Turkish subject types pattern neatly and categorically when these transitions are taken into account.
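The way transitions are read off adjacent utterance pairs can be illustrated with a minimal sketch. It assumes the four-way taxonomy common in the Centering literature (Continue, Retain, Smooth-Shift, Rough-Shift); the abstract does not say which transition inventory or Cf-ranking criteria the study adopts, and the function names and entity labels below are hypothetical.

```python
from typing import List, Optional


def backward_center(prev_cf: List[str], current_entities: List[str]) -> Optional[str]:
    """Cb of the current utterance: the highest-ranked entity of the
    previous Cf list that is realized again in the current utterance."""
    realized = set(current_entities)
    for entity in prev_cf:  # prev_cf is ordered from most to least salient
        if entity in realized:
            return entity
    return None


def transition(prev_cb: Optional[str], cb: Optional[str], cp: Optional[str]) -> str:
    """Label one utterance pair (assumed standard four-way taxonomy)."""
    if cb is None:
        return "NO-CB"  # nothing carried over from the previous utterance
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"


def classify(cf_lists: List[List[str]]) -> List[str]:
    """cf_lists holds one salience-ranked Cf list per utterance."""
    labels = []
    prev_cb = None
    for prev_cf, cf in zip(cf_lists, cf_lists[1:]):
        cb = backward_center(prev_cf, cf)
        cp = cf[0] if cf else None  # preferred center: top of the Cf list
        labels.append(transition(prev_cb, cb, cp))
        prev_cb = cb
    return labels


# Hypothetical four-utterance segment; entity names are illustrative only.
segment = [["ali", "kitap"], ["ali", "ayse"], ["ayse", "ali"], ["ayse"]]
print(classify(segment))
```

The ranking of each Cf list is taken as given here; how salience is determined for Turkish is precisely what the study investigates.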
A brief discussion of language-specific and universal aspects of discourse anaphora is also included in the study.

A Decidable Predicate Logic of Knowledge (1996-05-01) Japaridze, Giorgi
The language we consider is that of classical first order logic augmented with the unary modal operator □. Sentences of this language are regarded as true or false in a knowledge-base KB, which is any finite set of □-free formulas. Truth of □α in KB is understood to mean that α is true in all classical models of KB, and this interpretation is intended to capture the intuition "we know that α" behind □α. The resulting logic is, in general, undecidable and not even semidecidable. However, there is a natural fragment of the above language, called the constructive language, which yields a decidable logic. The only syntactic constraint in the constructive language is that ∃x should always be followed by □. That is, we are not allowed to simply say "there is x such that ..." and we can only say "there is x for which we know that ...". Under this constraint, truth of ∃xα(x) will always imply that an object x for which α(x) holds not only exists, but can be effectively found. This is generally what we want of ∃ in practical applications: knowing that "there exists a combination c that opens safe S" has no significance unless such a combination c can actually be found, which, in our semantics, will be equivalent to saying that there is c for which we know that c opens S. So, it is only truth of the sentence ∃c□OPENS(c,S) that really matters, and the latter, unlike ∃cOPENS(c,S), is a perfectly legal formula of the constructive language.
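The truth condition and the syntactic constraint described in this abstract can be written out symbolically. This is only a notational sketch: the symbol \models for classical satisfaction and the labels on the second line are assumptions added here, not notation taken from the report.

```latex
% Truth of a boxed formula in a knowledge base KB (a finite set of box-free formulas)
\[
KB \models \Box\alpha
\quad\Longleftrightarrow\quad
M \models \alpha \ \text{ for every classical model } M \text{ of } KB
\]
% The constructive language requires each existential quantifier
% to be followed immediately by the box operator
\[
\underbrace{\exists c\, \Box\, \mathrm{OPENS}(c,S)}_{\text{permitted}}
\qquad\qquad
\underbrace{\exists c\, \mathrm{OPENS}(c,S)}_{\text{not permitted}}
\]
```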
I introduce a decidable sequent system CKN in the constructive language and prove its soundness and completeness with respect to the above semantics.

Acquisition of Variable Rules: (-t,d) Deletion and (ing) Production in Preschool Children (1994) Roberts, Julia Lee
There have been many studies over the past few decades documenting the existence of variable rules in adult language. It is only recently, however, that the acquisition of these rules has been the focus of research, and that development has opened the door for questions about the interaction of the learning of categorical rules and that of variable rules. Specifically, questions have arisen as to whether these rules might not be construed as a performance factor, a reflection of universal constraints on language, or both. The present study examines the acquisition of (-t,d) deletion and (ing) production in 3- and 4-year-old children in order to ascertain their degree of mastery of phonological, grammatical, and social constraints. Seventeen children were tape recorded during play interview sessions in their South Philadelphia day care center. Six to thirteen sessions per child over a three-month period were required to obtain sufficient data for analysis. In addition, eight of their parents were interviewed in their homes for purposes of comparison. Results of the study revealed that children as young as three had, for the most part, mastered the process of variation of (ing) and the phonological constraints on (-t,d) deletion, and they were well into the process of acquiring the grammatical constraints on (-t,d) deletion. Their learning of a dialect-specific phonological constraint demonstrated that their mastery of this variable rule was not a reflection of universal constraints. Further, their independent analysis of semi-weak verbs made it clear that they were not simply copying frequencies of their parents' forms but learning an abstract rule.
The children's acquisition of the extralinguistic constraints on these rules lagged behind that of the linguistic factors. Of particular interest to the issue of gender differences in language was the girls' surprising tendency to delete (-t,d) more often than the boys, demonstrating that they had not yet learned linguistic conservatism in instances of stable variation and arguing against a biological basis for sex-based sociolinguistic differences.

Defining Cognitive Science at IRCS: Proceedings from the April 1995 Postdoc Workshop (1995-12-01) Vainikka, Anne
The nature of language is the enigma linguists try to solve. What makes their task difficult is that in a number of cases, the logical links between the different components of a linguistic unit are difficult to establish. For example, in tone languages, the basic units, the words, are composed of two kinds of minimal, significant sound units: the phonemes on the segmental tier and the tonemes on a distinct tier. The word thus consists of these two types of linguistic elements, related by phonological principles. In Mawukakan, a Manding language spoken in Western Ivory Coast, the word resembles an architectural construction where the function of some structures only becomes evident when you consider the building with its surroundings. This is illustrated below by the tonal assignment of toneless clitics and the lengthening of the vowel of the focus marker le in Mawu.

Perception as Unconscious Inference (2001-04-01) Hatfield, Gary
Since antiquity, visual theorists have variously proposed that perception (usually vision) results from unconscious inference. This paper reviews historical and recent theories of unconscious inferences, in order to make explicit their commitments to inferential cognitive processes. In particular, it asks whether the comparison of perception with inference has been intended metaphorically or literally.
It then focuses on the literal theories, and assesses their resources for responding to three problems that arise when visual perception is explained as resulting from unconscious inference: the cognitive machinery problem, the sophisticated content problem, and the phenomenal problem.