Formal and Computational Aspects of Natural Language Syntax
This thesis explores issues related to using a restricted mathematical formalism as the formal basis for the representation of syntactic competence and the modeling of performance. The specific contribution of this thesis is to examine a language with considerably freer word order than English, namely German, and to investigate the formal requirements that this syntactic freedom imposes. Free word order (or free constituent order) languages can be seen as a test case for linguistic theories, since presumably the stricter word order can be subsumed by an apparatus that accounts for freer word order. The formal systems investigated in this thesis are based on the tree adjoining grammar (TAG) formalism of Joshi et al. (1975). TAG is an appealing formalism for the representation of natural language syntax because its elementary structures are phrase structure trees, which allows the linguist to localize linguistic dependencies such as agreement, subcategorization, and filler-gap relations, and to develop a theory of grammar based on the lexicon. The main results of the thesis are an argument that simple TAGs are formally inadequate, and the definition of an extension to TAG that is adequate. Every aspect of the definition of this extension to TAG, called V-TAG, is specifically motivated by linguistic facts, not by formal considerations. A formal investigation of V-TAG reveals that (when lexicalized) it has restricted generative capacity, that it is polynomially parsable, and that it forms an abstract family of languages. This means that it has desirable formal properties for representing natural language syntax. Both a formal automaton and a parser for V-TAG are presented. As a consequence of the new system, a reformulation of the linguistic theory that has been proposed for TAG suggests itself.
Instead of including a transformational step in the theory of grammar, all derivations are performed within mathematically defined formalisms, thus limiting the degrees of freedom in the linguistic theory, and making the theory more appealing from a computational point of view. This has several interesting linguistic consequences; for instance, functional categories are expressed by feature content (not node labels), and head movement is replaced by the adjunction of heads. The thesis sketches a fragment of a grammar of German, which covers phenomena such as scrambling, extraposition, topicalization, and the V2 effect. Finally, the formal automaton for V-TAG is used as a model of human syntactic processing. It is shown that this model makes several interesting predictions related to free word order in German.