Pierce, Benjamin C

Search Results

  • Publication
    Quotient Lenses
    (2009-02-10) Foster, J. Nathan; Pilkiewicz, Alexandre; Pierce, Benjamin C
    There are now a number of bidirectional programming languages, where every program can be read both as a forward transformation mapping one data structure to another and as a reverse transformation mapping an edited output back to a correspondingly edited input. Besides parsimony—the two related transformations are described by just one expression—such languages are attractive because they promise strong behavioral laws about how the two transformations fit together—e.g., their composition is the identity function. It has repeatedly been observed, however, that such laws are actually a bit too strong: in practice, we do not want them “on the nose,” but only up to some equivalence, allowing inessential details, such as whitespace, to be modified after a round trip. Some bidirectional languages loosen their laws in this way, but only for specific, baked-in equivalences. In this work, we propose a general theory of quotient lenses—bidirectional transformations that are well behaved modulo equivalence relations controlled by the programmer. Semantically, quotient lenses are a natural refinement of lenses, which we have studied in previous work. At the level of syntax, we present a rich set of constructs for programming with canonizers and for quotienting lenses by canonizers. We track equivalences explicitly, with the type of every quotient lens specifying the equivalences it respects. We have implemented quotient lenses as a refinement of the bidirectional string processing language Boomerang. We present a number of useful primitive canonizers for strings, and give a simple extension of Boomerang’s regular-expression-based type system to statically typecheck quotient lenses. The resulting language is an expressive tool for transforming real-world, ad-hoc data formats.
    We demonstrate the power of our notation by developing an extended example based on the UniProt genome database format and illustrate the generality of our approach by showing how uses of quotienting in other bidirectional languages can be translated into our notation.
  • Publication
    How Good Is Local Type Inference?
    (1999-06-22) Hosoya, Haruo; Pierce, Benjamin C
    A partial type inference technique should come with a simple and precise specification, so that users can predict its behavior and understand the error messages it produces. Local type inference techniques attain this simplicity by inferring missing type information only from the types of adjacent syntax nodes, without using global mechanisms such as unification variables. The paper reports on our experience with programming in a full-featured programming language including higher-order polymorphism, subtyping, parametric datatypes, and local type inference. On the positive side, our experiments on several nontrivial examples confirm previous hopes for the practicality of the type inference method. On the negative side, some proposed extensions mitigating known expressiveness problems turn out to be unsatisfactory on close examination.
  • Publication
    Union Types for Semistructured Data
    (1999-04-06) Buneman, Peter; Pierce, Benjamin C
    Semistructured databases are treated as dynamically typed: they come equipped with no independent schema or type system to constrain the data. Query languages that are designed for semistructured data, even when used with structured data, typically ignore any type information that may be present. The consequences of this are what one would expect from using a dynamic type system with complex data: fewer guarantees on the correctness of applications. For example, a query that would cause a type error in a statically typed query language will return the empty set when applied to a semistructured representation of the same data. Much semistructured data originates in structured data. A semistructured representation is useful when one wants to add data that does not conform to the original type or when one wants to combine sources of different types. However, the deviations from the prescribed types are often minor, and we believe that a better strategy than throwing away all type information is to preserve as much of it as possible. We describe a system of untagged union types that can accommodate variations in structure while still allowing a degree of static type checking. A novelty of this system is that it involves non-trivial equivalences among types, arising from a law of distributivity for records and unions: a value may be introduced with one type (e.g., a record containing a union) and used at another type (a union of records). We describe programming and query language constructs for dealing with such types, prove the soundness of the type system, and develop algorithms for subtyping and typechecking.
  • Publication
    Statically Typed Document Transformation: An XTATIC Experience
    (2005-10-14) Gapeyev, Vladimir; Garillot, François; Pierce, Benjamin C
    XTATIC is a lightweight extension of C# with native support for statically typed XML processing. It features XML trees as built-in values, a refined type system based on regular types à la XDUCE, and regular patterns for investigating and manipulating XML. We describe our experiences using XTATIC in a real-world application: a program for transforming XMLSPEC, a format used for authoring W3C technical reports, into HTML. Our implementation closely follows an existing one written in XSLT, facilitating comparison of the two languages and analysis of the costs and benefits—both significant—of rich static typing for XML-intensive code.
  • Publication
    Boomerang: Resourceful Lenses for String Data
    (2007-11-19) Bohannon, Aaron; Foster, J. Nathan; Pierce, Benjamin C; Pilkiewicz, Alexandre; Schmitt, Alan
    A lens is a bidirectional program. When read from left to right, it denotes an ordinary function that maps inputs to outputs. When read from right to left, it denotes an "update translator" that takes an input together with an updated output and produces a new input that reflects the update. Many variants of this idea have been explored in the literature, but none deal fully with ordered data. If, for example, an update changes the order of a list in the output, the items in the output list and the chunks of the input that generated them can be misaligned, leading to lost or corrupted data. We attack this problem in the context of bidirectional transformations over strings, the primordial ordered data type. We first propose a collection of bidirectional string lens combinators, based on familiar operations on regular transducers (union, concatenation, Kleene-star) and with a type system based on regular expressions. We then design a new semantic space of dictionary lenses, enriching the lenses of Foster et al. (2007b) with support for two additional combinators for marking "reorderable chunks" and their keys. To demonstrate the effectiveness of these primitives, we describe the design and implementation of Boomerang, a full-blown bidirectional programming language with dictionary lenses at its core. We have used Boomerang to build transformers for complex real-world data formats including the SwissProt genomic database. We formalize the essential property of resourcefulness - the correct use of keys to associate chunks in the input and output - by defining a refined semantic space of quasi-oblivious lenses. Several previously studied properties of lenses turn out to have compact characterizations in this space.
  • Publication
    A Bisimulation for Type Abstraction and Recursion
    (2005-01-12) Sumii, Eijiro; Pierce, Benjamin C
    We present a sound, complete, and elementary proof method, based on bisimulation, for contextual equivalence in a λ-calculus with full universal, existential, and recursive types. Unlike logical relations (either semantic or syntactic), our development is elementary, using only sets and relations and avoiding advanced machinery such as domain theory, admissibility, and ⊤⊤-closure. Unlike other bisimulations, ours is complete even for existential types. The key idea is to consider sets of relations—instead of just relations—as bisimulations.
  • Publication
    TinkerType: A Language for Playing With Formal Systems
    (2000-10-23) Levin, Michael Y; Pierce, Benjamin C
    TinkerType is a pragmatic framework for compact and modular description of formal systems (type systems, operational semantics, logics, etc.). A family of related systems is broken down into a set of clauses — individual inference rules — and a set of features controlling the inclusion of clauses in particular systems. Simple static checks are used to help maintain consistency of the generated systems. We present TinkerType and its implementation, and describe its application to two substantial repositories of typed λ-calculi. The first repository covers a broad range of typing features, including subtyping, polymorphism, type operators and kinding, computational effects, and dependent types. It describes both declarative and algorithmic aspects of the systems, and can be used with our tool, the TinkerType Assembler, to generate calculi either in the form of typeset collections of inference rules or as executable ML typecheckers. The second repository addresses a smaller collection of systems, and provides modularized proofs of basic safety properties.
  • Publication
    Matching Lenses: Alignment and View Update
    (2010-01-08) Barbosa, Davi M.J.; Cretin, Julien; Foster, Nate; Greenberg, Michael; Pierce, Benjamin C
    Bidirectional programming languages have been proposed as a practical approach to the view update problem. Programs in these languages, often called lenses, can be read in two ways—from left to right as functions mapping sources to views, and from right to left as functions mapping updated views back to updated sources. Lenses address the view update problem by making it possible to define a view and its associated update policy together. One issue that has not received sufficient attention in the design of bidirectional languages is alignment. In general, to correctly propagate an update to a view, a lens needs to match up the pieces of the edited view with corresponding pieces of the underlying source. Unfortunately, existing bidirectional languages are extremely limited in their treatment of alignment—they only support simple strategies that do not suffice for many examples of practical interest. In this paper, we propose a novel framework of matching lenses that extends basic lenses with new mechanisms for calculating and using alignments. We enrich the types of lenses with “chunks” that identify the locations of data that should be re-aligned after updates, and we formulate refined behavioral laws that capture essential constraints on the handling of chunks. To demonstrate the utility of our approach, we develop a core language of matching lenses for string data, and we extend it with primitives for describing a number of useful alignment heuristics.
  • Publication
    Behavioral Equivalence in the Polymorphic Pi-Calculus
    (1999-04-01) Pierce, Benjamin C; Sangiorgi, Davide
    We investigate parametric polymorphism in message-based concurrent programming, focusing on behavioral equivalences in a typed process calculus analogous to the polymorphic λ-calculus of Girard and Reynolds. Polymorphism constrains the power of observers by preventing them from directly manipulating data values whose types are abstract, leading to notions of equivalence much coarser than the standard untyped ones. We study the nature of these constraints through simple examples of concurrent abstract data types and develop basic theoretical machinery for establishing bisimilarity of polymorphic processes. We also observe some surprising interactions between polymorphism and aliasing, drawing examples from both the polymorphic π-calculus and ML.
  • Publication
    Bringing Harmony to Optimism: An Experiment in Synchronizing Heterogeneous Tree-Structured Data
    (2004-03-18) Pierce, Benjamin C; Schmitt, Alan; Greenwald, Michael B
    Increased reliance on optimistic data replication has led to burgeoning interest in tools and frameworks for synchronizing disconnected updates to replicated data. To better understand the issues underlying the design of generic and heterogeneous synchronizers, we have implemented an experimental framework, called Harmony, that can be used to build synchronizers for tree-structured data stored in a variety of concrete formats. We present Harmony’s architecture, formalize its key components (a simple core synchronization algorithm together with a set of user-defined mappings between diverse concrete data formats and common abstract schemas suitable for synchronization), and discuss how the framework can be used to synchronize a variety of specific types of application data by suitable encodings into trees—including sets, records, tuples, relations, and, with some limitations, lists and ordered XML data.
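
Several of the abstracts above describe a lens as a pair of transformations: a forward function from a source to a view, and a reverse function that pushes an edited view back into the source, subject to round-trip laws that may hold only up to an equivalence such as whitespace. The following is a minimal Python sketch of that idea, not the Boomerang implementation; the names `Lens`, `normalize_ws`, `ws_lens`, `name_lens`, and the two law-checking helpers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # source type
V = TypeVar("V")  # view type


@dataclass(frozen=True)
class Lens(Generic[S, V]):
    """A pair of functions: get maps a source to a view,
    put maps an edited view plus the old source to a new source."""
    get: Callable[[S], V]
    put: Callable[[V, S], S]


def get_put_holds(lens, src, equiv=lambda a, b: a == b):
    """GetPut: putting back an unmodified view leaves the source unchanged,
    where 'unchanged' is judged by the supplied equivalence."""
    return equiv(lens.put(lens.get(src), src), src)


def put_get_holds(lens, view, src, equiv=lambda a, b: a == b):
    """PutGet: after putting a view, getting it back returns that view,
    again judged by the supplied equivalence."""
    return equiv(lens.get(lens.put(view, src)), view)


# Illustrative lens 1: the source is a (name, email) pair, the view is the name.
name_lens = Lens(
    get=lambda src: src[0],
    put=lambda view, src: (view, src[1]),
)


def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace to a single space and trim the ends."""
    return " ".join(text.split())


# Illustrative lens 2: the view is the whitespace-normalized source,
# and put simply adopts the edited view as the new source.
ws_lens = Lens(get=normalize_ws, put=lambda view, src: view)


def same_up_to_ws(a: str, b: str) -> bool:
    """Equivalence relation that ignores differences in whitespace."""
    return normalize_ws(a) == normalize_ws(b)
```

With these definitions, `name_lens` satisfies both laws under plain equality, while `ws_lens` loses the original spacing on a round trip and so satisfies GetPut only when sources are compared with `same_up_to_ws`. That is the kind of loosening the quotient-lens abstract refers to when it asks for laws that hold modulo an equivalence.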