Pierce, Benjamin C

Email Address
ORCID
Disciplines
Research Projects
Organizational Units
Position
Introduction
Research Interests

Search Results

Now showing 1 - 10 of 40
  • Publication
    Quotient Lenses
    (2009-02-10) Foster, J. Nathan; Pilkiewicz, Alexandre; Pierce, Benjamin C
    There are now a number of bidirectional programming languages, where every program can be read both as a forward transformation mapping one data structure to another and as a reverse transformation mapping an edited output back to a correspondingly edited input. Besides parsimony—the two related transformations are described by just one expression— such languages are attractive because they promise strong behavioral laws about how the two transformations fit together—e.g., their composition is the identity function. It has repeatedly been observed, however, that such laws are actually a bit too strong: in practice, we do not want them “on the nose,” but only up to some equivalence, allowing inessential details, such as whitespace, to be modified after a round trip. Some bidirectional languages loosen their laws in this way, but only for specific, baked-in equivalences. In this work, we propose a general theory of quotient lenses—bidirectional transformations that are well behaved modulo equivalence relations controlled by the programmer. Semantically, quotient lenses are a natural refinement of lenses, which we have studied in previous work. At the level of syntax, we present a rich set of constructs for programming with canonizers and for quotienting lenses by canonizers. We track equivalences explicitly, with the type of every quotient lens specifying the equivalences it respects. We have implemented quotient lenses as a refinement of the bidirectional string processing language Boomerang. We present a number of useful primitive canonizers for strings, and give a simple extension of Boomerang’s regular-expression-based type system to statically typecheck quotient lenses. The resulting language is an expressive tool for transforming real-world, ad-hoc data formats. We demonstrate the power of our notation by developing an extended example based on the UniProt genome database format and illustrate the generality of our approach by showing how uses of quotienting in other bidirectional languages can be translated into our notation.
  • Publication
    How Good Is Local Type Inference?
    (1999-06-22) Hosoya, Haruo; Pierce, Benjamin C
    A partial type inference technique should come with a simple and precise specification, so that users predict its behavior and understand the error messages it produces. Local type inference techniques attain this simplicity by inferring missing type information only from the types of adjacent syntax nodes, without using global mechanisms such as unification variables. The paper reports on our experience with programming in a full-featured programming language including higher-order polymorphism, subtyping, parametric datatypes, and local type inference. On the positive side, our experiments on several nontrivial examples confirm previous hopes for the practicality of the type inference method. On the negative side, some proposed extensions mitigating known expressiveness problems turn out to be unsatisfactory on close examination.
  • Publication
    Union Types for Semistructured Data
    (1999-04-06) Buneman, Peter; Pierce, Benjamin C
    Semistructured databases are treated as dynamically typed: they come equipped with no independent schema or type system to constrain the data. Query languages that are designed for semistructured data, even when used with structured data, typically ignore any type information that may be present. The consequences of this are what one would expect from using a dynamic type system with complex data: fewer guarantees on the correctness of applications. For example, a query that would cause a type error in a statically typed query language will return the empty set when applied to a semistructured representation of the same data. Much semistructured data originates in structured data. A semistructured representation is useful when one wants to add data that does not conform to the original type or when one wants to combine sources of different types. However, the deviations from the prescribed types are often minor, and we believe that a better strategy than throwing away all type information is to preserve as much of it as possible. We describe a system of untagged union types that can accommodate variations in structure while still allowing a degree of static type checking. A novelty of this system is that it involves non-trivial equivalences among types, arising from a law of distributivity for records and unions: a value may be introduced with one type (e.g., a record containing a union) and used at another type (a union of records). We describe programming and query language constructs for dealing with such types, prove the soundness of the type system, and develop algorithms for subtyping and typechecking.
  • Publication
    Statically Typed Document Transformation: An XTATIC Experience
    (2005-10-14) Gapeyev, Vladimir; Garillot, François; Pierce, Benjamin C
    XTATIC is a lightweight extension of C⋕ with native support for statically typed XML processing. It features XML trees as built-in values, a refined type system based on regular types à la XDUCE, and regular patterns for investigating and manipulating XML. We describe our experiences using XTATIC in a real-world application: a program for transforming XMLSPEC, a format used for authoring W3C technical reports, into HTML. Our implementation closely follows an existing one written in XSLT, facilitating comparison of the two languages and analysis of the costs and benets—both signicant—of rich static typing for XML-intensive code.
  • Publication
    Boomerang: Resourceful Lenses for String Data
    (2007-11-19) Bohannon, Aaron; Foster, J. Nathan; Pierce, Benjamin C; Pilkiewicz, Alexandre; Schmitt, Alan
    A lens is a bidirectional program. When read from left to right, it denotes an ordinary function that maps inputs to outputs. When read from right to left, it denotes an "update translator" that takes an input together with an updated output and produces a new input that reflects the update. Many variants of this idea have been explored in the literature, but none deal fully with ordered data. If, for example, an update changes the order of a list in the output, the items in the output list and the chunks of the input that generated them can be misaligned, leading to lost or corrupted data. We attack this problem in the context of bidirectional transformations over strings, the primordial ordered data type. We first propose a collection of bidirectional string lens combinators, based on familiar operations on regular transducers (union, concatenation, Kleene-star) and with a type system based on regular expdressions. We then design a new semantic space of dictionary lenses, enriching the lenses of Foster et al. (2007b) with support for two additional combinators for marking "reorderable chunks" and their keys. To demonstrate the effectiveness of these primitives, we describe the design and implementation of Boomerang, a full-blown bidirectional programming language with dictionary lenses at its core. We have used Boomerang to build transformers for complex real-world data formats including the SwissProt genomic database. We formalize the essential property of resourcefulness - the correct use of keys to associate chunks in the input and output - by defining a refined semantic space of quasi-oblivious lenses. Several previously studied properties of lenses turn out to have compact characterizations in this space.
  • Publication
    Symmetric Lenses
    (2010-07-28) Hoffmann, Martin; Pierce, Benjamin C; Wagner, Daniel
    Lenses—bidirectional transformations between pairs of connected structures—have been extensively studied and are beginning to find their way into industrial practice. However, some aspects of their foundations remain poorly understood. In particular, most previous work has focused on the special case of asymmetric lenses, where one of the structures is taken as primary and the other is thought of as a projection, or view. A few studies have considered symmetric variants, where each structure contains information not present in the other, but these all lack the basic operation of composition. Moreover, while many domain-specific languages based on lenses have been designed, lenses have not been thoroughly studied from a more fundamental algebraic perspective. We offer two contributions to the theory of lenses. First, we present a new symmetric formulation, based on complements, an old idea from the database literature. This formulation generalizes the familiar structure of asymmetric lenses, and it admits a good notion of composition. Second, we explore the algebraic structure of the space of symmetric lenses. We present generalizations of a number of known constructions on asymmetric lenses and settle some longstanding questions about their properties—in particular, we prove the existence of (symmetric monoidal) tensor products and sums and the non-existence of full categorical products or sums in the category of symmetric lenses. We then show how the methods of universal algebra can be applied to build iterator lenses for structured data such as lists and trees, yielding lenses for operations like mapping, filtering, and concatenation from first principles. Finally, we investigate an even more general technique for constructing mapping combinators, based on the theory of containers.
  • Publication
    Contracts Made Manifest
    (2010-01-17) Pierce, Benjamin C; Greenberg, Michael; Weirich, Stephanie
    Since Findler and Felleisen introduced higher-order contracts, many variants have been proposed. Broadly, these fall into two groups: some follow Findler and Felleisen in using latent contracts, purely dynamic checks that are transparent to the type system; others use manifest contracts, where refinement types record the most recent check that has been applied to each value. These two approaches are commonly assumed to be equivalent-different ways of implementing the same idea, one retaining a simple type system, and the other providing more static information. Our goal is to formalize and clarify this folklore understanding. Our work extends that of Gronski and Flanagan, who defined a latent calculus lambdac and a manifest calculus lambdah, gave a translation phi from lambdac to lambdah, and proved that, if a lambdac term reduces to a constant, then so does its phiimage. We enrich their account with a translation psi from lambdah to lambdac and prove an analogous theorem. We then generalize the whole framework to dependent contracts, whose predicates can mention free variables. This extension is both pragmatically crucial, supporting a much more interesting range of contracts, and theoretically challenging. We define dependent versions of lambdah and two dialects (“lax” and “picky”) of lambdac, establish type soundness-a substantial result in itself, for lambdah-and extend phi and psi accordingly. Surprisingly, the intuition that the latent and manifest systems are equivalent now breaks down: the extended translations preserve behavior in one direction but, in the other, sometimes yield terms that blame more.
  • Publication
    Schema-Directed Data Synchronization
    (2005-03-23) Greenwald, Michael B; Foster, J. Nathan; Pierce, Benjamin C; Kirkegaard, Christian; Schmitt, Alan
    Increased reliance on optimistic data replication has led to burgeoning interest in tools and frameworks for synchronizing disconnected updates to replicated data. We have implemented a generic, synchronization framework, called Harmony, that can be instantiated to yield state-based synchronizers for a wide variety of tree-structured data formats. A novel feature of this framework is that the synchronization process—in particular, the recognition of situations where changes are in conflict—is driven by the schema of the structures being synchronized. We formalize Harmony’s synchronization algorithm, prove that it obeys a simple and intuitive specification, and illustrate how it can be used to synchronize a variety of specific forms of application data—sets, records, tuples, and relations.
  • Publication
    On Decidability of Nominal Subtyping with Variance
    (2006-09-01) Kennedy, Andrew J; Pierce, Benjamin C
    We investigate the algorithmics of subtyping in the presence of nominal inheritance and variance for generic types, as found in Java 5, Scala 2.0, and the .NET 2.0 Intermediate Language. We prove that the general problem is undecidable and characterize three different decidable fragments. From the latter, we conjecture that undecidability critically depends on the combination of three features that are not found together in any of these languages: contravariant type constructors, class hierarchies in which the set of types reachable from a given type by inheritance and decomposition is not always finite, and class hierarchies in which a type may have multiple supertypes with the same head constructor. These results settle one case of practical interest: subtyping between ground types in the .NET intermediate language is decidable; we conjecture that our proof can also be extended to show full decidability of subtyping in .NET. For Java and Scala, the decidability questions remain open; however, the proofs of our preliminary results introduce a number of novel techniques that we hope may be useful in further attacks on these questions.
  • Publication
    Contracts Made Manifest
    (2012-05-01) Pierce, Benjamin C; Greenberg, Michael; Weirich, Stephanie
    Since Findler and Felleisen (Findler, R. B. & Felleisen, M. 2002) introduced higher-order contracts, many variants have been proposed. Broadly, these fall into two groups: some follow Findler and Felleisen (2002) in using latent contracts, purely dynamic checks that are transparent to the type system; others use manifest contracts, where refinement types record the most recent check that has been applied to each value. These two approaches are commonly assumed to be equivalent—different ways of implementing the same idea, one retaining a simple type system, and the other providing more static information. Our goal is to formalize and clarify this folklore understanding. Our work extends that of Gronski and Flanagan (Gronski, J. & Flanagan, C. 2007), who defined a latent calculus λC and a manifest calculus λH, gave a translation φ from λC to λH, and proved that if a λC term reduces to a constant, so does its φ-image. We enrich their account with a translation ψ from λH to λC and prove an analogous theorem. We then generalize the whole framework to dependent contracts, whose predicates can mention free variables. This extension is both pragmatically crucial, supporting a much more interesting range of contracts, and theoretically challenging. We define dependent versions of λH and two dialects (“lax” and “picky”) of λC, establish type soundness—a substantial result in itself, for λH — and extend φ and ψ accordingly. Surprisingly, the intuition that the latent and manifest systems are equivalent now breaks down: the extended translations preserve behavior in one direction, but in the other, sometimes yield terms that blame more.