Categorical Foundations Of First-Order Abstract Syntax
Subject
formal verification
metatheory
monads
programming languages
syntax
Abstract
The representation of syntax is central to programming languages and formal verification. Yet existing approaches abstract away from the first-order nominal treatment of syntax found in textbooks, where substitution requires systematic renaming of bound variables to avoid variable capture. Although this representation closely mirrors what compilers actually implement, it remains notoriously difficult to formalize in proof assistants. Moreover, translating to a more convenient internal representation does not eliminate the need to reason about nominal variable binding; it merely shifts the burden to verifying the correctness of the translation. A rigorous mathematical treatment of concrete first-order representations of syntax is therefore required. This work introduces decorated traversable monads (DTMs) as a foundation for extrinsically-scoped, extrinsically-typed first-order abstract syntax. DTMs unify several classical and lesser-known categorical structures under a novel set of coherence laws relating them to each other. The result can be characterized from three theoretically equivalent perspectives, each offering distinct practical advantages. The accompanying Rocq library, Tealeaves, formalizes these results and establishes their practical applicability. Tealeaves faithfully reproduces the functionality of other tools used with Rocq, while extending them with new capabilities. Tealeaves naturally supports variadic and mutually-recursive binders, and it provides certified translations between different representations of variables within a unified framework. Tealeaves provides the first datatype-generic formalization of alpha-equivalence that both (i) matches its conventional definition and (ii) comes with an executable proof that alpha-equivalence classes correspond bijectively to well-formed locally nameless terms, which in turn correspond to de Bruijn terms in a well-formed environment. 
This result enables new forms of modular, reusable, and mechanically-verified reasoning about capture-avoiding substitution, helping bridge a critical gap in the formal verification of programming languages and their implementations.
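To make the textbook notions mentioned in the abstract concrete, the following is a minimal sketch of capture-avoiding substitution on a named representation of the untyped lambda calculus, together with an alpha-equivalence check that converts bound variables to de Bruijn indices while leaving free variables named (a locally nameless style). It is an illustration only: it is not taken from Tealeaves and does not use decorated traversable monads, and all names in it (`Var`, `Lam`, `App`, `subst`, `to_locally_nameless`, `alpha_eq`) are assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import Union


# Named (first-order, nominal) syntax for the untyped lambda calculus.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def free_vars(t):
    """Return the set of names occurring free in t."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fn) | free_vars(t.arg)


def fresh(base, avoid):
    """Pick a name derived from base that does not occur in avoid."""
    i = 0
    while f"{base}{i}" in avoid:
        i += 1
    return f"{base}{i}"


def subst(t, x, s):
    """Capture-avoiding substitution of s for free occurrences of x in t."""
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fn, x, s), subst(t.arg, x, s))
    if t.param == x:
        return t  # x is shadowed by this binder; nothing to replace
    if t.param in free_vars(s):
        # The binder would capture a free variable of s: rename it first.
        new = fresh(t.param, free_vars(s) | free_vars(t.body) | {x})
        renamed = subst(t.body, t.param, Var(new))
        return Lam(new, subst(renamed, x, s))
    return Lam(t.param, subst(t.body, x, s))


def to_locally_nameless(t, env=()):
    """Replace bound names by de Bruijn indices; free names stay as names.

    env lists the enclosing binder names, innermost first.
    """
    if isinstance(t, Var):
        if t.name in env:
            return ("bound", env.index(t.name))
        return ("free", t.name)
    if isinstance(t, Lam):
        return ("lam", to_locally_nameless(t.body, (t.param,) + env))
    return ("app", to_locally_nameless(t.fn, env), to_locally_nameless(t.arg, env))


def alpha_eq(a, b):
    """Two named terms are alpha-equivalent iff their nameless forms coincide."""
    return to_locally_nameless(a) == to_locally_nameless(b)
```

For instance, substituting `Var("y")` for `x` in `Lam("y", App(Var("x"), Var("y")))` renames the binder before substituting, so the result is alpha-equivalent to `Lam("z", App(Var("y"), Var("z")))` and not to the captured term `Lam("y", App(Var("y"), Var("y")))`.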
Advisor
Zdancewic, Steve A.