Regular Programming Over Data Streams

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Subject
Data streams
Programming languages
Quantitative analysis
Regular expressions
Computer Sciences
Copyright date
2018-02-23T20:17:00-08:00
Abstract

Data streams arise in a variety of applications, such as feeds from financial markets, event streams from sensors and medical devices, logs produced by long-running programs, click-streams from websites, and packet sequences passing through internet routers. In this thesis, we are concerned with computing quantitative statistics over these streams, and with expressing transformations in the related domain of strings. Many string transformations are instances of simple patterns, such as inserting, deleting, and replacing substrings, or applying a function to each element in the stream. Over data streams, the task is usually to compute some simple quantitative statistic, such as the number of occurrences of a pattern or the mean time between occurrences of an event. There has traditionally been limited programming language support for stream processing, so programmers are forced to write low-level code that manually maintains state and updates it on each new input element. This sacrifices both ease of expression and amenability to static analysis. We propose a simple, expressive programming model for stream transformations, with strong theoretical foundations and fast evaluation algorithms. We present two concrete systems: DReX, to express string-to-string transformations, and quantitative regular expressions (QREs), for numerical queries. Both formalisms start with a set of basic functions and a small collection of hierarchically composable combinators, analogous to the operations of regular expressions. The operators are simple to describe and can be used to combine small, easy-to-understand expressions into more complicated expressions. The functions expressible using DReX and QREs coincide with the class of regular string transformations, which is a robust class with multiple characterizations and appealing closure properties (under composition, input reversal, and regular look-ahead).
We present a single-pass linear-time evaluation algorithm for function expressions, and study efficient approximate representations of numerical terms, so that some numerical QREs can also be evaluated with sub-linear memory requirements.
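
As a rough illustration of the contrast drawn in the abstract, the Python sketch below computes the mean time between occurrences of an event in two ways: first with hand-maintained state, then by composing small single-pass queries. This is not DReX or QRE syntax, and it does not reproduce the thesis's evaluation algorithm. The names `Fold`, `when`, `both`, `mean_gap`, and the `(timestamp, label)` event shape are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional, Tuple

Event = Tuple[float, str]  # (timestamp, label); an assumed event shape


def mean_gap_manual(stream: Iterable[Event], label: str) -> Optional[float]:
    """Hand-written single pass: all state is threaded through the loop by hand."""
    last_seen = None  # timestamp of the previous matching event
    total_gap = 0.0   # sum of gaps between consecutive matches
    gaps = 0          # number of gaps seen so far
    for timestamp, name in stream:
        if name != label:
            continue
        if last_seen is not None:
            total_gap += timestamp - last_seen
            gaps += 1
        last_seen = timestamp
    return total_gap / gaps if gaps else None


@dataclass(frozen=True)
class Fold:
    """A single-pass query: initial state, per-item update, final output."""
    init: Any
    step: Callable[[Any, Any], Any]
    done: Callable[[Any], Any]

    def run(self, stream: Iterable[Any]) -> Any:
        state = self.init
        for item in stream:
            state = self.step(state, item)
        return self.done(state)


def when(pred: Callable[[Any], bool], fold: Fold) -> Fold:
    """Restrict a query to the items that satisfy pred."""
    return Fold(fold.init, lambda s, x: fold.step(s, x) if pred(x) else s, fold.done)


def both(left: Fold, right: Fold, op: Callable[[Any, Any], Any]) -> Fold:
    """Run two queries side by side in one pass and combine their outputs."""
    return Fold(
        (left.init, right.init),
        lambda s, x: (left.step(s[0], x), right.step(s[1], x)),
        lambda s: op(left.done(s[0]), right.done(s[1])),
    )


# Basic queries over events.
count = Fold(0, lambda s, _: s + 1, lambda s: s)
first_time = Fold(None, lambda s, e: e[0] if s is None else s, lambda s: s)
last_time = Fold(None, lambda s, e: e[0], lambda s: s)


def mean_gap(label: str) -> Fold:
    """Mean gap between matches; the gaps telescope to (last - first) / (count - 1)."""
    span = both(last_time, first_time, lambda hi, lo: None if hi is None else hi - lo)
    query = both(span, count, lambda d, n: d / (n - 1) if n > 1 else None)
    return when(lambda e: e[1] == label, query)


events = [(0.0, "a"), (1.0, "b"), (4.0, "a"), (6.0, "a")]
print(mean_gap_manual(events, "a"), mean_gap("a").run(events))
```

Both versions read the stream once and keep constant state. In the composed version, the bookkeeping lives in the reusable combinators rather than in each query.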

Advisor
Rajeev Alur
Date of degree
2017-01-01