JUST-IN-TIME SCALE-OUT OF SHELL PROGRAMS, CORRECTLY

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Copyright date
01/01/2024
Author
Kallas, Konstantinos
Abstract

Shell programs are critical infrastructure for developers, administrators, and scientists. They are used for all kinds of complex tasks, often as "glue" for succinctly composing existing computational components. Unfortunately, they do not enjoy access to the automated performance optimizations typically found in other language environments, such as parallelization for scaling out on multicore CPUs and distribution for processing data that does not fit on a single machine. This gap stems from three fundamental challenges inherent to the shell: (1) shell programs compose arbitrary black-box software components (commands) that are developed in multiple programming languages and cannot be analyzed in a unified way; (2) the language of the shell offers primitives that are highly dynamic, making static analysis intractable; and (3) the shell specification is complex and implementations vary significantly, making it extremely hard for optimizations to remain compliant with existing shells and thereby jeopardizing backward compatibility. In this dissertation, I propose a novel compilation architecture that addresses these three challenges using: (1) a command specification framework to capture command behavior; (2) a just-in-time architecture that applies optimizations at runtime, after dynamic information has been resolved; and (3) a shell-to-shell compiler shim whose generated optimized shell programs can be executed by the original shell interpreter. I develop a concrete instantiation of this architecture in a high-performance open-source system called PaSh. I demonstrate the benefits of this compilation architecture on real-world programs using three concrete optimizations: automatically parallelizing, distributing, and reordering the execution of shell programs. These optimizations achieve significant speedups without jeopardizing compliance with the underlying shell.

Advisor
Alur, Rajeev
Date of degree
2024