Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Computer and Information Science

First Advisor

Boon T. Loo

Second Advisor

Linh T. Phan


Cloud applications occupy an increasingly important place in our technological and economic landscape. Consequently, they are subject to stringent performance requirements. High tail latency (high percentiles at the tail of the response time distribution) threatens these requirements: as few as 0.01% of slow requests in a single microservice can significantly degrade performance for the entire application. The conventional wisdom is that application awareness is crucial to designing optimized performance management systems, but that it comes at the cost of maneuverability. As a result, existing execution environments are often general-purpose and ignore important application features, such as the architecture of request processing pipelines or the types of requests being served. These one-size-fits-all solutions lack the information needed to identify and remove sources of high tail latency. This thesis aims to develop a lightweight execution environment that exploits application semantics to optimize tail performance for cloud services. This system, dubbed Service Boosters, is a library operating system that exposes application structure and semantics to the underlying resource management stack. With Service Boosters, programmers use a generic programming model to build, declare, and annotate their request processing pipelines, while performance engineers can program advanced management strategies. Building on Service Boosters, I present three systems, FineLame, Perséphone, and DeDoS, that exploit application awareness to provide real-time anomaly detection, tail-tolerant RPC scheduling, and resource harvesting, respectively. FineLame leverages awareness of the request processing pipeline to deploy monitoring and anomaly detection probes. Using these probes, FineLame detects abnormal requests in flight, as soon as they depart from the expected behavior, and alerts other resource management modules.
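To make the notion of tail latency and the 0.01% figure above concrete, the following minimal Python sketch computes a nearest-rank percentile over latency samples and estimates how often a request that fans out into many sub-requests encounters at least one slow one. The function names and the fan-out model (independent sub-requests, each slow with the same probability) are illustrative assumptions, not part of Service Boosters. Under that model, with a 0.01% slow rate and a hypothetical fan-out of 10,000 sub-requests, roughly 63% of top-level requests would hit at least one slow sub-request.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need non-empty samples and 0 < p <= 100")
    ordered = sorted(samples)
    # Rank is 1-based in the nearest-rank definition.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


def prob_hits_slow(p_slow, fanout):
    """Probability that at least one of `fanout` independent sub-requests is slow.

    Assumes (hypothetically) that each sub-request is slow independently
    with probability `p_slow`.
    """
    return 1 - (1 - p_slow) ** fanout


# Illustrative latencies (arbitrary units): one slow request out of 10,000.
latencies = [1] * 9999 + [1000]
median = percentile(latencies, 50)   # the median hides the slow request
worst = percentile(latencies, 100)   # the extreme tail exposes it
amplified = prob_hits_slow(0.0001, 10_000)
```

The median stays at 1 while the maximum is 1000, which is why tail percentiles rather than averages are the metric of interest when a small fraction of slow requests can dominate end-to-end response time.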
Perséphone exploits an understanding of request types to dynamically allocate resources to each type and prevent pathological head-of-line blocking under heavy-tailed workloads, without the need for interrupts. Perséphone is a low-overhead solution well suited to microsecond-scale workloads. Finally, DeDoS identifies overloaded components and dynamically scales them, harvesting only the resources needed to quench the overload. Service Boosters is a powerful framework for handling tail latency in the datacenter. It clearly separates the roles of application development and performance engineering, offering a general-purpose application programming model while enabling the development of specialized resource management modules such as Perséphone and DeDoS.
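The idea of allocating resources per request type to avoid head-of-line blocking without interrupts can be illustrated with a small, non-preemptive dispatcher. This is a hypothetical sketch, not Perséphone's actual scheduling policy: the class `TypeAwareDispatcher`, its methods, and the strict, static partitioning of workers among types are assumptions made for illustration. Each type has its own FIFO queue and a set of reserved workers, so a backlog of long requests can never occupy the workers reserved for short ones.

```python
from collections import deque


class TypeAwareDispatcher:
    """Hypothetical non-preemptive dispatcher with per-type worker reservations."""

    def __init__(self, reservations):
        # reservations: request type -> list of worker ids reserved for that type.
        # Assumes the worker sets are disjoint (strict partitioning).
        self.reserved = {t: list(ws) for t, ws in reservations.items()}
        self.queues = {t: deque() for t in reservations}
        self.idle = {w for ws in reservations.values() for w in ws}

    def submit(self, req_type, request):
        """Enqueue a request in the FIFO queue of its type."""
        self.queues[req_type].append(request)

    def dispatch(self):
        """Assign queued requests to idle workers reserved for their type.

        Returns a list of (worker, type, request) tuples. Requests run to
        completion (no preemption), so no interrupts are needed.
        """
        assignments = []
        for req_type, queue in self.queues.items():
            for worker in self.reserved[req_type]:
                if not queue:
                    break
                if worker in self.idle:
                    self.idle.remove(worker)
                    assignments.append((worker, req_type, queue.popleft()))
        return assignments

    def complete(self, worker):
        """Mark a worker as idle again once its request finishes."""
        self.idle.add(worker)


# Usage: one worker reserved for short requests, two for long ones.
d = TypeAwareDispatcher({"short": [0], "long": [1, 2]})
for i in range(3):
    d.submit("long", f"l{i}")
d.submit("short", "s0")
first_round = d.dispatch()  # s0 on worker 0, l0 and l1 on workers 1 and 2
```

Because the third long request stays queued rather than taking worker 0, a short request arriving after worker 0 frees up is served immediately instead of waiting behind it. A static split like this can waste capacity when the mix of request types shifts, which is why a real system would adjust the reservations dynamically, as the text above describes.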