HARVESTING CORE COMPUTE RESOURCES TO THE EXTREME
Subject
database
hardware
high performance system
machine learning system
network
Abstract
The rapid growth of cloud-based applications continues to drive extraordinary demand for high performance and scalability. As data volumes increase and applications become more sophisticated, modern systems integrate increasingly diverse hardware and software components. While network bandwidth improves rapidly, CPU performance has plateaued, and GPUs have emerged as the dominant compute engines for AI workloads. This growing heterogeneity introduces significant system complexity, often resulting in core compute resources (CPUs and GPUs) being partially consumed by infrastructure overhead such as network processing, data movement, and resource orchestration. This thesis demonstrates that in many high-performance systems, these inefficiencies go unnoticed, and core computation cycles are silently lost to non-application tasks. By identifying and addressing these hidden overheads, we show how compute resources can be harvested and returned to user workloads, thereby improving end-to-end application performance. We explore this principle across two major system trends: one in which CPUs remain central but emerging hardware such as SmartNICs offers new opportunities to alleviate CPU workloads, opportunities that this thesis actively exploits; and another in which GPUs drive large-scale AI computation and face increasing pressure from dynamic workloads and GPU memory constraints. Through system-level designs that reduce overhead and strategically reallocate compute to user-level execution, this thesis presents a unified approach to maximizing the effective use of core computation amid growing hardware and software complexity.

This dissertation presents a line of systems work that targets both trends. It first focuses on the CPU-centric trend and introduces Cowbird, a system that fully offloads the network overhead of memory disaggregation from CPUs. This allows applications to benefit from expanded memory capacity without compromising CPU performance.
It then discusses the extension of Cowbird to GPUs and analyzes the reasons behind its limitations. Finally, it shifts focus to the GPU-centric trend and presents SwiftServe, which harvests unused GPU resources during in-place upgrades by overlapping engine initialization with ongoing inference, achieving minimal service disruption and maintaining SLA compliance under dynamic workloads.