HARVESTING CORE COMPUTE RESOURCES TO THE EXTREME

Degree type
PhD
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Subject
cloud computing
database
hardware
high performance system
machine learning system
network
Funder
Grant number
License
Copyright date
01/01/2025
Distributor
Related resources
Author
Chen, Xinyi
Contributor
Abstract

The rapid growth of cloud-based applications continues to drive extraordinary demand for high performance and scalability. As data volumes increase and applications become more sophisticated, modern systems integrate increasingly diverse hardware and software components. While network bandwidth improves rapidly, CPU performance has plateaued, and GPUs have emerged as the dominant compute engines for AI workloads. This growing heterogeneity introduces significant system complexity, often resulting in core compute resources (CPUs and GPUs) being partially consumed by infrastructure overhead such as network processing, data movement, and resource orchestration. This thesis demonstrates that in many high-performance systems, these inefficiencies go unnoticed, and core computation cycles are silently lost to non-application tasks. By identifying and addressing these hidden overheads, we show how compute resources can be harvested and returned to user workloads, thereby improving end-to-end application performance. We explore this principle across two major system trends: one where CPUs remain central but emerging hardware such as SmartNICs offers new opportunities to alleviate CPU workloads, opportunities that this thesis actively exploits; and another where GPUs drive large-scale AI computation and face increasing pressure from dynamic workloads and GPU memory constraints. Through system-level designs that reduce overhead and strategically reallocate compute to user-level execution, this thesis presents a unified approach to maximizing the effective use of core computation amid growing hardware and software complexity.

This dissertation presents a line of systems work that targets both trends. It first focuses on the CPU-centric trend and introduces Cowbird, a system that fully offloads the network overhead of memory disaggregation from CPUs, allowing applications to benefit from expanded memory capacity without compromising CPU performance. It then discusses the extension of Cowbird to GPUs and analyzes the reasons behind its limitations. Lastly, it shifts focus to the GPU-centric trend and presents SwiftServe, which harvests unused GPU resources during in-place upgrades by overlapping engine initialization with ongoing inference, achieving minimal service disruption and maintaining SLA compliance under dynamic workloads.

Advisor
Liu, Vincent
Date of degree
2025
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
Recommended citation