Resource Sharing for Machine Learning Serving
Discipline
Data Science
Subject
Multiplexing
Resource Sharing
Scheduling
System
Abstract
The proliferation of machine learning has transformed numerous applications, leading to unprecedented demands on datacenter computation resources. While researchers have made significant strides in improving machine learning serving efficiency through various approaches, the rapid evolution of machine learning continues to pose new challenges. Traditional design principles, which focus on optimizing individual components such as computation kernels, memory usage, and collective communication, are struggling to keep pace with increasingly integrated, irregular, and massive machine learning models. This dissertation proposes resource sharing as a fundamental design principle to address these emerging challenges. Whereas traditional designs emphasize the performance of individual components, we focus on the interactions among components and identify previously unexplored opportunities for resource sharing. To this end, we introduce novel approaches to computation resource multiplexing and common execution path sharing, and this dissertation presents our research on both. First, we introduce Paella, our software-defined GPU scheduling framework, which enables fine-grained control over scheduling to multiplex computation resources efficiently among different models. Second, we present our novel model serving system, which optimizes the sharing of common execution paths among different inference pipelines through a dynamic model execution system and a data-driven placement optimization algorithm. We further identify two future research directions to advance the paradigm of resource sharing: (1) performance and security isolation for practical deployment, and (2) optimized computation-communication overlapping by improving memory locality to reduce stress on the memory system.