On the Feasibility of Dynamic Rescheduling on the Intel Distributed Computing Platform

Penn collection
Departmental Papers (CIS)
Degree type
Discipline
Subject
CPS Real-Time
Distributed computing
Dynamic rescheduling
Cloud resource management
Trace-driven analysis
Intel NetBatch
Computer Engineering
Engineering
Author
Zhang, Zhuoyao
Tan, Godfrey
Jain, Saumya
Duong, Harrison
Loo, Boon Thau
Contributor
Abstract

This paper examines the feasibility of dynamic rescheduling techniques for effectively utilizing compute resources within a data center. Our work is motivated by practical concerns of Intel’s NetBatch system, an Internet-scale, data-center-based distributed computing platform developed by Intel Corporation for massively parallel chip simulations within the company. NetBatch has been operational for many years and is currently deployed live on tens of thousands of machines that are globally distributed across various data centers. We analyze job execution traces collected over a one-year period from tens of thousands of NetBatch machines in 20 different pools. Our analysis shows that NetBatch currently does not make full use of all its resources. Specifically, job completion time can be severely impacted by job suspension, which occurs when higher-priority jobs preempt lower-priority jobs. We then develop dynamic job rescheduling strategies that adaptively restart jobs on available resources elsewhere, which better utilizes system resources and improves completion times. Our trace-driven evaluation results show that dynamic rescheduling enables NetBatch to significantly reduce system waste and the completion time of suspended jobs.

Advisor
Date of presentation
2010-11-01
Conference name
Departmental Papers (CIS)
Conference dates
2023-05-17T06:04:49.000
Comments
11th ACM/IFIP/USENIX International Middleware Conference (Middleware), Bangalore, India, Nov 2010