Cost-Based Dynamic Job Rescheduling: A Case Study of the Intel Distributed Computing Platform

Penn collection
Technical Reports (CIS)
Author
Phan, Linh T.X.
Zhang, Zhuoyao
Jain, Saumya
Tan, Godfrey
Abstract

We perform a trace-driven analysis of the Intel Distributed Computing Platform (IDCP), an Internet-scale, data-center-based distributed computing platform developed by Intel Corporation for massively parallel chip simulations within the company. IDCP has been operational for many years and is currently deployed live on tens of thousands of machines distributed globally across various data centers. Our analysis uses job execution traces collected over a one-year period from tens of thousands of IDCP machines in 20 different pools. It demonstrates that job completion time can be severely degraded by job suspension, which occurs when higher-priority jobs preempt lower-priority jobs. We then develop cost-based dynamic job rescheduling strategies that adaptively restart suspended jobs, making better use of system resources and improving completion times. Our trace-driven evaluation shows that dynamic rescheduling enables IDCP to significantly reduce job completion times.
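
The abstract does not spell out the cost model, so the following is only an illustrative sketch of what a cost-based restart decision could look like, not the report's actual strategy. It assumes a suspended job either waits to resume on its current machine or restarts from scratch elsewhere, and picks whichever has the lower estimated completion time. The names SuspendedJob, cost_of_waiting, cost_of_restarting, and should_restart, along with all the estimates they take as input, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuspendedJob:
    """Hypothetical record for a job preempted by higher-priority work."""
    total_runtime: float        # estimated runtime from scratch (minutes)
    completed: float            # work finished before suspension (minutes)
    expected_suspension: float  # estimated remaining suspension time (minutes)


def cost_of_waiting(job: SuspendedJob) -> float:
    """Estimated time to finish if the job stays suspended, then resumes."""
    return job.expected_suspension + (job.total_runtime - job.completed)


def cost_of_restarting(job: SuspendedJob, queue_wait: float) -> float:
    """Estimated time to finish if the job restarts from scratch elsewhere.

    queue_wait is an assumed estimate of the delay before another
    machine picks the job up; completed work is lost on restart.
    """
    return queue_wait + job.total_runtime


def should_restart(job: SuspendedJob, queue_wait: float) -> bool:
    """Restart only when doing so is estimated to finish sooner."""
    return cost_of_restarting(job, queue_wait) < cost_of_waiting(job)


# A job with little progress facing a long suspension is worth restarting.
early = SuspendedJob(total_runtime=60, completed=10, expected_suspension=120)
# A nearly finished job facing a short suspension is better left to wait.
late = SuspendedJob(total_runtime=60, completed=55, expected_suspension=10)

decisions = [should_restart(early, queue_wait=5),
             should_restart(late, queue_wait=5)]
```

Under these assumptions the decision depends on how much work would be thrown away and how long the suspension is expected to last; a real scheduler would also need a way to estimate both quantities from traces.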

Publication date
2010-01-01
Comments
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-10-14.