Accelerating Data Analytics Systems with Efficient Resource Scheduling

高效的資源調度加速數據分析系統

Student thesis: Doctoral Thesis

Detail(s)

Award date: 30 Dec 2019

Abstract

Scheduling resources, both at the end hosts and in the network, is critical in data analytics systems. However, existing schedulers fail to schedule these resources efficiently. The main cause is that they are agnostic to workload characteristics, which leaves them unable to cope with workload dynamics.

In this dissertation, we propose resource schedulers that accelerate data analytics systems. At the end hosts, the core idea is to dynamically allocate resources to applications and strategically pack applications onto hosts according to their time-varying resource demands. In the network, the core idea is to proactively identify bottleneck flows, reduce the excess bandwidth given to non-bottleneck flows, and reallocate the saved bandwidth to other coflows according to their priority levels. We realize these schedulers at the end hosts (Elasecutor) and in the network (Fai), respectively.

• Elasecutor: a novel executor scheduler for data analytics systems. Elasecutor dynamically allocates and explicitly sizes resources to executors over time according to the predicted time-varying resource demands. Rather than placing executors using their peak demand, Elasecutor strategically assigns them to machines based on a concept called dominant remaining resource to minimize resource fragmentation. Elasecutor further adaptively reprovisions resources in order to tolerate inaccurate demand prediction.

• Fai: a bottleneck-aware coflow scheduler that requires no prior knowledge. Fai adopts loose coordination to update coflow priorities and flow rates based on total bytes sent. In addition, Fai detects bottleneck flows based on each flow’s rate and bytes sent, and reduces the bandwidth of the non-bottleneck flows to match the bottleneck rate without affecting the coflow completion time (CCT). The saved bandwidth is then distributed among coflows according to their priority to improve overall performance.
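
The two ideas above can be illustrated with a minimal sketch. It is not the thesis's implementation: all function names are hypothetical, demand is reduced to a single resource sampled at equal time steps, and the bottleneck is assumed to be simply the flow with the lowest observed rate (the abstract says detection uses rate and bytes sent but does not give the rule). The sketch also assumes the bottleneck flow alone determines the coflow completion time, so capping faster flows at its rate does not delay the coflow.

```python
from typing import Dict, List, Sequence, Tuple


def fits_over_time(capacity: float,
                   placed: List[Sequence[float]],
                   new: Sequence[float]) -> bool:
    """Check placement against time-varying demand.

    Assumption: every demand series has the same length and time step.
    Returns True if the summed demand stays within capacity at each step.
    """
    for t in range(len(new)):
        total = new[t] + sum(series[t] for series in placed)
        if total > capacity:
            return False
    return True


def fits_by_peak(capacity: float,
                 placed: List[Sequence[float]],
                 new: Sequence[float]) -> bool:
    """Baseline: reserve each executor's peak demand for its whole lifetime."""
    return max(new) + sum(max(series) for series in placed) <= capacity


def trim_to_bottleneck(rates: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Cap every flow of one coflow at the bottleneck flow's rate.

    Assumption: the bottleneck is the flow with the lowest observed rate.
    Returns the trimmed rates and the bandwidth freed for other coflows.
    """
    bottleneck = min(rates, key=rates.get)
    cap = rates[bottleneck]
    trimmed = {flow: min(rate, cap) for flow, rate in rates.items()}
    saved = sum(rates.values()) - sum(trimmed.values())
    return trimmed, saved


# Two executors whose peaks do not overlap in time.
executor_a = [8.0, 2.0, 2.0]
executor_b = [2.0, 8.0, 2.0]
print(fits_by_peak(10.0, [executor_a], executor_b))
print(fits_over_time(10.0, [executor_a], executor_b))

# One coflow with three flows; rates are in arbitrary bandwidth units.
print(trim_to_bottleneck({"f1": 4.0, "f2": 10.0, "f3": 6.0}))
```

In this toy case, peak-based placement rejects the second executor, while the time-varying check accepts it because the two peaks never coincide. Likewise, capping the faster flows at the slowest flow's rate frees bandwidth that a scheduler could hand to other coflows in priority order.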

We implement Elasecutor and Fai and evaluate them with testbed experiments. Compared to existing approaches, Elasecutor and Fai reduce makespan and median application completion time, and improve cluster resource utilization significantly. They accelerate data analytics systems substantially.