Bottleneck-Aware Non-Clairvoyant Coflow Scheduling with Fai

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

Detail(s)

Original language: English
Number of pages: 14
Journal / Publication: IEEE Transactions on Cloud Computing
Online published: 16 Nov 2021
Publication status: Online published - 16 Nov 2021

Abstract

Coflow scheduling is critical to data-parallel applications in data centers. While schemes like Varys can achieve optimal performance, they require a priori information about coflows, which is hard to obtain in practice. Existing non-clairvoyant solutions like Aalo generalize the least attained service (LAS) scheduling discipline to address this issue. However, they fail to identify the bottleneck flows in a coflow and tend to allocate excessive bandwidth to the non-bottleneck flows, leading to bandwidth wastage and inferior overall performance. To this end, we present Fai, which strives to improve overall coflow performance by accelerating the bottleneck flows without prior knowledge. Fai employs bottleneck-aware scheduling. It adopts loose coordination to update coflow priorities and flow rates based on the total bytes sent. In addition, Fai detects bottleneck flows based on each flow's rate and bytes sent, and de-allocates bandwidth from the other flows so that they match the bottleneck rate without affecting the coflow completion time (CCT). The saved bandwidth is then distributed among coflows according to their priority to improve overall performance. Testbed evaluation on a 40-node cluster shows that Fai improves the average (P95) CCT by 1.73× (3.43×) compared to Aalo. Large-scale trace-driven simulations also show that Fai outperforms Aalo substantially.
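
The two mechanisms named in the abstract, priority by total bytes sent and rate matching to the bottleneck flow, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give Fai's detection rule or its priority thresholds. The sketch assumes, purely for illustration, that the bottleneck is the flow with the lowest current rate (ties broken by fewest bytes sent) and that priority queues use exponentially spaced byte thresholds in the style of LAS-based schedulers. The names `Flow`, `priority_queue_index`, and `match_bottleneck`, and all numeric defaults, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    flow_id: str
    rate: float      # currently allocated rate (any consistent unit, e.g. Mbps)
    bytes_sent: int  # bytes this flow has sent so far


def priority_queue_index(total_bytes_sent, first_threshold=10 * 1024**2,
                         multiplier=10, num_queues=10):
    """Map a coflow's total bytes sent to a queue index (0 = highest priority).

    Assumption: exponentially spaced thresholds; the defaults are illustrative
    values, not parameters taken from the paper.
    """
    threshold = first_threshold
    for queue in range(num_queues - 1):
        if total_bytes_sent < threshold:
            return queue
        threshold *= multiplier
    return num_queues - 1


def match_bottleneck(flows):
    """Cap every flow of one coflow at the bottleneck flow's rate.

    Assumption: the bottleneck is the slowest flow, ties broken by fewest
    bytes sent. Returns (bottleneck_id, new_rates, saved_bandwidth).
    """
    if not flows:
        return None, {}, 0.0
    bottleneck = min(flows, key=lambda f: (f.rate, f.bytes_sent))
    new_rates = {f.flow_id: min(f.rate, bottleneck.rate) for f in flows}
    saved = sum(f.rate - new_rates[f.flow_id] for f in flows)
    return bottleneck.flow_id, new_rates, saved


flows = [Flow("a", 100.0, 500), Flow("b", 40.0, 200), Flow("c", 80.0, 300)]
bottleneck_id, new_rates, saved = match_bottleneck(flows)
coflow_queue = priority_queue_index(sum(f.bytes_sent for f in flows))
```

In this toy coflow, flow `b` is treated as the bottleneck, flows `a` and `c` are throttled to its rate, and the difference is the bandwidth a scheduler could hand to other coflows in priority order. How Fai actually redistributes that bandwidth is not specified in the abstract, so it is left out of the sketch.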

Research Area(s)

  • Bandwidth, Bottleneck-aware, Cloud computing, Coflow completion time, Coflow scheduling, Data centers, Datacenter networks, Fabrics, Job shop scheduling, Processor scheduling, Uplink