Data transfer scheduling for maximizing throughput of big-data computing in cloud systems

Research output: Journal Publications and Reviews (RGC: 21, 22, 62); 21_Publication in refereed journal; peer-review

4 Scopus Citations

Author(s)

Related Research Unit(s)

Detail(s)

Original language: English
Article number: 7180342
Pages (from-to): 87-98
Journal / Publication: IEEE Transactions on Cloud Computing
Volume: 6
Issue number: 1
Online published: 5 Aug 2015
Publication status: Published - Jan 2018

Abstract

Many big-data computing applications have been deployed on cloud platforms. These applications normally demand concurrent data transfers among computing nodes for parallel processing. It is important to find the transfer schedule that yields the least data retrieval time, or in other words the maximum throughput. However, existing methods cannot achieve this because they ignore link bandwidths and the diversity of data replicas and paths. In this paper, we aim to develop a max-throughput data transfer scheduling scheme that minimizes the data retrieval time of applications. Specifically, the problem is formulated as a mixed integer program, and an approximation algorithm is proposed, with its approximation ratio analyzed. Extensive simulations demonstrate that our algorithm can obtain near-optimal solutions.

Research Area(s)

  • Big-data computing, Data center, Data transfer scheduling, Throughput maximization