Efficient Data Placement and Replication for QoS-Aware Approximate Query Evaluation of Big Data Analytics

Research output: Journal Publications and Reviews (RGC: 21, 22, 62), 21_Publication in refereed journal, peer-review

7 Scopus Citations

Author(s)

  • Qiufen Xia
  • Zichuan Xu
  • Weifa Liang
  • Shui Yu
  • Song Guo
  • Albert Y. Zomaya

Detail(s)

Original language: English
Article number: 8732398
Pages (from-to): 2677-2691
Journal / Publication: IEEE Transactions on Parallel and Distributed Systems
Volume: 30
Issue number: 12
Online published: 6 Jun 2019
Publication status: Published - Dec 2019
Externally published: Yes

Abstract

Enterprise users at different geographic locations generate large volumes of data that are stored in geographically distributed datacenters. These users may also perform big data analytics on the stored data to extract valuable information for strategic decision-making. However, it is well known that performing big data analytics on data in geographically distributed datacenters is usually time-consuming and costly. In some delay-sensitive applications, a query result may become useless if answering the query takes too long. In such cases, users may be interested only in timely approximate results rather than exact ones. When approximate query evaluation is acceptable, applications must either sacrifice timeliness to obtain more accurate results, or tolerate results with a guaranteed error bound, obtained by analyzing samples of the data, in order to meet their stringent deadlines. In this paper, we study quality-of-service (QoS)-aware data replication and placement for approximate query evaluation of big data analytics in a distributed cloud, where the original (source) data of a query is distributed across geo-distributed datacenters. We focus on the problem of placing samples of the source data at strategic datacenters to meet the stringent query delay requirements of users, by exploring a non-trivial trade-off between the cost of query evaluation and the error bound of the evaluation result. We first propose an approximation algorithm with a provable approximation ratio for a single approximate query. We then develop an efficient heuristic algorithm for evaluating a set of approximate queries, with the aim of minimizing the evaluation cost while meeting the delay requirements of these queries. Finally, we demonstrate the effectiveness and efficiency of the proposed algorithms through both simulations and implementations in a real test-bed using real datasets. Experimental results show that the proposed algorithms are promising.
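The abstract does not give the algorithms themselves. The sketch below illustrates only the cost/error/delay trade-off it describes, for a single query, under assumptions that are not taken from the paper: the query estimates a mean, its error bound is the standard CLT half-width z·σ/√n for a sample of n records, and each datacenter has a hypothetical per-record cost and delay (`Datacenter`, `min_sample_size` and `place_sample` are illustrative names). It simply picks the cheapest datacenter that meets the deadline by exhaustive search; it is not the authors' approximation algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class Datacenter:
    # Hypothetical model parameters, not from the paper.
    name: str
    cost_per_record: float   # cost of storing and processing one sampled record here
    delay_per_record: float  # seconds to process one sampled record here
    network_delay: float     # seconds to return the result to the user


def min_sample_size(sigma: float, epsilon: float, z: float = 1.96) -> int:
    """Smallest n such that the CLT error bound z * sigma / sqrt(n) <= epsilon."""
    return math.ceil((z * sigma / epsilon) ** 2)


def place_sample(dcs, sigma, epsilon, deadline, z=1.96):
    """Return (datacenter name, sample size, cost) for the cheapest placement
    that meets the deadline, or None if no datacenter can meet it.

    A tighter error bound (smaller epsilon) needs a larger sample, which raises
    both cost and delay: this is the trade-off the abstract refers to.
    """
    n = min_sample_size(sigma, epsilon, z)
    best = None
    for dc in dcs:
        delay = dc.network_delay + n * dc.delay_per_record
        if delay > deadline:
            continue  # this placement violates the query's delay requirement
        cost = n * dc.cost_per_record
        if best is None or cost < best[2]:
            best = (dc.name, n, cost)
    return best
```

For example, with σ = 10 and ε = 1 the sketch needs a 385-record sample. With a tight deadline only a fast but expensive datacenter may qualify, while a looser deadline lets a cheaper, slower one win; if ε is made much smaller, no placement may meet the deadline at all, and the user would have to accept a looser error bound.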

Research Area(s)

  • algorithm analysis
  • approximate query evaluation
  • approximation algorithms
  • big data analytics
  • data replication and placement

Citation Format(s)

Efficient Data Placement and Replication for QoS-Aware Approximate Query Evaluation of Big Data Analytics. / Xia, Qiufen; Xu, Zichuan; Liang, Weifa; Yu, Shui; Guo, Song; Zomaya, Albert Y.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 30, No. 12, 8732398, 12.2019, p. 2677-2691.
