Data Locality-Aware Big Data Query Evaluation in Distributed Clouds

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

7 Scopus Citations

Original language: English
Pages (from-to): 791-809
Journal / Publication: Computer Journal
Issue number: 6
Publication status: Published - 1 Jun 2017
Externally published: Yes


As more businesses and organizations outsource their IT services to distributed clouds for cost savings, the historical and operational data generated by these services have been growing exponentially. These data, referred to as big data and stored at datacenters in different geographic locations, have become an invaluable asset to these businesses and organizations, which can analyze them to identify business advantages and make strategic decisions. Big data analytics has thus emerged as a main research topic in cloud computing. Efficiently evaluating a big data analytic query in a distributed cloud, consisting of multiple datacenters at different geographic locations interconnected by the Internet, poses great challenges: (i) the source data of the query are typically located at different datacenters; and (ii) the resource demands of the query may exceed the supply of any single datacenter at that moment. In this paper, we formulate an online query evaluation problem for big data analytic queries in distributed clouds, with the objective of maximizing the query acceptance ratio while minimizing the accumulative query evaluation cost. To this end, we first propose a novel metric to model the usage of different resources in the distributed cloud, incorporating the capacities and workloads of different datacenters and links as well as the resource demands of different queries. We then devise efficient online algorithms for query evaluation under both unsplittable and splittable source data assumptions. Finally, we conduct extensive simulation experiments to evaluate the performance of the proposed algorithms. The experimental results demonstrate that the proposed algorithms are promising and outperform other heuristics at 95% confidence intervals.

Research Area(s)

  • Big data analytics, Data locality, Distributed clouds, Minimum cost multicommodity flow, Query evaluation optimization

Citation Format(s)

Data Locality-Aware Big Data Query Evaluation in Distributed Clouds. / Xia, Qiufen; Liang, Weifa; Xu, Zichuan.
In: Computer Journal, Vol. 60, No. 6, 01.06.2017, p. 791-809.
