Bohr: Similarity aware geo-distributed data analytics

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

3 Citations (Scopus)

Abstract

We propose Bohr, a similarity-aware geo-distributed data analytics system that minimizes query completion time (QCT). The key idea is to exploit similarity between data in different data centers (DCs) and transfer similar data from the bottleneck DC to other sites with more WAN bandwidth. Although these sites then have more input data to process, the data are more similar and can be aggregated more efficiently by the combiner, which reduces the intermediate data that needs to be shuffled across the WAN. Our similarity-aware approach thus reduces the shuffle time and, in turn, the QCT.
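The effect described above can be illustrated with a minimal sketch. It is a toy model and not Bohr's implementation: the site names, records, and the `combine` and `shuffle_size` helpers are hypothetical, similarity is reduced to "records share a key", and the cost of moving the data between sites is ignored.

```python
from collections import Counter

def combine(records):
    """Map-side combiner: pre-aggregate (key, value) pairs by key."""
    combined = Counter()
    for key, value in records:
        combined[key] += value
    return dict(combined)

def shuffle_size(sites):
    """Count the combined records each site would shuffle across the WAN."""
    return {site: len(combine(records)) for site, records in sites.items()}

# Hypothetical placement: site A is the bottleneck (least WAN bandwidth).
before = {
    "A": [("x", 1), ("y", 1), ("z", 1), ("w", 1)],
    "B": [("x", 1), ("x", 1), ("y", 1)],
}

# Move A's records whose keys already occur at B (keys "x" and "y") to B.
after = {
    "A": [("z", 1), ("w", 1)],
    "B": [("x", 1), ("x", 1), ("y", 1), ("x", 1), ("y", 1)],
}

print(shuffle_size(before))  # records to shuffle per site before the move
print(shuffle_size(after))   # records to shuffle per site after the move
```

In this toy placement, site B's input grows from three records to five, yet its combiner still emits two records because the moved records share keys with data already at B. Site A's shuffle output drops from four records to two.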

We design Bohr based on OLAP data cubes to perform efficient similarity checking among datasets in different sites. We implement Bohr on Spark and deploy it across ten sites of AWS EC2. Our extensive evaluation using realistic query workloads shows that Bohr improves the QCT by up to 50% and reduces the intermediate data by up to 6x compared to state-of-the-art solutions that also use OLAP cubes.
Original language: English
Title of host publication: CoNEXT 2018 - Proceedings of the 14th International Conference on Emerging Networking EXperiments and Technologies
Publisher: Association for Computing Machinery
Pages: 267-279
ISBN (Print): 978-1-4503-6080-7
Publication status: Published - Dec 2018
Event: 14th International Conference on Emerging Networking EXperiments and Technologies, CoNEXT 2018 - Heraklion, Greece
Duration: 4 Dec 2018 to 7 Dec 2018

Publication series

Name: CoNEXT - Proceedings of the International Conference on Emerging Networking EXperiments and Technologies

Conference

Conference: 14th International Conference on Emerging Networking EXperiments and Technologies, CoNEXT 2018
Place: Greece
City: Heraklion
Period: 4/12/18 to 7/12/18

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Research Keywords

  • Cloud computing
  • Data analytics
  • WAN
