Abstract
We propose Bohr, a similarity-aware geo-distributed data analytics system that minimizes query completion time (QCT). The key idea is to exploit similarity between data in different data centers (DCs) and to transfer similar data from the bottleneck DC to other sites with more WAN bandwidth. Although these sites then have more input data to process, the data are more similar and can be aggregated more efficiently by the combiner, which reduces the intermediate data that must be shuffled across the WAN. Thus, our similarity-aware approach reduces the shuffle time and, in turn, the QCT.
We design and implement Bohr based on OLAP data cubes to perform efficient similarity checking among datasets in different sites. Evaluation across ten sites of AWS EC2 shows that Bohr decreases the QCT by 30% compared to state-of-the-art solutions.
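The intuition behind the combiner argument can be illustrated with a minimal sketch. This is not Bohr's implementation: the `combine` and `shuffle_size` helpers and the toy records are hypothetical, and the sketch assumes a simple sum-per-key combiner. It only shows that moving records to a site holding similar keys yields fewer combined pairs to shuffle than moving them to a site holding dissimilar keys.

```python
from collections import Counter


def combine(records):
    """Local combiner (assumed sum-per-key): merge values that share a key."""
    combined = Counter()
    for key, value in records:
        combined[key] += value
    return dict(combined)


def shuffle_size(site_records):
    """Number of (key, value) pairs a site would shuffle after combining."""
    return len(combine(site_records))


# Hypothetical records moved away from the bottleneck DC.
moved = [("a", 1), ("b", 1), ("a", 1)]

# Two hypothetical destination sites: one with similar keys, one without.
similar_site = [("a", 1), ("b", 1)]
dissimilar_site = [("x", 1), ("y", 1)]

similar_total = shuffle_size(similar_site + moved)
dissimilar_total = shuffle_size(dissimilar_site + moved)

print(similar_total, dissimilar_total)
```

In this toy case the similar site still shuffles only its two existing keys after receiving the moved records, whereas the dissimilar site's shuffle output grows by the two new keys.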
| Original language | English |
|---|---|
| Title of host publication | HotCloud'17 |
| Subtitle of host publication | Proceedings of the 9th USENIX Conference on Hot Topics in Cloud Computing |
| Publisher | USENIX Association |
| Publication status | Published - Jul 2017 |
| Event | The 9th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '17), Santa Clara, United States. Duration: 12 Jul 2017 → 14 Jul 2017. https://www.usenix.org/conference/hotcloud17 |
Publication series
| Name | HotCloud: Proceedings of the USENIX Conference on Hot Topics in Cloud Computing |
|---|---|
| Publisher | USENIX Association |
Workshop
| Workshop | The 9th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '17). |
|---|---|
| Place | United States |
| City | Santa Clara |
| Period | 12/07/17 → 14/07/17 |
| Internet address | https://www.usenix.org/conference/hotcloud17 |