Scalable structural index construction for JSON analytics
Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review
Author(s)
Detail(s)
Original language | English |
---|---|
Pages (from-to) | 694-707 |
Journal / Publication | Proceedings of the VLDB Endowment |
Volume | 14 |
Issue number | 4 |
Publication status | Published - Dec 2020 |
Externally published | Yes |
Conference
Title | 47th International Conference on Very Large Data Bases (VLDB 2021) |
---|---|
Location | Tivoli Hotel & Congress Center (in-person and virtual) |
Place | Denmark |
City | Copenhagen |
Period | 16 - 20 August 2021 |
Link(s)
Abstract
JavaScript Object Notation ( JSON) and its variants have gained great popularity in recent years. Unfortunately, the performance of their analytics is often dragged down by the expensive JSON parsing. To address this, recent work has shown that building bitwise indices on JSON data, called structural indices, can greatly accelerate querying. Despite its promise, the existing structural index construction does not scale well as records become larger and more complex, due to its (inherently) sequential construction process and the involvement of costly memory copies that grow as the nesting level increases.
To address the above issues, this work introduces Pison – a more memory-efficient structural index constructor with supports of intra-record parallelism. First, Pison features a redesign of the bottleneck step in the existing solution. The new design is not only simpler but more memory-efficient. More importantly, Pison is able to build structural indices for a single bulky record in parallel, enabled by a group of customized parallelization techniques. Finally, Pison is also optimized for better data locality, which is especially critical in the scenario of bulky record processing. Our evaluation using real-world JSON datasets shows that Pison achieves 9.8X speedup (on average) over the existing structural index construction solution for bulky records and 4.6X speedup (on average) of end-to-end performance (indexing plus querying) over a state-of-the-art SIMD-based JSON parser on a 16-core machine.
To address the above issues, this work introduces Pison – a more memory-efficient structural index constructor with supports of intra-record parallelism. First, Pison features a redesign of the bottleneck step in the existing solution. The new design is not only simpler but more memory-efficient. More importantly, Pison is able to build structural indices for a single bulky record in parallel, enabled by a group of customized parallelization techniques. Finally, Pison is also optimized for better data locality, which is especially critical in the scenario of bulky record processing. Our evaluation using real-world JSON datasets shows that Pison achieves 9.8X speedup (on average) over the existing structural index construction solution for bulky records and 4.6X speedup (on average) of end-to-end performance (indexing plus querying) over a state-of-the-art SIMD-based JSON parser on a 16-core machine.
Citation Format(s)
Scalable structural index construction for JSON analytics. / Jiang, Lin; Qiu, Junqiao; Zhao, Zhijia.
In: Proceedings of the VLDB Endowment, Vol. 14, No. 4, 12.2020, p. 694-707.Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review