TY - GEN
T1 - Sampling big trajectory data
AU - Li, Yanhua
AU - Chow, Chi-Yin
AU - Deng, Ke
AU - Yuan, Mingxuan
AU - Zeng, Jia
AU - Zhang, Jia-Dong
AU - Yang, Qiang
AU - Zhang, Zhi-Li
PY - 2015/10
Y1 - 2015/10
N2 - The increasing prevalence of sensors and mobile devices has led to an explosive increase of the scale of spatio-temporal data in the form of trajectories. A trajectory aggregate query, as a fundamental functionality for measuring trajectory data, aims to retrieve the statistics of trajectories passing a user-specified spatiotemporal region. A large-scale spatio-temporal database with big disk-resident data takes very long time to produce exact answers to such queries. Hence, approximate query processing with a guaranteed error bound is a promising solution in many scenarios with stringent response-time requirements. In this paper, we study the problem of approximate query processing for trajectory aggregate queries. We show that it boils down to the distinct value estimation problem, which has been proven to be very hard with powerful negative results given that no index is built. By utilizing the well-established spatio-temporal index and introducing an inverted index to trajectory data, we are able to design random index sampling (RIS) algorithm to estimate the answers with a guaranteed error bound. To further improve system scalability, we extend RIS algorithm to concurrent random index sampling (CRIS) algorithm to process a number of trajectory aggregate queries arriving concurrently with overlapping spatio-temporal query regions. To demonstrate the efficacy and efficiency of our sampling and estimation methods, we applied them in a real large-scale user trajectory database collected from a cellular service provider in China. Our extensive evaluation results indicate that both RIS and CRIS outperform exhaustive search for single and concurrent trajectory aggregate queries by two orders of magnitude in terms of the query processing time, while preserving a relative error ratio lower than 10%, with only 1% search cost of the exhaustive search method.
AB - The increasing prevalence of sensors and mobile devices has led to an explosive increase of the scale of spatio-temporal data in the form of trajectories. A trajectory aggregate query, as a fundamental functionality for measuring trajectory data, aims to retrieve the statistics of trajectories passing a user-specified spatiotemporal region. A large-scale spatio-temporal database with big disk-resident data takes very long time to produce exact answers to such queries. Hence, approximate query processing with a guaranteed error bound is a promising solution in many scenarios with stringent response-time requirements. In this paper, we study the problem of approximate query processing for trajectory aggregate queries. We show that it boils down to the distinct value estimation problem, which has been proven to be very hard with powerful negative results given that no index is built. By utilizing the well-established spatio-temporal index and introducing an inverted index to trajectory data, we are able to design random index sampling (RIS) algorithm to estimate the answers with a guaranteed error bound. To further improve system scalability, we extend RIS algorithm to concurrent random index sampling (CRIS) algorithm to process a number of trajectory aggregate queries arriving concurrently with overlapping spatio-temporal query regions. To demonstrate the efficacy and efficiency of our sampling and estimation methods, we applied them in a real large-scale user trajectory database collected from a cellular service provider in China. Our extensive evaluation results indicate that both RIS and CRIS outperform exhaustive search for single and concurrent trajectory aggregate queries by two orders of magnitude in terms of the query processing time, while preserving a relative error ratio lower than 10%, with only 1% search cost of the exhaustive search method.
KW - Approximate query processing
KW - Sampling
KW - Spatio-temporal databases
KW - Trajectory aggregate query
UR - http://www.scopus.com/inward/record.url?scp=84958250324&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-84958250324&origin=recordpage
U2 - 10.1145/2806416.2806422
DO - 10.1145/2806416.2806422
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9781450337946
VL - 19-23-Oct-2015
SP - 941
EP - 950
BT - Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 24th ACM International Conference on Information and Knowledge Management, CIKM 2015
Y2 - 19 October 2015 through 23 October 2015
ER -