A fast LSH-based similarity search method for multivariate time series

Chenyun Yu, Lintong Luo, Leanne Lai-Hang Chan, Thanawin Rakthanmanon*, Sarana Nutanong

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Due to advances in mobile devices and sensors, there has been an increasing interest in the analysis of multivariate time series. Identifying similar time series is a core subroutine in many data mining and analysis problems. However, existing solutions mainly focus on univariate time series and fail to scale as the number of dimensions increases. Although dimensionality reduction can reduce the impact of noisy information, the number of dimensions may still be too large. In this paper, an efficient approximation method is proposed based on locality sensitive hashing (LSH). It is a two-step solution which first retrieves candidate time series and then exploits their hash values to compute distance estimates for pruning. To probabilistically guarantee the result accuracy, an extensive error analysis has been conducted to determine appropriate LSH parameters. In addition, we apply the proposed method to the PkNN classification and hierarchical clustering workloads. Finally, extensive experiments are conducted using both real multivariate time series and high-dimensional representations generated from univariate datasets in different query processing and data analysis workloads. Empirical results have verified the findings from the error analyses and demonstrated the benefits of the proposed method in terms of query efficiency when dealing with a collection of multivariate time series.
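The two-step idea in the abstract (retrieve candidates by hashing, then prune them with distance estimates derived from their hash values) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes equal-length series flattened into vectors, Euclidean distance rather than dynamic time warping, and a standard Gaussian random-projection LSH family. The class `LSHIndex`, its parameters (`num_tables`, `hashes_per_table`, `width`) and the threshold `prune_above` are hypothetical names chosen for illustration, and the estimator is a rough heuristic with no accuracy guarantee.

```python
import math
import random
from collections import defaultdict


def flatten(series):
    """Concatenate a multivariate series (list of per-timestep vectors) into one vector."""
    return [x for step in series for x in step]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class LSHIndex:
    """Illustrative index; an assumption-laden sketch, not the paper's method."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One group of (a, b) pairs per table: h(v) = floor((a . v + b) / width).
        self.tables = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.buckets = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}
        self.signatures = {}

    def signature(self, vec):
        return [
            tuple(math.floor((sum(a_i * x for a_i, x in zip(a, vec)) + b) / self.width)
                  for a, b in table)
            for table in self.tables
        ]

    def add(self, key, series):
        vec = flatten(series)
        sig = self.signature(vec)
        self.vectors[key] = vec
        self.signatures[key] = sig
        for bucket, table_sig in zip(self.buckets, sig):
            bucket[table_sig].append(key)

    def estimate(self, sig_q, sig_c):
        # Each hash difference is roughly (a . (q - c)) / width, and a . (q - c)
        # is Gaussian with standard deviation ||q - c||, so the root mean square
        # of the differences, scaled by width, gives a rough distance estimate.
        diffs = [hq - hc for tq, tc in zip(sig_q, sig_c) for hq, hc in zip(tq, tc)]
        return self.width * math.sqrt(sum(d * d for d in diffs) / len(diffs))

    def query(self, series, prune_above):
        vec = flatten(series)
        sig = self.signature(vec)
        # Step 1: candidates share a bucket with the query in at least one table.
        candidates = set()
        for bucket, table_sig in zip(self.buckets, sig):
            candidates.update(bucket.get(table_sig, []))
        # Step 2: prune by the hash-based estimate, then verify survivors exactly.
        survivors = [c for c in candidates
                     if self.estimate(sig, self.signatures[c]) <= prune_above]
        return sorted((euclidean(vec, self.vectors[c]), c) for c in survivors)


# Usage: two series with 3 time steps and 2 variables each (flattened dim = 6).
index = LSHIndex(dim=6)
index.add("a", [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]])
index.add("b", [[9.0, -4.0], [8.0, -5.0], [7.0, -6.0]])
result = index.query([[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]], prune_above=2.0)
```

In this sketch the exact distance is computed only for candidates that survive the hash-based estimate, which is where the savings would come from on a large collection. How to choose the number of tables, the hashes per table and the bucket width so that accuracy is probabilistically guaranteed is what the paper's error analysis addresses; the defaults above are arbitrary.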
Original language: English
Pages (from-to): 337-356
Journal: Information Sciences
Volume: 476
Online published: 19 Oct 2018
Publication status: Published - Feb 2019

Research Keywords

  • Dynamic time warping
  • Locality sensitive hashing
  • Multivariate time series
  • Query processing
  • Similarity search
