
Hitcher: Efficient GPU-based Vector Search via Cluster-Centric Kernel and Hitch-Ride Ordering

  • Qihui Zhou (Co-first Author)
  • Changji Li (Co-first Author)
  • Guanxian Jiang
  • Chenhao Ma
  • Xiao Yan*
  • Yu Mao
  • Ming-Chang Yang
  • James Cheng

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Similarity-based vector search, which retrieves the vectors most similar to a given query vector from a large vector dataset, underlies many applications such as search, recommendation, and Large Language Models (LLMs). Some systems run vector search on GPUs to exploit their high parallelism, but we observe that these systems are limited in both query throughput and latency. In particular, their query-centric GPU kernel performs computation independently for each query, which fails to reuse data loaded into GPU shared memory across queries and leads to low GPU compute utilization. Their batch-based task reordering rearranges computation for the queries in a batch to reduce CPU-GPU data transfer, but it prolongs latency because each query must wait for its slowest task. To tackle these problems, we propose Hitcher. Specifically, to reuse data across queries and improve GPU utilization, Hitcher implements a cluster-centric GPU kernel that batches computation on the same data for multiple queries. To reduce query latency, Hitcher adopts hitch-ride ordering, which preserves the arrival order of queries while batching computation across queries to improve efficiency. Hitcher can also offload computation tasks to the CPU to reduce CPU-GPU data transfer, and it can utilize multiple GPUs. Experimental results show that Hitcher achieves up to 22× lower P99 query latency and 9× higher query throughput than state-of-the-art GPU-based vector query processing systems. © 2026 Owner/Author.
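The contrast between query-centric and cluster-centric processing, and the idea of preserving arrival order while batching, can be illustrated with a small CPU-only sketch. It assumes an IVF-style index in which each query probes a fixed list of clusters, and it counts one "load" each time a cluster is brought in, as a rough stand-in for moving that cluster into GPU shared memory. It interprets hitch-ride ordering as: serve queries in arrival order, and whenever a cluster is scanned for the oldest unfinished query, let every other pending query that probes the same cluster reuse it. The function `hitch_ride_search` and all data below are hypothetical; this is not Hitcher's actual kernel or scheduler.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hitch_ride_search(
    queries: List[Vector],
    probes: List[List[int]],
    clusters: Dict[int, List[Tuple[int, Vector]]],
    k: int,
) -> Tuple[List[List[Tuple[float, int]]], int]:
    """Hypothetical sketch: cluster-centric scans scheduled in query arrival order.

    queries[q] is the q-th query in arrival order; probes[q] lists the cluster ids
    it must scan; clusters[c] holds (vector_id, vector) pairs. Returns the top-k
    (distance, vector_id) pairs per query and the number of cluster loads.
    """
    remaining = [set(p) for p in probes]       # clusters each query still needs
    candidates: List[List[Tuple[float, int]]] = [[] for _ in queries]
    loads = 0
    for head in range(len(queries)):           # oldest unfinished query first
        for c in probes[head]:
            if c not in remaining[head]:
                continue                       # already scanned on an earlier ride
            loads += 1                         # load cluster c once ...
            riders = [q for q in range(head, len(queries)) if c in remaining[q]]
            for q in riders:                   # ... and reuse it for every pending query
                for vid, vec in clusters[c]:
                    candidates[q].append((squared_l2(queries[q], vec), vid))
                remaining[q].discard(c)
        # The head query has now scanned all of its clusters and is complete.
    results = [sorted(cands)[:k] for cands in candidates]
    return results, loads


# Toy batch: three 2-D queries over three clusters.
clusters = {
    0: [(0, (0.0, 0.0)), (1, (1.0, 0.0))],
    1: [(2, (5.0, 5.0)), (3, (6.0, 5.0))],
    2: [(4, (10.0, 0.0))],
}
queries = [(0.1, 0.0), (5.2, 5.0), (0.9, 0.1)]
probes = [[0, 1], [1, 2], [0]]

results, loads = hitch_ride_search(queries, probes, clusters, k=1)
query_centric_loads = sum(len(p) for p in probes)  # one load per (query, cluster) pair
print([r[0][1] for r in results], loads, query_centric_loads)  # [0, 2, 1] 3 5
```

In this toy batch a query-centric scheme would load clusters 5 times (2 + 2 + 1 probes), whereas the sketch loads each needed cluster once, 3 loads in total. Query 0 still finishes as soon as its own two clusters are scanned, so batching does not make it wait on work needed only by later arrivals.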
Original language: English
Title of host publication: KDD '26
Subtitle of host publication: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1
Publisher: Association for Computing Machinery
Pages: 2066-2075
Number of pages: 10
ISBN (Print): 979-8-4007-2258-5
Publication status: Online published - 20 Apr 2026
Event: 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026) - International Convention Center Jeju (ICC Jeju), Jeju Island, Korea, Republic of
Duration: 9 Aug 2026 to 13 Aug 2026
https://kdd2026.kdd.org/

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 1-A
ISSN (Print): 2154-817X

Conference

Conference: 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)
Abbreviated title: ACM KDD 2026
Place: Korea, Republic of
City: Jeju Island
Period: 9/08/26 to 13/08/26

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Research Keywords

  • gpu acceleration
  • vector search
