Abstract
Similarity-based vector search, which retrieves the vectors most similar to a given query vector from a large vector dataset, underlies many applications such as search, recommendation, and Large Language Models (LLMs). Some systems run vector search on GPUs to exploit the GPU's high parallelism, but we observe that they are limited in query throughput and latency. In particular, their query-centric GPU kernel conducts computation independently for each query, so it fails to reuse data loaded into GPU shared memory across queries, which leads to low GPU compute utilization. Their batch-based task reordering rearranges computation for the queries in a batch to reduce CPU-GPU data transfer, but it prolongs latency because each query must wait for its slowest task. To tackle these problems, we propose Hitcher. Specifically, to reuse data across queries and improve GPU utilization, Hitcher implements a cluster-centric GPU kernel that batches computation on the same data for multiple queries. To reduce query latency, Hitcher adopts hitch-ride ordering, which preserves the arrival order for query processing while batching computation across queries to improve efficiency. Hitcher can also offload computation tasks to the CPU to reduce CPU-GPU data transfer, and it can utilize multiple GPUs. Experimental results show that Hitcher achieves up to 22× lower P99 query latency and 9× higher query throughput than state-of-the-art GPU-based vector query processing systems. © 2026 Owner/Author.
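The contrast between a query-centric and a cluster-centric scan can be illustrated with a small CPU-only sketch. It is not Hitcher's GPU kernel. It assumes an IVF-style layout in which vectors are grouped into clusters and each query probes a few of them, and it counts how often cluster data would have to be fetched under each scheme. The names `query_centric`, `cluster_centric`, `probes`, and the toy data are hypothetical. Visiting clusters in the order of the earliest-arriving query that needs them is only a loose reading of "hitch-ride ordering", not the paper's algorithm.

```python
import heapq
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_centric(clusters, queries, probes, k):
    """Each query scans its probed clusters on its own."""
    loads = 0
    results = {}
    for qid, q in queries.items():
        cands = []
        for cid in probes[qid]:
            loads += 1  # cluster data is fetched again for every query
            for vid, v in clusters[cid]:
                cands.append((sq_dist(q, v), vid))
        results[qid] = heapq.nsmallest(k, cands)
    return results, loads


def cluster_centric(clusters, queries, probes, k):
    """Each cluster is fetched once and shared by all queries probing it."""
    # Invert the probe lists: cluster id -> queries that need it.
    # Insertion order is kept, so clusters are visited in the order of
    # the earliest-arriving query that needs them (an assumption here).
    waiting = defaultdict(list)
    for qid in queries:
        for cid in probes[qid]:
            waiting[cid].append(qid)

    loads = 0
    cands = defaultdict(list)
    for cid, qids in waiting.items():
        loads += 1  # one fetch serves every waiting query
        for vid, v in clusters[cid]:
            for qid in qids:
                cands[qid].append((sq_dist(queries[qid], v), vid))
    results = {qid: heapq.nsmallest(k, cands[qid]) for qid in queries}
    return results, loads


# Toy data (hypothetical): 3 clusters of 2-D vectors, 3 queries in arrival order.
clusters = {
    0: [(0, (0.0, 0.0)), (1, (0.1, 0.0))],
    1: [(2, (1.0, 1.0)), (3, (1.0, 0.9))],
    2: [(4, (5.0, 5.0))],
}
queries = {"q0": (0.0, 0.1), "q1": (0.9, 1.0), "q2": (0.2, 0.0)}
probes = {"q0": [0, 1], "q1": [1, 0], "q2": [0, 2]}

qc_results, qc_loads = query_centric(clusters, queries, probes, k=2)
cc_results, cc_loads = cluster_centric(clusters, queries, probes, k=2)
print(qc_loads, cc_loads)
```

Both functions return the same top-k lists. The difference is that the cluster-centric version fetches each probed cluster once for all queries that need it, which is the kind of cross-query data reuse the abstract describes.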
| Original language | English |
|---|---|
| Title of host publication | KDD '26 |
| Subtitle of host publication | Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 |
| Publisher | Association for Computing Machinery |
| Pages | 2066-2075 |
| Number of pages | 10 |
| ISBN (Print) | 979-8-4007-2258-5 |
| Publication status | Online published - 20 Apr 2026 |
| Event | 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026), International Convention Center Jeju (ICC Jeju), Jeju Island, Korea, Republic of. Duration: 9 Aug 2026 → 13 Aug 2026. https://kdd2026.kdd.org/ |
Publication series
| Name | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
|---|---|
| Volume | 1-A |
| ISSN (Print) | 2154-817X |
Conference
| Conference | 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026) |
|---|---|
| Abbreviated title | ACM KDD 2026 |
| Place | Korea, Republic of |
| City | Jeju Island |
| Period | 9/08/26 → 13/08/26 |
Bibliographical note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Research Keywords
- gpu acceleration
- vector search
Fingerprint
Dive into the research topics of 'Hitcher: Efficient GPU-based Vector Search via Cluster-Centric Kernel and Hitch-Ride Ordering'. Together they form a unique fingerprint.