
Offline Learning for Combinatorial Multi-armed Bandits

Xutong Liu, Xiangxiang Dai*, Jinhang Zuo*, Siwei Wang, Carlee Joe-Wong, John C. S. Lui, Wei Chen*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework, extensively studied over the past decade. However, existing work primarily focuses on the online setting, overlooking the substantial costs of online interactions and the readily available offline datasets. To overcome these limitations, we introduce Off-CMAB, the first offline learning framework for CMAB. Central to our framework is the combinatorial lower confidence bound (CLCB) algorithm, which combines pessimistic reward estimations with combinatorial solvers. To characterize the quality of offline datasets, we propose two novel data coverage conditions and prove that, under these conditions, CLCB achieves a near-optimal suboptimality gap, matching the theoretical lower bound up to a logarithmic factor. We validate Off-CMAB through practical applications, including learning to rank, large language model (LLM) caching, and social influence maximization, showing its ability to handle nonlinear reward functions, general feedback models, and out-of-distribution action samples that exclude optimal or even feasible actions. Extensive experiments on synthetic and real-world datasets for these applications further highlight the superior performance of CLCB. © 2025, ML Research Press. All rights reserved.
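The idea of combining pessimistic reward estimates with a combinatorial solver can be illustrated with a minimal sketch. This is not the authors' CLCB implementation: it assumes rewards in [0, 1], a Hoeffding-style confidence radius, a simple sum-of-arms reward, and a top-k selection standing in for the combinatorial solver. The function name `clcb_top_k`, the dataset layout, and the parameter `delta` are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def clcb_top_k(dataset, num_arms, k, delta=0.05):
    """Pick k arms using lower confidence bounds computed from offline data.

    dataset: list of dicts mapping arm index -> observed reward in [0, 1]
             (one dict per logged action; this layout is an assumption).
    Returns (sorted list of chosen arms, list of per-arm LCB values).
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for feedback in dataset:
        for arm, reward in feedback.items():
            counts[arm] += 1
            sums[arm] += reward

    n = max(1, len(dataset))
    lcb = []
    for arm in range(num_arms):
        if counts[arm] == 0:
            # Never-observed arms get the most pessimistic value.
            lcb.append(0.0)
            continue
        mean = sums[arm] / counts[arm]
        # Hoeffding-style radius (assumed form): shrinks as the arm
        # is observed more often in the offline dataset.
        radius = math.sqrt(math.log(2 * num_arms * n / delta) / (2 * counts[arm]))
        lcb.append(max(0.0, mean - radius))

    # Stand-in "combinatorial solver": exact for a sum-of-arms reward
    # under a cardinality constraint of k.
    chosen = sorted(range(num_arms), key=lambda a: lcb[a], reverse=True)[:k]
    return sorted(chosen), lcb


# Arm 1 has the highest empirical mean but only two samples, so pessimism
# ranks it below the well-covered arms 0 and 2.
logged = [{0: 1.0, 2: 0.6} for _ in range(200)] + [{1: 1.0}, {1: 1.0}]
chosen, lcb = clcb_top_k(logged, num_arms=3, k=2)
print(chosen, [round(v, 3) for v in lcb])
```

The example shows why pessimism matters offline: an arm that looks good on very little data is discounted until the dataset covers it well, which is the role the data coverage conditions play in the analysis.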
Original language: English
Title of host publication: Proceedings of the 42nd International Conference on Machine Learning
Editors: Aarti Singh, Maryam Fazel, Daniel Hsu
Publisher: ML Research Press
Pages: 38251-38289
Publication status: Published - Jul 2025
Event: 42nd International Conference on Machine Learning (ICML 2025) - Vancouver Convention Center, Vancouver, Canada
Duration: 13 Jul 2025 to 19 Jul 2025
https://icml.cc/Conferences/2025

Publication series

Name: Proceedings of Machine Learning Research
Volume: 267
ISSN (Electronic): 2640-3498

Conference

Conference: 42nd International Conference on Machine Learning (ICML 2025)
Abbreviated title: ICML 2025
Place: Canada
City: Vancouver
Period: 13/07/25 to 19/07/25

Funding

The work is supported by NSF CNS-2103024 and the Office of Naval Research under grant N000142412073. The work of John C.S. Lui was supported in part by the RGC GRF14202923. The work of Jinhang Zuo was supported by CityUHK 9610706.

RGC Funding Information

  • RGC-funded
