Abstract
The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework, extensively studied over the past decade. However, existing work primarily focuses on the online setting, overlooking the substantial costs of online interactions and the readily available offline datasets. To overcome these limitations, we introduce Off-CMAB, the first offline learning framework for CMAB. Central to our framework is the combinatorial lower confidence bound (CLCB) algorithm, which combines pessimistic reward estimations with combinatorial solvers. To characterize the quality of offline datasets, we propose two novel data coverage conditions and prove that, under these conditions, CLCB achieves a near-optimal suboptimality gap, matching the theoretical lower bound up to a logarithmic factor. We validate Off-CMAB through practical applications, including learning to rank, large language model (LLM) caching, and social influence maximization, showing its ability to handle nonlinear reward functions, general feedback models, and out-of-distribution action samples that exclude optimal or even feasible actions. Extensive experiments on synthetic and real-world datasets for these applications further highlight the superior performance of CLCB. © 2025, ML Research Press. All rights reserved.
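The core idea described in the abstract — pessimistic (lower-confidence-bound) reward estimates fed into a combinatorial solver — can be illustrated with a minimal sketch. This is not the paper's exact CLCB algorithm; the confidence width, the top-k solver, and all variable names are illustrative assumptions:

```python
import math

def lcb_estimates(counts, means, delta=0.05):
    """Pessimistic per-arm estimates: empirical mean minus a confidence width.
    Arms never observed in the offline data get the most pessimistic value."""
    k_arms = len(counts)
    lcbs = []
    for n, mu in zip(counts, means):
        if n == 0:
            lcbs.append(float("-inf"))
        else:
            # Hoeffding-style width; the actual width in the paper may differ.
            width = math.sqrt(math.log(2 * k_arms / delta) / (2 * n))
            lcbs.append(mu - width)
    return lcbs

def top_k_solver(scores, k):
    """A stand-in combinatorial solver: pick the k arms with highest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Offline data: arm 2 has the highest empirical mean but was barely observed,
# so pessimism discounts it and the solver avoids it.
counts = [100, 80, 2]
means = [0.6, 0.55, 0.9]
lcbs = lcb_estimates(counts, means)
action = top_k_solver(lcbs, k=2)
```

The point of the sketch is the interplay the abstract highlights: under-covered arms receive wide confidence intervals, so the solver selects actions that are well supported by the offline dataset rather than ones that merely look good on average.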
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 42nd International Conference on Machine Learning |
| Editors | Aarti Singh, Maryam Fazel, Daniel Hsu |
| Publisher | ML Research Press |
| Pages | 38251-38289 |
| Publication status | Published - Jul 2025 |
| Event | 42nd International Conference on Machine Learning (ICML 2025) |
| Venue | Vancouver Convention Center, Vancouver, Canada |
| Duration | 13 Jul 2025 → 19 Jul 2025 |
| Internet address | https://icml.cc/Conferences/2025 |
Publication series
| Name | Proceedings of Machine Learning Research |
|---|---|
| Volume | 267 |
| ISSN (Electronic) | 2640-3498 |
Conference
| Conference | 42nd International Conference on Machine Learning (ICML 2025) |
|---|---|
| Abbreviated title | ICML 2025 |
| Place | Canada |
| City | Vancouver |
| Period | 13/07/25 → 19/07/25 |
| Internet address | https://icml.cc/Conferences/2025 |
Funding
The work is supported by NSF CNS-2103024 and the Office of Naval Research under grant N000142412073. The work of John C.S. Lui was supported in part by the RGC GRF14202923. The work of Jinhang Zuo was supported by CityUHK 9610706.