
Online Multi-LLM Selection via Contextual Bandits Under Unstructured Context Evolution

  • Manhin Poon
  • Xiangxiang Dai
  • Xutong Liu
  • Fang Kong
  • John C.S. Lui
  • Jinhang Zuo*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Large language models (LLMs) exhibit diverse response behaviors, costs, and strengths, making it challenging to select the most suitable LLM for a given user query. We study the problem of adaptive multi-LLM selection in an online setting, where the learner interacts with users through multi-step query refinement and must choose LLMs sequentially without access to offline datasets or model internals. A key challenge arises from unstructured context evolution: the prompt dynamically changes in response to previous model outputs via a black-box process, which cannot be simulated, modeled, or learned. To address this, we propose the first contextual bandit framework for sequential LLM selection under unstructured prompt dynamics. We formalize a notion of myopic regret and develop a LinUCB-based algorithm that provably achieves sublinear regret without relying on future context prediction. We further introduce budget-aware and positionally-aware (favoring early-stage satisfaction) extensions to accommodate variable query costs and user preferences for early high-quality responses. Our algorithms are theoretically grounded and require no offline fine-tuning or dataset-specific training. Experiments on diverse benchmarks demonstrate that our methods outperform existing LLM routing strategies in both accuracy and cost-efficiency, validating the power of contextual bandits for real-time, adaptive LLM selection. © 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
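
For readers unfamiliar with LinUCB, the selection rule the abstract builds on can be sketched as below. This is an illustration of standard disjoint LinUCB only, not the authors' algorithm: the class name `LinUCBSelector`, the exploration weight `alpha`, the identity regularizer, and the assumption that each prompt arrives as a numeric feature vector `x` are all hypothetical choices made here. The budget-aware and positionally-aware extensions are not covered.

```python
import math


class LinUCBSelector:
    """Illustrative disjoint LinUCB: one linear reward model per candidate LLM (arm).

    Hypothetical sketch; names and defaults are assumptions, not taken from the paper.
    """

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # weight of the exploration bonus
        # Per-arm inverse design matrix A^{-1}, initialised to the identity.
        self.A_inv = [
            [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for _ in range(n_arms)
        ]
        # Per-arm accumulated reward-weighted contexts b.
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    @staticmethod
    def _dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def ucb(self, arm, x):
        """Estimated reward plus confidence width for one arm on context x."""
        theta = self._matvec(self.A_inv[arm], self.b[arm])
        width = self._dot(x, self._matvec(self.A_inv[arm], x))
        return self._dot(theta, x) + self.alpha * math.sqrt(max(width, 0.0))

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for the current context only."""
        scores = [self.ucb(arm, x) for arm in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Rank-one (Sherman-Morrison) update of A^{-1}, then accumulate b."""
        A_inv = self.A_inv[arm]
        u = self._matvec(A_inv, x)  # A^{-1} x (A^{-1} is symmetric)
        denom = 1.0 + self._dot(x, u)
        for i in range(len(x)):
            for j in range(len(x)):
                A_inv[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]
```

In a multi-step refinement loop, a caller would invoke `select(x)` on whatever context the current step presents, observe feedback, and call `update(arm, x, reward)`. The learner never forecasts the next context, which is the sense in which the choice is myopic. A larger `alpha` explores more aggressively.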
Original language: English
Title of host publication: Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Pages: 24855-24863
Number of pages: 9
Volume: 40
ISBN (Print): 1-57735-906-2, 978-1-57735-906-7
Publication status: Published - 2026
Event: 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026) - Singapore EXPO, Singapore
Duration: 20 Jan 2026 → 27 Jan 2026
https://aaai.org/conference/aaai/aaai-26/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 40
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)
Abbreviated title: AAAI-26
Place: Singapore
Period: 20/01/26 → 27/01/26

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Funding

The work of Jinhang Zuo was supported by CityUHK 9610706. The work of Xiangxiang Dai was supported by the National Natural Science Foundation of China (625B2163). The work of Fang Kong was supported by the Guangdong Basic and Applied Basic Research Foundation 2025A1515011412. The work of John C.S. Lui was supported in part by the RGC GRF-14215722.

RGC Funding Information

  • RGC-funded
