Beyond the Lower Bound: Bridging Regret Minimization and Best Arm Identification in Lexicographic Bandits

Bo Xue, Yuanyu Wan, Zhichao Lu, Qingfu Zhang*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

In multi-objective decision-making with hierarchical preferences, lexicographic bandits provide a natural framework for optimizing multiple objectives in a prioritized order. In this setting, a learner repeatedly selects arms and observes reward vectors, aiming to maximize the reward for the highest-priority objective, then the next, and so on. While previous studies have primarily focused on regret minimization, this work bridges the gap between regret minimization and best arm identification under lexicographic preferences. We propose two elimination-based algorithms to address this joint objective. The first algorithm eliminates suboptimal arms sequentially, layer by layer, in accordance with the objective priorities, and achieves sample complexity and regret bounds comparable to those of the best single-objective algorithms. The second algorithm simultaneously leverages reward information from all objectives in each round, effectively exploiting cross-objective dependencies. Remarkably, it outperforms the known lower bound for the single-objective bandit problem, highlighting the benefit of cross-objective information sharing in the multi-objective setting. Empirical results further validate their superior performance over baselines. © 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
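The layer-by-layer elimination idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows how a lexicographic order filters arms objective by objective, assuming the mean rewards are already estimated. The function name `lexicographic_survivors` and the `width` parameter (a stand-in for a confidence width) are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def lexicographic_survivors(
    means: Sequence[Sequence[float]],
    width: float = 0.0,
) -> List[int]:
    """Return the indices of arms that survive layer-by-layer elimination.

    means[a][i] is the (estimated) mean reward of arm a on objective i,
    where objective 0 has the highest priority. `width` is a hypothetical
    confidence width: on objective i an arm is dropped only if it trails
    the best surviving arm by more than 2 * width.
    """
    survivors = list(range(len(means)))
    if not survivors:
        return survivors
    num_objectives = len(means[0])
    # Process objectives in priority order; each layer filters the
    # survivors of the previous, higher-priority layer.
    for i in range(num_objectives):
        best = max(means[a][i] for a in survivors)
        survivors = [a for a in survivors if means[a][i] >= best - 2 * width]
    return survivors


if __name__ == "__main__":
    # Three arms, two objectives (illustrative numbers only).
    means = [
        [0.9, 0.2],
        [0.9, 0.7],
        [0.5, 1.0],
    ]
    print(lexicographic_survivors(means))
    print(lexicographic_survivors(means, width=0.3))
```

With `width=0.0`, arm 2 is removed on the first objective even though it is best on the second, which is the defining property of a lexicographic preference. A larger `width` keeps more arms alive in each layer, mimicking the uncertainty a learner faces before it has collected enough samples.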
Original language: English
Title of host publication: Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence
Editors: Sven Koenig, Chad Jenkins, Matthew E. Taylor
Publisher: AAAI Press
Pages: 27414-27422
Number of pages: 9
ISBN (Print): 978-1-57735-906-7
DOIs
Publication status: Published - 2026
Event: 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26), Singapore
Duration: 20 Jan 2026 to 27 Jan 2026
Conference number: 26
https://aaai.org/conference/aaai/aaai-26/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Number: 32
Volume: 40
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26)
Abbreviated title: AAAI-26
Place: Singapore
Period: 20/01/26 to 27/01/26
Internet address

Funding

The work described in this paper was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China [GRF Project No. CityU 11215622].

RGC Funding Information

  • RGC-funded
