Abstract
In multi-objective decision-making with hierarchical preferences, lexicographic bandits provide a natural framework for optimizing multiple objectives in a prioritized order. In this setting, a learner repeatedly selects arms and observes reward vectors, aiming to maximize the reward for the highest-priority objective, then the next, and so on. While previous studies have primarily focused on regret minimization, this work bridges the gap between regret minimization and best arm identification under lexicographic preferences. We propose two elimination-based algorithms to address this joint objective. The first algorithm eliminates suboptimal arms sequentially, layer by layer, in accordance with the objective priorities, and achieves sample complexity and regret bounds comparable to those of the best single-objective algorithms. The second algorithm simultaneously leverages reward information from all objectives in each round, effectively exploiting cross-objective dependencies. Remarkably, it outperforms the known lower bound for the single-objective bandit problem, highlighting the benefit of cross-objective information sharing in the multi-objective setting. Empirical results further validate the superior performance of both algorithms over baselines. © 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
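To make the lexicographic ordering in the abstract concrete, the following is a minimal sketch of layer-by-layer filtering over known mean reward vectors. It is not the paper's algorithm: the function name `lexicographic_best`, the tolerance parameter `tol`, and the example means are all assumptions introduced here for illustration, and the noisy sampling and confidence bounds that a bandit algorithm would need are left out.

```python
from typing import List, Sequence


def lexicographic_best(
    means: Sequence[Sequence[float]], tol: float = 0.0
) -> List[int]:
    """Return the indices of arms that survive layer-by-layer filtering.

    Hypothetical helper: means[a][i] is the (assumed known) mean reward of
    arm a on objective i, with objective 0 having the highest priority.
    """
    if not means:
        return []
    survivors = list(range(len(means)))
    num_objectives = len(means[0])
    for obj in range(num_objectives):
        # Best value on this objective among arms still in play.
        best = max(means[a][obj] for a in survivors)
        # Keep only arms within `tol` of the best on this objective.
        survivors = [a for a in survivors if means[a][obj] >= best - tol]
        if len(survivors) == 1:
            break  # Lower-priority objectives can no longer change the answer.
    return survivors


# Illustrative data (made up): arms 0 and 1 tie on objective 0,
# and arm 1 wins on objective 1.
example_means = [
    [0.9, 0.2],
    [0.9, 0.7],
    [0.5, 0.9],
]
print(lexicographic_best(example_means))  # prints [1]
```

Arm 2 has the highest mean on the second objective but is discarded at the first layer, which is the sense in which lower-priority objectives only break ties among arms that are optimal on higher-priority ones.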
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence |
| Editors | Sven Koenig, Chad Jenkins, Matthew E. Taylor |
| Publisher | AAAI Press |
| Pages | 27414-27422 |
| Number of pages | 9 |
| ISBN (Print) | 978-1-57735-906-7 |
| Publication status | Published - 2026 |
| Event | 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26), Singapore. Duration: 20 Jan 2026 → 27 Jan 2026. Conference number: 26. https://aaai.org/conference/aaai/aaai-26/ |
Publication series
| Name | Proceedings of the AAAI Conference on Artificial Intelligence |
|---|---|
| Number | 32 |
| Volume | 40 |
| ISSN (Print) | 2159-5399 |
| ISSN (Electronic) | 2374-3468 |
Conference
| Conference | 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26) |
|---|---|
| Abbreviated title | AAAI-26 |
| Place | Singapore |
| Period | 20/01/26 → 27/01/26 |
Funding
The work described in this paper was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China [GRF Project No. CityU 11215622].
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'Beyond the Lower Bound: Bridging Regret Minimization and Best Arm Identification in Lexicographic Bandits'. Together they form a unique fingerprint.
Projects
- 1 Active
- GRF: Few for Many: A Non-Pareto Approach for Many Objective Optimization
  ZHANG, Q. (Principal Investigator / Project Coordinator)
  1/01/23 → …
  Project: Research