Abstract
The Multi-Armed Bandit (MAB) model is a fundamental framework for algorithms that allocate limited resources in machine learning, particularly in settings where testing and application occur simultaneously. Originally motivated by medical trials, the MAB framework centers on balancing exploration and exploitation. In real-world applications such as clinical trials and online advertising, the emphasis on testing has brought attention to the Best Arm Identification (BAI) problem, a pure-exploration problem in MAB that aims to identify the best option (arm) from a finite set of arms. An efficient sampling strategy is essential for making the most of a limited budget and improving recommendation accuracy, and evaluating the convergence properties of algorithms is crucial for judging their practical effectiveness. Motivated by the need for algorithms with optimal convergence properties, this thesis examines two efficient sampling frameworks for the BAI problem: the Improved Knowledge Gradient (iKG) framework and the Top Two framework. The iKG framework builds on the Knowledge Gradient (KG) algorithm and is new to BAI problems.
The iKG framework is inspired by the KG algorithm. The KG algorithm selects the arm that yields the greatest expected one-step improvement in the estimate of the best mean; however, this policy is not asymptotically optimal. To address this, we propose the iKG framework, which retains KG's one-step-lookahead strategy but instead selects the arm that most improves the probability of correctly identifying the best arm, and we show that this approach is asymptotically optimal. To demonstrate the generality of the iKG framework, we extend it to the ε-good arm identification and feasible arm identification problems, in which the target is no longer the single best arm but a set of arms defined by the problem: ε-good arm identification seeks all arms whose mean rewards are within a given ε of the best arm's, while feasible arm identification seeks all arms meeting or falling below a specific threshold. We develop the algorithms iKG-ε and iKG-F for these problems and prove their asymptotic optimality, showing that the iKG framework adapts to different problems more readily than the KG algorithm. We also evaluate the three algorithms on synthetic and real-world examples to demonstrate their superior performance.
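The one-step-lookahead idea behind KG can be sketched as follows. This is a minimal Monte Carlo illustration for Gaussian arms, not the thesis's iKG policy: it scores each arm by the expected improvement in the estimate of the best mean (the KG criterion), whereas iKG instead scores the improvement in the probability of correct selection. All function and variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def kg_style_choice(means, variances, counts, n_sim=1000):
    """Illustrative KG-style one-step-lookahead arm choice.

    For each arm, simulate one hypothetical observation from its
    posterior predictive, update that arm's posterior mean, and score
    the expected increase in the estimate of the best mean.  The arm
    with the highest score is pulled next.
    """
    best_now = means.max()
    scores = np.zeros(len(means))
    for i in range(len(means)):
        # hypothetical next rewards for arm i
        y = rng.normal(means[i], np.sqrt(variances[i]), size=n_sim)
        # posterior mean of arm i after one more (hypothetical) pull
        new_mean = (counts[i] * means[i] + y) / (counts[i] + 1)
        # best mean estimate after the hypothetical update
        updated_best = np.maximum(new_mean, np.delete(means, i).max())
        scores[i] = (updated_best - best_now).mean()
    return int(scores.argmax())
```

The sketch assumes at least two arms and known observation variances; the thesis's iKG replaces the score with the one-step improvement in the probability of identifying the best arm.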
The Top Two framework adapts Thompson Sampling (TS) to the BAI problem and can turn non-optimal algorithms into asymptotically optimal ones. In each round, the framework identifies two candidate arms, a leader and a challenger, and randomly pulls one of them; this encourages exploration of sub-optimal arms that would be ignored if rewards alone were maximized. Algorithms such as Top-Two Thompson Sampling (TTTS), Top-Two Expected Improvement (TTEI), and Top-Two Probability Sampling (TTPS) demonstrate the effectiveness of this framework. In many real-world settings, the reward from pulling an arm is a random vector rather than a scalar, and the agent seeks the arm with the highest mean reward among those satisfying given constraints; this is the best feasible arm identification (BFAI) problem. We apply the Top Two framework to this under-explored problem and propose three algorithms. The first two, BFAI-TS and BFAI-EI, adapt the ideas of TS and Expected Improvement (EI) to account for feasibility under the constraints. The third, BFAI-FS, selects arms based on a decomposition of the probability of false selection of the best feasible arm. We prove the asymptotic optimality of these algorithms in terms of both sample allocation and rates of posterior convergence, and our analysis shows that the three algorithms eventually produce the same sample allocation. We also derive the optimal sampling proportion for the best arm, resolving a question for Top Two algorithms. Numerical experiments further demonstrate the superior performance of the three algorithms.
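The leader/challenger mechanism described above can be sketched in a few lines. This is a TTTS-style illustration for Gaussian posteriors with a hypothetical mixing parameter `beta`; it shows the generic Top Two selection step, not the BFAI variants, which additionally account for the feasibility constraints.

```python
import numpy as np

rng = np.random.default_rng(1)

def top_two_ts_round(post_means, post_sds, beta=0.5):
    """One arm-selection round of a Top Two sampler (TTTS-style sketch).

    Draw a sample from each arm's posterior; the argmax is the leader.
    Redraw until a different arm wins; that arm is the challenger.
    Pull the leader with probability beta, else the challenger.
    """
    theta = rng.normal(post_means, post_sds)
    leader = int(theta.argmax())
    challenger = leader
    while challenger == leader:  # assumes strictly positive post_sds
        theta = rng.normal(post_means, post_sds)
        challenger = int(theta.argmax())
    return leader if rng.random() < beta else challenger
```

Forcing a distinct challenger is what gives sub-optimal arms sampling pressure: even an arm that rarely wins the posterior draw is occasionally pulled, which is essential for controlling the probability of false selection.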
In practical applications, the agent's budget is not always fixed; instead, algorithms are needed that recommend the correct arm with as few samples as possible while meeting a prescribed accuracy requirement, striking an optimal balance between efficiency and cost. It is therefore important to study the complexity of the BFAI problem under the fixed-confidence setting. In this thesis, we characterize the complexity of BFAI for any stopping rule and establish a tight lower bound on the sample complexity when observations follow an exponential-family distribution. We also derive upper bounds on the sample complexity of the BFAI-TS, BFAI-EI, and BFAI-FS algorithms and show that they are asymptotically optimal under the Chernoff stopping rule.
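A fixed-confidence stopping check of the kind referred to above can be sketched for plain Gaussian BAI as follows. This is an illustrative generalized-likelihood-ratio (Chernoff-style) test with a simple stand-in threshold, not the calibrated constants or the BFAI statistic analyzed in the thesis.

```python
import numpy as np

def glr_stop(counts, means, delta, sigma=1.0):
    """Chernoff/GLR-style stopping check for Gaussian BAI (sketch).

    Stop (return True) when even the closest rival of the empirical
    best arm can be rejected at confidence level 1 - delta.  The
    threshold log((1 + log t) / delta) is a simplified stand-in for
    the calibrated thresholds used in the fixed-confidence literature.
    """
    b = int(means.argmax())          # empirical best arm
    t = counts.sum()                 # total samples so far
    z = np.inf
    for a in range(len(means)):
        if a == b:
            continue
        # GLR statistic between arm b and arm a (known variance sigma^2)
        n = counts[b] * counts[a] / (counts[b] + counts[a])
        z = min(z, n * (means[b] - means[a]) ** 2 / (2 * sigma**2))
    return z > np.log((1 + np.log(t)) / delta)
```

With a large gap and many samples the test fires quickly; with few samples or a small gap it keeps sampling, which is the efficiency/accuracy trade-off the fixed-confidence setting formalizes.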
| Date of Award | 5 Aug 2024 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Siyang GAO (Supervisor) |