Multi-objective Meta-return Reinforcement Learning for Sequential Recommendation
Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45) › 32_Refereed conference paper (with ISBN/ISSN) › peer-review
Author(s)
Yu, Yemin; Kuang, Kun; Yang, Jiangchao et al.
Detail(s)
| Original language | English |
| --- | --- |
| Title of host publication | Artificial Intelligence |
| Subtitle of host publication | Second CAAI International Conference, CICAI 2022, Beijing, China, August 27–28, 2022, Revised Selected Papers, Part II |
| Editors | Lu Fang, Daniel Povey, Guangtao Zhai, Ruiping Wang |
| Place of Publication | Cham |
| Publisher | Springer |
| Pages | 95–111 |
| Volume | Part II |
| ISBN (Electronic) | 978-3-031-20500-2 |
| ISBN (Print) | 978-3-031-20499-9 |
| Publication status | Published - 2 Jan 2023 |
Publication series
| Name | Lecture Notes in Computer Science |
| --- | --- |
| Volume | 13605 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Title | 2nd CAAI International Conference on Artificial Intelligence (CICAI 2022) |
| --- | --- |
| Place | China |
| City | Beijing |
| Period | 27–28 August 2022 |
Abstract
With the growing demand for information filtering over big data, reinforcement learning (RL), which accounts for the long-term effects of sequential interactions, is attracting much attention in the sequential recommendation realm. Many RL models have shown promising results on sequential recommendation; however, these methods have two major issues. First, they almost always compute returns with the conventional exponentially decaying (discounted) summation of rewards. Second, most of them are designed to optimize a single objective based on the current reward, or use simple scalar addition to combine heterogeneous rewards (e.g., Click-Through Rate [CTR] or Browsing Depth [BD]). In real-world recommender systems, we often need to maximize multiple objectives simultaneously (e.g., both CTR and BD), where some objectives are dominated by long-term effects (e.g., BD) and others by immediate effects (e.g., CTR), leading to trade-offs during optimization. To address these challenges, we propose a Multi-Objective Meta-return Reinforcement Learning (M2OR-RL) framework for sequential recommendation, which consists of a meta-return network and a multi-objective gating network. Specifically, the meta-return network is designed to adaptively capture the return of each action under an objective, while the multi-objective gating network coordinates trade-offs among multiple objectives. Extensive experiments conducted on an online e-commerce recommendation dataset and two benchmark datasets demonstrate the superior performance of our approach. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
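The abstract describes the two components only at a high level; the conventional return it contrasts against is the standard discounted sum G_t = Σ_{k≥0} γ^k r_{t+k}. The snippet below is a minimal, hypothetical PyTorch sketch of one plausible reading of that description: a per-objective meta-return network maps a state encoding and the immediate reward to an adaptive return estimate (in place of the fixed discounted sum), and a gating network outputs softmax weights that trade the per-objective returns off against each other. All module names, shapes, and the gated combination scheme are illustrative assumptions, not the authors' published implementation.

```python
# Illustrative sketch only: module names, shapes, and the way the gate
# combines objectives are assumptions, not the architecture from the paper.
import torch
import torch.nn as nn


class MetaReturnNet(nn.Module):
    """Maps a state encoding and the immediate reward for ONE objective
    to an adaptive return estimate, replacing the fixed discounted sum."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim + 1, hidden),  # +1 for the scalar reward
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, reward: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim); reward: (batch, 1) -> return: (batch, 1)
        return self.mlp(torch.cat([state, reward], dim=-1))


class MultiObjectiveGate(nn.Module):
    """Produces softmax weights over objectives (e.g., CTR and BD) so the
    per-objective returns can be traded off in a state-dependent way."""

    def __init__(self, state_dim: int, num_objectives: int):
        super().__init__()
        self.gate = nn.Linear(state_dim, num_objectives)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # (batch, num_objectives), rows sum to 1
        return torch.softmax(self.gate(state), dim=-1)


if __name__ == "__main__":
    state_dim, num_obj, batch = 32, 2, 4  # two objectives: CTR and BD
    meta_nets = nn.ModuleList([MetaReturnNet(state_dim) for _ in range(num_obj)])
    gate = MultiObjectiveGate(state_dim, num_obj)

    state = torch.randn(batch, state_dim)
    rewards = torch.randn(batch, num_obj)  # one immediate reward per objective

    # Per-objective adaptive returns, then a gated scalar combination.
    returns = torch.cat(
        [meta_nets[k](state, rewards[:, k : k + 1]) for k in range(num_obj)],
        dim=-1,
    )  # (batch, num_obj)
    combined = (gate(state) * returns).sum(dim=-1)  # (batch,)
    print(combined.shape)  # torch.Size([4])
```

The gated sum stands in for the "simple scalar addition" the abstract criticizes; the point of the sketch is only that both the per-objective return and the objective weights are learned, state-dependent quantities rather than fixed constants.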
Citation Format(s)
Multi-objective Meta-return Reinforcement Learning for Sequential Recommendation. / Yu, Yemin; Kuang, Kun; Yang, Jiangchao et al.
Artificial Intelligence: Second CAAI International Conference, CICAI 2022, Beijing, China, August 27–28, 2022, Revised Selected Papers, Part II. ed. / Lu Fang; Daniel Povey; Guangtao Zhai; Ruiping Wang. Vol. Part II. Cham: Springer, 2023. p. 95–111 (Lecture Notes in Computer Science; Vol. 13605).