Multi-objective Meta-return Reinforcement Learning for Sequential Recommendation

Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45); 32: Refereed conference paper (with ISBN/ISSN); peer-reviewed

Author(s)

  • Yemin Yu
  • Kun Kuang
  • Jiangchao Yang
  • Zeke Wang
  • Kunyang Jia
  • Weiming Lu
  • Hongxia Yang
  • Fei Wu

Detail(s)

Original language: English
Title of host publication: Artificial Intelligence
Subtitle of host publication: Second CAAI International Conference, CICAI 2022, Beijing, China, August 27–28, 2022, Revised Selected Papers, Part II
Editors: Lu Fang, Daniel Povey, Guangtao Zhai, Ruiping Wang
Place of Publication: Cham
Publisher: Springer
Pages: 95-111
Volume: Part II
ISBN (Electronic): 978-3-031-20500-2
ISBN (Print): 978-3-031-20499-9
Publication status: Published - 2 Jan 2023

Publication series

Name: Lecture Notes in Computer Science
Volume: 13605
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Title: 2nd CAAI International Conference on Artificial Intelligence (CICAI 2022)
Place: China
City: Beijing
Period: 27 - 28 August 2022

Abstract

With the demand for information filtering over big data, reinforcement learning (RL), which accounts for the long-term effects of sequential interactions, is attracting much attention in sequential recommendation. Many RL models have shown promising results on sequential recommendation; however, these methods share two major issues. First, they invariably apply the conventional exponentially decaying summation for return calculation. Second, most are designed to optimize a single objective on the current reward, or use simple scalar addition to combine heterogeneous rewards (e.g., Click-Through Rate [CTR] or Browsing Depth [BD]). In real-world recommender systems, we often need to maximize multiple objectives simultaneously (e.g., both CTR and BD), where some objectives are driven by long-term effects (i.e., BD) while others reflect immediate effects (i.e., CTR), leading to trade-offs during optimization. To address these challenges, we propose a Multi-Objective Meta-return Reinforcement Learning (M2OR-RL) framework for sequential recommendation, which consists of a meta-return network and a multi-objective gating network. Specifically, the meta-return network adaptively captures the return of each action under an objective, while the multi-objective gating network coordinates trade-offs among multiple objectives. Extensive experiments on an online e-commerce recommendation dataset and two benchmark datasets demonstrate the superior performance of our approach. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
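The abstract contrasts the conventional exponentially decaying return with per-objective returns combined by a gating mechanism. A minimal sketch of that idea is shown below; note that in M2OR-RL both the meta-return and the gating weights are produced by learned networks, so this fixed-discount, softmax-gated version is only an illustrative stand-in, and all function names here are hypothetical rather than taken from the paper.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Conventional exponentially decaying return: G_t = sum_k gamma^k * r_{t+k}."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return list(reversed(out))

def softmax(logits):
    """Numerically stable softmax, standing in for the gating network's output."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_multi_objective_return(objective_rewards, gammas, gate_logits):
    """Combine per-objective return sequences with softmax gating weights.

    objective_rewards: one reward sequence per objective (e.g., CTR, BD).
    gammas: one discount per objective; a near-zero gamma emphasizes the
            immediate effect (CTR-like), a large gamma the long-term effect (BD-like).
    """
    returns = [discounted_returns(r, g) for r, g in zip(objective_rewards, gammas)]
    weights = softmax(gate_logits)
    horizon = len(objective_rewards[0])
    return [sum(w * ret[t] for w, ret in zip(weights, returns))
            for t in range(horizon)]

# Example: a CTR-like objective (immediate) and a BD-like objective (long-term).
ctr_rewards = [1.0, 0.0, 0.0]
bd_rewards = [0.0, 0.0, 1.0]
combined = gated_multi_objective_return(
    [ctr_rewards, bd_rewards], gammas=[0.0, 0.9], gate_logits=[0.0, 0.0])
```

In the paper's framework, the fixed discounts here would be replaced by the meta-return network's adaptively learned per-action returns, and the equal gating logits by state-dependent outputs of the multi-objective gating network.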

Citation Format(s)

Multi-objective Meta-return Reinforcement Learning for Sequential Recommendation. / Yu, Yemin; Kuang, Kun; Yang, Jiangchao et al.

Artificial Intelligence: Second CAAI International Conference, CICAI 2022, Beijing, China, August 27–28, 2022, Revised Selected Papers, Part II. ed. / Lu Fang; Daniel Povey; Guangtao Zhai; Ruiping Wang. Vol. Part II. Cham: Springer, 2023. p. 95-111 (Lecture Notes in Computer Science; Vol. 13605).
