Empowering Large Language Model for Sequential Recommendation via Multimodal Embeddings and Semantic IDs

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

Abstract

Sequential recommendation (SR) aims to capture users' dynamic interests and sequential patterns based on their historical interactions. Recently, the powerful capabilities of large language models (LLMs) have driven their adoption in SR. However, we identify two critical challenges in existing LLM-based SR methods: 1) embedding collapse when incorporating pre-trained collaborative embeddings and 2) catastrophic forgetting of quantized embeddings when utilizing semantic IDs. These issues limit model scalability and lead to suboptimal recommendation performance. Therefore, based on LLMs like Llama3-8B-instruct, we introduce a novel SR framework named MME-SID, which integrates multimodal embeddings and quantized embeddings to mitigate embedding collapse. Additionally, we propose a Multimodal Residual Quantized Variational Autoencoder (MM-RQ-VAE) with maximum mean discrepancy as the reconstruction loss and contrastive learning for alignment, which effectively preserve intra-modal distance information and capture inter-modal correlations, respectively. To further alleviate catastrophic forgetting, we initialize the model with the trained multimodal code embeddings. Finally, we fine-tune the LLM efficiently using LoRA in a multimodal frequency-aware fusion manner. Extensive experiments on three public datasets validate the superior performance of MME-SID, which stems from its ability to mitigate embedding collapse and catastrophic forgetting. The implementation code and datasets are publicly available for reproduction: https://github.com/Applied-Machine-Learning-Lab/MME-SID. © 2025 Copyright held by the owner/author(s).
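
The abstract names two building blocks that a small example may make more concrete: residual quantization, which turns an item embedding into a short tuple of code indices (a semantic ID), and maximum mean discrepancy (MMD), which compares two sets of vectors as distributions. The sketch below is illustrative only and is not the authors' MM-RQ-VAE: the function names `residual_quantize` and `mmd2` are hypothetical, the codebooks are random rather than learned, the RBF kernel and its `gamma` are assumptions, and nothing here is trained. For the actual implementation, see the repository linked in the abstract.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(vec, codebooks):
    """Quantize one embedding level by level (hypothetical helper).

    At each level, pick the codeword nearest to the current residual,
    record its index, and subtract it. The list of indices plays the
    role of a semantic ID; the sum of chosen codewords is the
    quantized reconstruction.
    """
    residual = list(vec)
    recon = [0.0] * len(vec)
    codes = []
    for codebook in codebooks:
        idx = min(range(len(codebook)),
                  key=lambda k: sq_dist(residual, codebook[k]))
        codes.append(idx)
        recon = [r + c for r, c in zip(recon, codebook[idx])]
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, recon


def rbf(a, b, gamma=1.0):
    """RBF kernel; the kernel choice is an assumption for this sketch."""
    return math.exp(-gamma * sq_dist(a, b))


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two samples of vectors."""
    def mean_k(us, vs):
        total = sum(rbf(u, v, gamma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, levels, codebook_size = 4, 3, 8
    # Random codebooks with shrinking scale per level (not learned).
    codebooks = [
        [[rng.gauss(0, 1.0 / (2 ** lvl)) for _ in range(dim)]
         for _ in range(codebook_size)]
        for lvl in range(levels)
    ]
    items = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
    results = [residual_quantize(v, codebooks) for v in items]
    print("semantic ID of item 0:", results[0][0])
    print("MMD^2(original, quantized):",
          mmd2(items, [recon for _, recon in results]))
```

In a trained model, a quantity like `mmd2` between original and reconstructed embeddings would be minimized rather than merely reported, and the codebooks would be learned jointly with the encoder; both steps are omitted here.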
Original language: English
Title of host publication: CIKM '25
Subtitle of host publication: Proceedings of the 34th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 3209-3219
Number of pages: 11
ISBN (Print): 979-8-4007-2040-6
Publication status: Published - Nov 2025
Event: 34th ACM International Conference on Information and Knowledge Management (CIKM 2025) - COEX, Seoul, Korea, Republic of
Duration: 10 Nov 2025 - 14 Nov 2025
https://cikm2025.org/

Publication series

Name: CIKM - Proceedings of the ACM International Conference on Information and Knowledge Management

Conference

Conference: 34th ACM International Conference on Information and Knowledge Management (CIKM 2025)
Abbreviated title: CIKM '25
Place: Korea, Republic of
City: Seoul
Period: 10/11/25 - 14/11/25

Funding

This research was partially supported by the Hong Kong Research Grants Council's Research Impact Fund (No. R1015-23), Collaborative Research Fund (No. C1043-24GF), General Research Fund (No. 11218325), the Institute of Digital Medicine of City University of Hong Kong (No. 9229503), Tencent (CCF-Tencent Open Fund, Tencent Rhino-Bird Focused Research Program), and the National Natural Science Foundation of China (No. 62502404).

Research Keywords

  • large language model
  • multimodal recommendation
  • recommender system
  • semantic ids
  • sequential recommendation

RGC Funding Information

  • RGC-funded
