A disentangled multimodal neural topic model

Yingqiu Xiong, Yezheng Liu, Yang Qian*, Yuanchun Jiang, Yidong Chai, Haifeng Ling

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

This study focuses on multimodal topic modeling and aims to separate public topics (shared across modalities) from private topics (unique to each modality) hidden in text and image data. To this end, we propose a novel Disentangled Multimodal Neural Topic Model (DMNTM). Specifically, we design a modality-specific encoder with an independence constraint to capture private topics, and a public encoder with a product-of-experts module to extract cross-modal shared topics. We conduct extensive experiments on six public datasets, including multimodal online reviews from Amazon, posts from Flickr, tweets from Twitter, and webpages from Wikipedia. Compared with state-of-the-art methods, DMNTM significantly improves topic modeling performance over the best baseline in terms of perplexity, coherence, diversity, and topic quality. In two downstream tasks, recommendation and sentiment classification, DMNTM further improves performance. These results show that disentangling public and private topics effectively enhances both the quality and utility of multimodal representations. © 2026 Elsevier Ltd.
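
The abstract names a product-of-experts module for the public encoder but does not say what form the experts take. As a rough illustration only, the sketch below assumes each modality contributes a diagonal Gaussian over the shared topic latent, a common choice in multimodal variational models, and fuses them by precision weighting. The function name, the optional standard-normal prior expert, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def product_of_gaussian_experts(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    include_standard_normal_prior: bool = True,
) -> Tuple[List[float], List[float]]:
    """Fuse diagonal Gaussian experts into a single diagonal Gaussian.

    Assumption (not stated in the abstract): each modality encoder outputs a
    mean and a variance per latent dimension. The product of Gaussians is again
    Gaussian, with precision equal to the sum of the experts' precisions and
    mean equal to the precision-weighted average of the experts' means.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one variance vector per mean vector")
    dim = len(means[0])

    # Optionally start from a prior expert N(0, 1): precision 1, mean 0.
    start = 1.0 if include_standard_normal_prior else 0.0
    precision = [start] * dim
    weighted_mean = [0.0] * dim

    for mu, var in zip(means, variances):
        if len(mu) != dim or len(var) != dim:
            raise ValueError("all experts must share the same dimension")
        for k in range(dim):
            if var[k] <= 0.0:
                raise ValueError("variances must be positive")
            p = 1.0 / var[k]
            precision[k] += p
            weighted_mean[k] += p * mu[k]

    fused_mean = [w / p for w, p in zip(weighted_mean, precision)]
    fused_var = [1.0 / p for p in precision]
    return fused_mean, fused_var


# Toy example: a "text" expert and an "image" expert over two latent dimensions.
text_mean, text_var = [1.0, 0.0], [1.0, 0.5]
image_mean, image_var = [3.0, 0.0], [1.0, 0.5]
fused_mean, fused_var = product_of_gaussian_experts(
    [text_mean, image_mean],
    [text_var, image_var],
    include_standard_normal_prior=False,
)
print(fused_mean, fused_var)
```

Under these assumptions, the fused distribution is sharper than either expert alone, and an expert with low variance on a dimension dominates the fused mean on that dimension. This is one intuition for why a product of experts suits topics that both modalities agree on.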
Original language: English
Article number: 104683
Number of pages: 16
Journal: Information Processing & Management
Volume: 63
Issue number: 5
Online published: 17 Feb 2026
Publication status: Online published - 17 Feb 2026
Externally published: Yes

Funding

This work is supported by the National Natural Science Foundation of China (72342011, 72101072, 72271084, 72171071, 72571096, 72322019), the Fundamental Research Funds for the Central Universities (JZ2024HGTG0316, PA2023IISL0103), and the National Engineering Laboratory for Big Data Distribution and Exchange Technologies.

Research Keywords

  • Multimodal analysis
  • Neural topic modeling
  • Disentanglement learning
  • Public and private topics
  • Deep learning
