Multimodal Named Entity Recognition based on topic prompt and multi-curriculum denoising

Mingying Xu, Kui Peng, Jie Liu*, Qing Zhang, Linqi Song, Yinqiao Li

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

1 Citation (Scopus)

Abstract

The rapid development of Generative Large Models (GLMs) such as ChatGPT and GPT-4 has significantly enhanced their ability to handle complex tasks and to drive innovation across multiple fields, especially social media. However, GLMs are prone to generating “hallucinated” content when dealing with ambiguous problems that lack clear evidence, which undermines their reliability. Multimodal Named Entity Recognition (MNER) addresses this issue by integrating image, text and contextual information to establish a fact-based framework, reducing the risk of hallucination and strengthening the reasoning foundation of GLMs. Combining GLMs with MNER merges the flexibility of content generation with evidence-based constraints, improving both reliability and interpretability. In the MNER task, however, weakly related or irrelevant image information introduces noise that degrades performance. In this paper, we propose a novel framework, TPMCLNet, which combines a topic prompt with a multi-curriculum denoising strategy. First, the topic prompt module extracts topic information from the images and integrates it with the text as auxiliary input, enhancing the model's understanding of multimodal data. This is particularly useful when the correlation between image and text is weak, as the topic information provides additional semantic cues that help the model identify named entities more accurately. Second, we employ a denoising strategy based on multi-curriculum learning, which defines noise metrics at different granularities and uses them to progressively optimize the order in which training data is presented, reducing the impact of noise on the model. Within this framework, we conduct a comprehensive noise assessment of both images and text, gradually introducing cleaner data to improve model training. Experimental results show that, by combining the topic prompt with the multi-curriculum denoising strategy, TPMCLNet significantly improves MNER performance in complex multimodal environments, demonstrating its effectiveness. © 2025
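The curriculum idea described in the abstract, ordering training data by a noise metric and presenting it in stages, can be illustrated with a minimal sketch. This is not the TPMCLNet implementation: the `Sample` fields, the per-modality noise scores, the equal weighting, and the clean-first staging are all assumptions made for illustration, and the abstract does not specify how the noise metrics or the pacing are actually defined.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    image_noise: float  # hypothetical score in [0, 1]; higher means noisier
    text_noise: float   # hypothetical score in [0, 1]; higher means noisier


def combined_noise(sample, w_image=0.5, w_text=0.5):
    # Assumed combination rule: a weighted sum of the two modality scores.
    return w_image * sample.image_noise + w_text * sample.text_noise


def curriculum_stages(samples, num_stages=3):
    # Sort from cleanest to noisiest, then build cumulative stages so that
    # each later stage includes progressively noisier samples.
    ordered = sorted(samples, key=combined_noise)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = math.ceil(len(ordered) * k / num_stages)
        stages.append(ordered[:cutoff])
    return stages


samples = [
    Sample("a", 0.9, 0.7),
    Sample("b", 0.1, 0.1),
    Sample("c", 0.5, 0.5),
    Sample("d", 0.2, 0.4),
]
stages = curriculum_stages(samples, num_stages=2)
for i, stage in enumerate(stages):
    print(i, [s.text for s in stage])
```

A training loop under these assumptions would iterate over `stages` in order and train on each stage for a fixed number of epochs. A multi-curriculum variant could keep one ordering per noise metric rather than collapsing them into a single score.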
Original language: English
Article number: 103405
Journal: Information Fusion
Volume: 124
Online published: 21 Jun 2025
DOIs
Publication status: Published - Dec 2025

Funding

This work is supported by the Joint Fund Key Program of the National Natural Science Foundation of China (U23B2029), the National Natural Science Foundation of China (62076167), the Yuxiu Innovation Project of NCUT (2024NCUTYXCX102), the National Natural Science Foundation of China (62372277), and the Natural Science Foundation of Shandong Province, China (ZR2022MF257).

Research Keywords

  • Generative Large Models
  • Multi-curriculum denoising
  • Multimodal fusion
  • Multimodal Named Entity Recognition
  • Topic prompt
