Abstract
The rapid development of Generative Large Models (GLMs) such as ChatGPT and GPT4 has significantly enhanced their ability to handle complex tasks and drive innovation across multiple fields, especially social media. However, GLMs are prone to generating “hallucinated” content when dealing with ambiguous problems that lack clear evidence, which undermines their reliability. Multimodal Named Entity Recognition (MNER) addresses this issue by integrating image, text, and contextual information to establish a fact-based framework, thereby reducing the risk of hallucination and strengthening the reasoning foundation of GLMs. Combining GLMs with MNER merges the flexibility of content generation with evidence-based constraints, improving reliability and interpretability. In the MNER task, however, weakly related or irrelevant image information introduces noise that degrades performance. In this paper, we propose a novel framework, TPMCLNet, which combines a topic prompt with a multi-curriculum denoising strategy. First, the topic prompt module extracts topic information from the images and integrates it with the text as auxiliary input, enhancing the model's understanding of multimodal data. This is particularly useful when the correlation between image and text is weak, because the topic information provides additional semantic cues that help the model identify named entities more accurately. Second, we employ a denoising strategy based on multi-curriculum learning, which defines noise metrics at different granularities and uses them to progressively optimize the presentation order of the training data, reducing the impact of noise on the model. Within this framework, we conduct a comprehensive noise assessment of both images and text, gradually introducing cleaner data to improve model training.
Experimental results show that, by combining the topic prompt with the multi-curriculum denoising strategy, TPMCLNet significantly improves MNER performance in complex multimodal environments, demonstrating its effectiveness. © 2025
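As an illustration of the idea of ordering training data by noise, the following is a minimal sketch and not the TPMCLNet implementation. It assumes each sample already carries two hypothetical noise scores (`image_noise`, `text_noise`), combines them with an assumed weighted average, and exposes a growing, cleanest-first subset at each stage. The abstract does not specify the noise metrics, the combination rule, or the schedule; all names and values here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    image_noise: float  # hypothetical score in [0, 1]; higher = image less relevant
    text_noise: float   # hypothetical score in [0, 1]; higher = noisier text


def combined_noise(sample, image_weight=0.5):
    """Weighted average of the two noise scores (an assumed combination rule)."""
    return image_weight * sample.image_noise + (1.0 - image_weight) * sample.text_noise


def curriculum_stages(samples, num_stages=3, image_weight=0.5):
    """Return one training subset per stage, growing from cleanest to full data."""
    ordered = sorted(samples, key=lambda s: combined_noise(s, image_weight))
    stages = []
    for stage in range(1, num_stages + 1):
        # Expose a linearly growing prefix of the noise-sorted data.
        cutoff = max(1, round(len(ordered) * stage / num_stages))
        stages.append(ordered[:cutoff])
    return stages


samples = [
    Sample("a", image_noise=0.9, text_noise=0.2),
    Sample("b", image_noise=0.1, text_noise=0.1),
    Sample("c", image_noise=0.5, text_noise=0.7),
    Sample("d", image_noise=0.3, text_noise=0.2),
]
stages = curriculum_stages(samples, num_stages=2)
stage_ids = [[s.sample_id for s in stage] for stage in stages]
```

With two stages, the first stage contains the two samples with the lowest combined noise and the second contains all four. A real system would replace the hand-set scores with learned or computed metrics at several granularities, as the abstract describes.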
| Original language | English |
|---|---|
| Article number | 103405 |
| Journal | Information Fusion |
| Volume | 124 |
| Online published | 21 Jun 2025 |
| DOIs | |
| Publication status | Published - Dec 2025 |
Funding
This work is supported by the Joint Fund Key Program of the National Natural Science Foundation of China (U23B2029), the National Natural Science Foundation of China (62076167), the Yuxiu Innovation Project of NCUT (2024NCUTYXCX102), the National Natural Science Foundation of China (62372277), and the Natural Science Foundation of Shandong Province, China (ZR2022MF257).
Research Keywords
- Generative Large Models
- Multi-curriculum denoising
- Multimodal fusion
- Multimodal Named Entity Recognition
- Topic prompt