TY - JOUR
T1 - Dual-perspective hypergraph learning network for multimodal entity and relation extraction
AU - Liu, Jie
AU - Zhong, Hong
AU - Xu, Mingying
AU - Wu, Baowen
AU - Song, Linqi
AU - Li, Yinqiao
AU - Shi, Lei
AU - Kou, Feifei
PY - 2026/3/5
Y1 - 2026/3/5
N2 - Multimodal Named Entity Recognition (MNER) and Relation Extraction (MRE) identify entities and their semantic relationships within paired image-text data. Graph-based approaches have recently gained significant attention: by constructing cross-modal graphs to achieve fine-grained alignment and interaction, they demonstrate promising performance on MNER and MRE tasks. However, these approaches rely primarily on pairwise interactions, which limits their ability to model complex global dependencies and leads to semantic alignment bias. To address this, we propose a Dual-perspective Hypergraph Learning Network for Multimodal Entity and Relation Extraction (DHGLN) that captures complex high-order correlations among multiple nodes from a semantic perspective and a contextual-structure perspective. DHGLN adopts an attention mechanism and spectral graph convolution to learn semantic-level and contextual-structure-level hyperedge features that optimize node representations, achieving competitive and robust performance on both MNER and MRE tasks. Experimental results demonstrate significant improvements, with an F1-score gain of +6.67% over the state-of-the-art baseline on the Twitter-2015 dataset for the MNER task. DHGLN also performs strongly on the Twitter-2017 dataset for the MNER task and the MNRE dataset for the MRE task, highlighting the effectiveness and robustness of our approach. © 2025 Published by Elsevier Ltd.
AB - Multimodal Named Entity Recognition (MNER) and Relation Extraction (MRE) identify entities and their semantic relationships within paired image-text data. Graph-based approaches have recently gained significant attention: by constructing cross-modal graphs to achieve fine-grained alignment and interaction, they demonstrate promising performance on MNER and MRE tasks. However, these approaches rely primarily on pairwise interactions, which limits their ability to model complex global dependencies and leads to semantic alignment bias. To address this, we propose a Dual-perspective Hypergraph Learning Network for Multimodal Entity and Relation Extraction (DHGLN) that captures complex high-order correlations among multiple nodes from a semantic perspective and a contextual-structure perspective. DHGLN adopts an attention mechanism and spectral graph convolution to learn semantic-level and contextual-structure-level hyperedge features that optimize node representations, achieving competitive and robust performance on both MNER and MRE tasks. Experimental results demonstrate significant improvements, with an F1-score gain of +6.67% over the state-of-the-art baseline on the Twitter-2015 dataset for the MNER task. DHGLN also performs strongly on the Twitter-2017 dataset for the MNER task and the MNRE dataset for the MRE task, highlighting the effectiveness and robustness of our approach. © 2025 Published by Elsevier Ltd.
KW - Multimodal named entity recognition
KW - Multimodal relation extraction
KW - Hypergraph neural network
KW - Hypergraph learning
KW - Multimodal alignment
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:001628268900009
U2 - 10.1016/j.eswa.2025.130290
DO - 10.1016/j.eswa.2025.130290
M3 - RGC 21 - Publication in refereed journal
SN - 0957-4174
VL - 300
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 130290
ER -