TY - JOUR
T1 - HomLLM
T2 - Exploiting Semantic Homology Relationship for Fine-Grained Bird Image Classification via Large Language Models
AU - Liu, Hai
AU - Song, Yu
AU - Liu, Tingting
AU - Zheng, Hao
AU - Chen, Lin
AU - Zhang, Zhaoli
AU - Li, You-Fu
PY - 2025/10/28
Y1 - 2025/10/28
N2 - How to recognize endangered bird species in complex outdoor environments has attracted considerable attention in the fields of computer vision and machine learning. However, fine-grained bird image classification (FBIC) is susceptible to problems such as arbitrary postures, interclass discriminability, and occlusions. We propose a novel semantic homology relationship representation learning method for fine-grained bird classification with large language models, namely HomLLM, to address these challenges in FBIC effectively. Our proposed model aims to learn homology relationship representations adaptively by identifying invariant structural correspondences between visual features and semantic descriptions, using limited bird data and base class labels. Our approach yields two key findings: 1) invariant homology in key regions of birds that maintain structural consistency across different postures and 2) homological relationships that establish essential taxonomic markers among similar bird classes. Based on these insights, we propose two new modules of the model: the semantic homology generation (SHG) module and the homology relationship mining (HRM) module. Specifically, in SHG, bird features are described at multiple granularities through a large language model (LLM) to establish semantic homology. In HRM, feature adaptation is performed separately for textual and visual information, and cross-modal homological interaction is performed hierarchically. In addition, we propose a hierarchical homology interaction scheme to integrate multilevel homological features while preserving structural consistency. Experiments on the commonly used bird datasets CUB-200-2011 and NABirds demonstrate that HomLLM exhibits better performance than state-of-the-art (SOTA) methods.
© 2025 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission.
KW - Classification
KW - homology relationship
KW - image understanding
KW - large language models (LLMs)
UR - https://www.scopus.com/pages/publications/105020413439
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-105020413439&origin=recordpage
U2 - 10.1109/TNNLS.2025.3617339
DO - 10.1109/TNNLS.2025.3617339
M3 - RGC 21 - Publication in refereed journal
SN - 2162-237X
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -