
HomLLM: Exploiting Semantic Homology Relationship for Fine-Grained Bird Image Classification via Large Language Models

  • Hai Liu
  • Yu Song*
  • Tingting Liu*
  • Hao Zheng
  • Lin Chen
  • Zhaoli Zhang*
  • You-Fu Li
  • *Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

How to recognize endangered bird species in complex outdoor environments has attracted considerable attention in the fields of computer vision and machine learning. However, fine-grained bird image classification (FBIC) is susceptible to problems such as arbitrary postures, low interclass discriminability, and occlusions. We propose a novel semantic homology relationship representation learning method for fine-grained bird classification with large language models, namely HomLLM, to address these challenges in FBIC effectively. Our proposed model aims to learn homology relationship representations adaptively by identifying invariant structural correspondences between visual features and semantic descriptions, using limited bird data and base class labels. Our approach yields two key findings: 1) invariant homology in key regions of birds that maintain structural consistency across different postures and 2) homological relationships that establish essential taxonomic markers among similar bird classes. Based on these insights, we propose two new modules of the model: the semantic homology generation (SHG) module and the homology relationship mining (HRM) module. Specifically, in SHG, bird features are described at multiple granularities through a large language model (LLM) to establish semantic homology. In HRM, feature adaptation is performed separately for textual and visual information, and cross-modal homological interaction is performed hierarchically. In addition, we propose a hierarchical homology interaction scheme to integrate multilevel homological features while preserving structural consistency. Experiments on the commonly used bird datasets CUB-200-2011 and NABirds demonstrate that HomLLM exhibits better performance than state-of-the-art (SOTA) methods.
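
As a rough illustration of the general idea of matching an image against class descriptions at more than one granularity, the sketch below scores each class by a weighted sum of cosine similarities between per-level image features and per-level text features. This is not the authors' SHG or HRM implementation, which the abstract does not specify. The function names, the level names ("coarse", "fine"), the toy vectors, and the weighted-sum fusion are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(image_feats, text_feats, weights):
    """Pick the class whose multi-level text features best match the image.

    image_feats: {level: vector}          hypothetical visual features per level
    text_feats:  {class: {level: vector}} hypothetical description embeddings
    weights:     {level: float}           assumed fusion weight per level
    """
    scores = {}
    for cls, per_level in text_feats.items():
        scores[cls] = sum(
            weights[level] * cosine(image_feats[level], per_level[level])
            for level in weights
        )
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two similar classes share the coarse description
# and differ only at the fine-grained level.
image_feats = {"coarse": [1.0, 0.0], "fine": [0.0, 1.0]}
text_feats = {
    "sparrow_a": {"coarse": [1.0, 0.0], "fine": [1.0, 0.0]},
    "sparrow_b": {"coarse": [1.0, 0.0], "fine": [0.0, 1.0]},
}
weights = {"coarse": 0.5, "fine": 0.5}

predicted, scores = classify(image_feats, text_feats, weights)
print(predicted, scores)
```

In this toy setup the coarse level cannot separate the two classes, so the decision rests on the fine-grained level, which is the intuition behind describing birds at multiple granularities.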

© 2025 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission.
Original language: English
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
Online published: 28 Oct 2025
DOIs
Publication status: Online published - 28 Oct 2025

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62577020, Grant 62573369, Grant 62477024, Grant 62377037, Grant 62277041, Grant 62173286, Grant 62177019, and Grant 62177018; in part by Jiangxi Provincial Natural Science Foundation under Grant 20252BAC220007, Grant 20252BAC240201, Grant 20242BAB2S107, and Grant 20232BAB212026; in part by the National Natural Science Foundation of Hubei Province under Grant 2025AFD621; in part by Shenzhen Science and Technology Program under Grant JCYJ20250604185710014 and Grant JCYJ20230807152900001; in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2025A1515010266; in part by the Fundamental Research Funds for the Central Universities under Grant CCNU25ai012 and Grant CCNU25XJ003; and in part by the Science and Technology Research Program of Hubei Provincial Department of Education under Grant B2024301 and Grant 2025DHWL002.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 15 - Life on Land

Research Keywords

  • Classification
  • homology relationship
  • image understanding
  • large language models (LLMs)
