PhyTrans: Learning Phylogenetic Relationships for FBIC via Hierarchical Taxonomy Representation

  • Hai Liu (Co-first Author)
  • Xinyi Huang (Co-first Author)
  • Tingting Liu*
  • Zhibing Liu
  • Dazhen Shen
  • Zhaoli Zhang*
  • You-Fu Li*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Accurately identifying endangered bird species in complex natural environments has become an important research topic of shared concern to the computer vision and biological conservation communities. However, existing methods remain limited in systematically modeling cross-species semantic similarity and in effectively exploiting structural stability under pose variations, which makes robust discrimination among highly similar species difficult. To address these challenges, we propose PhyTrans, a phylogeny-driven fine-grained bird recognition framework that achieves unified representation learning by jointly modeling inter-species phylogenetic relationships and intra-image skeletal invariance across different poses. Specifically, a phylogenetic token construction (PTC) module leverages hierarchical taxonomic information, ranging from class to species, and embeds phylogenetic relationships into a hyperbolic space, which preserves hierarchical semantic distances while explicitly modeling the appearance similarity induced by evolutionary relatedness. Building on this, the proposed phylogenetic relationship mining (PRM) module integrates phylogenetic representations and intra-image skeletal structural cues within a unified Transformer architecture, enabling collaborative modeling of cross-species similarity and structural invariance. Extensive experiments on the CUB-200-2011 and NABirds datasets demonstrate that PhyTrans outperforms state-of-the-art approaches, validating the critical role of phylogenetic relationships in advancing ecological visual recognition. © 1991-2012 IEEE.
Original language: English
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
DOIs
Publication status: Online published - 20 Feb 2026

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62577020, Grant 62573369, Grant 62477024, Grant 62377037, and Grant 62277041; in part by Jiangxi Provincial Natural Science Foundation under Grant 20252BAC220007, Grant 20252BAC240201, Grant 20242BAB2S107, and Grant 20232BAB212026; in part by the National Natural Science Foundation of Hubei Province under Grant 2025AFD621; in part by Shenzhen Science and Technology Program under Grant JCYJ20250604185710014 and Grant JCYJ20230807152900001; and in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2025A1515010266.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 15 - Life on Land

Research Keywords

  • Computer vision
  • FBIC
  • Hierarchical taxonomy
  • Phylogenetic relationships
  • Transformer
