TY - JOUR
T1 - PhyTrans
T2 - Learning Phylogenetic Relationships for FBIC via Hierarchical Taxonomy Representation
AU - Liu, Hai
AU - Huang, Xinyi
AU - Liu, Tingting
AU - Liu, Zhibing
AU - Shen, Dazhen
AU - Zhang, Zhaoli
AU - Li, You-Fu
PY - 2026/2/20
Y1 - 2026/2/20
AB - Accurately identifying endangered bird species in complex natural environments has become an important research topic shared by the computer vision and biological conservation communities. However, existing methods remain limited in systematically modeling cross-species semantic similarity and in exploiting structural stability under pose variations, which makes robust discrimination among highly similar species difficult. To address these challenges, we propose PhyTrans, a phylogeny-driven fine-grained bird recognition framework that achieves unified representation learning by jointly modeling inter-species phylogenetic relationships and intra-image skeletal invariance across different poses. Specifically, a phylogenetic token construction (PTC) module leverages hierarchical taxonomic information, ranging from class to species, and embeds phylogenetic relationships into a hyperbolic space, which preserves hierarchical semantic distances while explicitly modeling appearance similarity induced by evolutionary relatedness. Building on this, the proposed phylogenetic relationship mining (PRM) module integrates phylogenetic representations and intra-image skeletal structural cues within a unified Transformer architecture, enabling joint modeling of cross-species similarity and structural invariance. Extensive experiments on the CUB-200-2011 and NABirds datasets demonstrate that PhyTrans outperforms state-of-the-art approaches, validating the critical role of phylogenetic relationships in advancing ecological visual recognition. © 1991-2012 IEEE.
KW - Computer vision
KW - FBIC
KW - Hierarchical taxonomy
KW - Phylogenetic relationships
KW - Transformer
UR - https://www.scopus.com/pages/publications/105030977835
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-105030977835&origin=recordpage
U2 - 10.1109/TCSVT.2026.3666530
DO - 10.1109/TCSVT.2026.3666530
M3 - RGC 21 - Publication in refereed journal
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -