TY - GEN
T1 - RHPENet
T2 - 8th International Conference on Artificial Intelligence and Big Data (ICAIBD 2025)
AU - Liu, Tingting
AU - Qian, Shijia
AU - Liu, Hai
AU - Wang, Minhong
AU - Yang, Bing
AU - Li, You-Fu
PY - 2025
Y1 - 2025
AB - Head pose estimation (HPE) techniques frequently encounter difficulties when handling extreme angles, occlusions, and uneven lighting conditions. In this paper, we present a novel heterogeneous relationship learning framework designed to mitigate these limitations by exploiting facial regions of interest (FRoI) and their complex interdependencies. Our approach stems from two fundamental discoveries: first, the critical importance of FRoI for pose determination, and second, the heterogeneous relationship between neighboring postures. The proposed architecture consists of three main modules: region feature generator (RFG), hierarchical structural representation (HSR), and cross-relation aggregator (CRA). The RFG incorporates an adaptive attention mechanism that prioritizes diagnostically significant facial zones. Within the HSR component, we implement a novel "Rugby-style" cross-level connectivity pattern to enhance feature integration. The CRA employs Transformer-based techniques to uncover both spatial and angular dependencies. Comprehensive evaluations conducted on major HPE benchmarks (300W_LP, AFLW2000, and BIWI) demonstrate that our RHPENet model consistently outperforms existing approaches. © 2025 IEEE.
KW - Facial regions of interest
KW - Head pose estimation
KW - Heterogeneous relationship
KW - Robot vision
KW - Transformer
UR - https://www.scopus.com/pages/publications/105012753598
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-105012753598&origin=recordpage
U2 - 10.1109/ICAIBD64986.2025.11081970
DO - 10.1109/ICAIBD64986.2025.11081970
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 979-8-3315-1937-7
T3 - International Conference on Artificial Intelligence and Big Data, ICAIBD
SP - 639
EP - 645
BT - 2025 8th International Conference on Artificial Intelligence and Big Data (ICAIBD 2025)
PB - IEEE
Y2 - 23 May 2025 through 26 May 2025
ER -