TY - JOUR
T1 - Texture Affinity Cue-Aware Relationship Representation via Transformers for Facial Expression Recognition in Affective Robots
AU - Liu, Hai
AU - Liu, Zhibing
AU - Li, Feifei
AU - Liu, Tingting
AU - Zhang, Zhaoli
AU - Xiong, Neal N.
AU - Li, You-Fu
PY - 2026/2/20
Y1 - 2026/2/20
N2 - Automatic facial expression recognition (FER) from facial videos is a key component in enabling machines to understand human emotional states, which is crucial for affective robots designed to serve as interactive companions and to be applied in smart healthcare. However, FER is susceptible to challenges such as occlusion, arbitrary orientations, and illumination, making it difficult to implement precise FER models in robots. To address these issues, we propose a texture-affinity-cue-aware relationship representation method (FTATrans), which learns to associate facial texture with facial expressions in videos. The research reveals two key findings: 1) the interaction of facial textures, and 2) texture affinity effects. On this basis, FTATrans consists of two key networks: semantic-information feature generation (SFG) and texture-affinity relationship mining (TAR). In particular, the semantic relationships between different facial regions are learned through SFG, while TAR captures texture-affinity relationships and integrates them with the overall facial expression information. Additionally, a loss function focused on expression-specific texture variations is proposed to guide the model in learning discriminative expression information. Experiments conducted on five video-based FER datasets demonstrate that the FTATrans model achieves state-of-the-art performance. © 2026 IEEE.
AB - Automatic facial expression recognition (FER) from facial videos is a key component in enabling machines to understand human emotional states, which is crucial for affective robots designed to serve as interactive companions and to be applied in smart healthcare. However, FER is susceptible to challenges such as occlusion, arbitrary orientations, and illumination, making it difficult to implement precise FER models in robots. To address these issues, we propose a texture-affinity-cue-aware relationship representation method (FTATrans), which learns to associate facial texture with facial expressions in videos. The research reveals two key findings: 1) the interaction of facial textures, and 2) texture affinity effects. On this basis, FTATrans consists of two key networks: semantic-information feature generation (SFG) and texture-affinity relationship mining (TAR). In particular, the semantic relationships between different facial regions are learned through SFG, while TAR captures texture-affinity relationships and integrates them with the overall facial expression information. Additionally, a loss function focused on expression-specific texture variations is proposed to guide the model in learning discriminative expression information. Experiments conducted on five video-based FER datasets demonstrate that the FTATrans model achieves state-of-the-art performance. © 2026 IEEE.
KW - Affective robot
KW - facial affinity field
KW - facial expression recognition (FER)
KW - human–robot interaction
KW - relationship-driven
UR - http://www.scopus.com/inward/record.url?scp=105030862146&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-105030862146&origin=recordpage
U2 - 10.1109/TII.2026.3655379
DO - 10.1109/TII.2026.3655379
M3 - RGC 21 - Publication in refereed journal
SN - 1551-3203
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
ER -