Multi-Scale correlation module for video-based facial expression recognition in the wild
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
| Original language | English |
| --- | --- |
| Article number | 109691 |
| Journal / Publication | Pattern Recognition |
| Volume | 142 |
| Online published | 13 May 2023 |
| Publication status | Published - Oct 2023 |
Abstract
The detection of facial muscle movements (e.g., mouth opening) is crucial for facial expression recognition (FER). However, extracting these facial motion features is challenging for a deep-learning recognition system for the following reasons: (1) without explicit motion labels for training, there is no guarantee that convolutional neural networks (CNNs) extract motion effectively; (2) compared to human action recognition (e.g., an object moving from left to right), some facial motions (e.g., raising the eyebrows) are more subtle and thus harder to extract; and (3) extracting motion features with optical flow is time-consuming when only a commonly used camera is available. In this work, we propose a Multi-Scale Correlation Module (MSCM) together with an adaptive fusion module. First, both large and small facial motions are extracted by the MSCM and encoded by CNNs. Then, the adaptive fusion module aggregates the motion features. With these modules, our recognition network models both subtle and large motion features for video-based FER with only RGB image frames as input. Experiments on two datasets, AFEW and DFEW, show that the network achieves state-of-the-art performance on both benchmarks. © 2023 Elsevier Ltd. All rights reserved.
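This record does not give the module's equations, but the core idea the abstract describes, correlating feature maps of consecutive frames at several spatial scales and fusing the resulting volumes, can be sketched in NumPy. This is an illustration only: the function names, the displacement window, the pooling scales, and the softmax fusion weights are assumptions, not the paper's implementation (which uses learned CNN encoders and a learned adaptive fusion module).

```python
import numpy as np

def avg_pool(x, s):
    """Average-pool a (C, H, W) feature map by factor s (H, W divisible by s)."""
    C, H, W = x.shape
    return x.reshape(C, H // s, s, W // s, s).mean(axis=(2, 4))

def local_correlation(f1, f2, max_disp=1):
    """Per-position correlation of f1 with a (2d+1)^2 neighbourhood of f2.

    f1, f2: (C, H, W) feature maps of two consecutive frames.
    Returns a correlation volume of shape ((2d+1)**2, H, W); each channel
    holds the channel-averaged dot product at one displacement (dy, dx).
    """
    C, H, W = f1.shape
    d = max_disp
    f2p = np.pad(f2, ((0, 0), (d, d), (d, d)))
    vols = [(f1 * f2p[:, dy:dy + H, dx:dx + W]).mean(axis=0)
            for dy in range(2 * d + 1) for dx in range(2 * d + 1)]
    return np.stack(vols)

def multi_scale_correlation(f1, f2, scales=(1, 2), max_disp=1):
    """Correlation volumes at several scales: with the same displacement
    window, coarser scales cover larger motions, finer scales subtler ones."""
    return [local_correlation(avg_pool(f1, s), avg_pool(f2, s), max_disp)
            for s in scales]

def fuse(vols, logits):
    """Softmax-weighted sum of the per-scale volumes (a fixed-weight stand-in
    for the paper's learned adaptive fusion). Coarse volumes are upsampled
    by nearest-neighbour repetition to the finest resolution."""
    w = np.exp(logits) / np.exp(logits).sum()
    H, W = vols[0].shape[1:]
    up = [np.repeat(np.repeat(v, H // v.shape[1], axis=1),
                    W // v.shape[2], axis=2) for v in vols]
    return sum(wi * vi for wi, vi in zip(w, up))
```

At zero displacement the volume reduces to the channel-averaged squared activation, so identical frames yield a peak at the centre channel; motion shifts that peak toward the corresponding displacement channel.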
Research Area(s)
- Adaptive fusion, Convolutional neural networks, Facial expression recognition, Motion estimation
Citation Format(s)
In: Pattern Recognition, Vol. 142, 109691, 10.2023.