Multi-Scale correlation module for video-based facial expression recognition in the wild

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review

1 Scopus citation




Original language: English
Article number: 109691
Journal / Publication: Pattern Recognition
Online published: 13 May 2023
Publication status: Published - Oct 2023


The detection of facial muscle movements (e.g., mouth opening) is crucial for facial expression recognition (FER). However, extracting these facial motion features is challenging for a deep-learning recognition system for three reasons: (1) without explicit motion labels for training, there is no guarantee that convolutional neural networks (CNNs) can extract motion effectively; (2) compared with the motions in human action recognition (e.g., an object moving from left to right), some facial motions (e.g., raising the eyebrows) are more subtle and thus harder to extract; and (3) computing optical flow to extract motion features is time-consuming for footage from a commonly used camera. In this work, we propose a Multi-Scale Correlation Module (MSCM) together with an adaptive fusion module. First, both large and small facial motions are extracted by the MSCM and encoded by CNNs. Then, the adaptive fusion module aggregates the motion features. With these modules, our recognition network can model both subtle and large motion features for video-based FER using only RGB image frames as input. Experiments on two datasets, AFEW and DFEW, show that the network achieves state-of-the-art performance on both benchmarks. © 2023 Elsevier Ltd. All rights reserved.
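The abstract does not specify the internals of the MSCM or the adaptive fusion module, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that motion is encoded as a correlation (cost) volume between feature maps of two consecutive frames, that "multi-scale" means sampling a 3x3 grid of displacements at several dilations (the values 1, 2 and 4 are hypothetical), and that adaptive fusion is a softmax-weighted sum over scales. All function names and parameters (`correlation`, `multi_scale_correlation`, `adaptive_fusion`, `dilations`, `logits`) are hypothetical. Pure Python lists are used so the sketch has no dependencies.

```python
import math
from typing import List, Sequence

# Hypothetical layout: a feature map is indexed as fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def correlation(f1: FeatureMap, f2: FeatureMap, dilation: int) -> FeatureMap:
    """Correlate f1 with f2 over a 3x3 grid of displacements spaced `dilation` pixels apart.

    Returns cost[k][row][col], where k enumerates the 9 displacements
    (dy, dx) in {-dilation, 0, dilation}^2 in row-major order.
    Positions that fall outside f2 contribute zero (zero padding).
    """
    channels, height, width = len(f1), len(f1[0]), len(f1[0][0])
    offsets = [(dy * dilation, dx * dilation) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    cost = []
    for dy, dx in offsets:
        plane = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    # Channel-averaged dot product of the two feature vectors.
                    plane[y][x] = sum(f1[c][y][x] * f2[c][yy][xx] for c in range(channels)) / channels
        cost.append(plane)
    return cost


def multi_scale_correlation(f1: FeatureMap, f2: FeatureMap,
                            dilations: Sequence[int] = (1, 2, 4)) -> List[FeatureMap]:
    """One cost volume per scale: small dilations target subtle motion, large ones big motion."""
    return [correlation(f1, f2, d) for d in dilations]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(volumes: List[FeatureMap], logits: Sequence[float]) -> FeatureMap:
    """Weighted sum of per-scale cost volumes using softmax-normalised weights.

    In a trained network the logits would be learned or predicted from the
    input; here they are simply passed in.
    """
    weights = softmax(logits)
    fused = [[[0.0 for _ in row] for row in plane] for plane in volumes[0]]
    for w, vol in zip(weights, volumes):
        for k, plane in enumerate(vol):
            for y, row in enumerate(plane):
                for x, v in enumerate(row):
                    fused[k][y][x] += w * v
    return fused


def impulse(height: int, width: int, y: int, x: int) -> FeatureMap:
    """Single-channel map with one active feature at (y, x)."""
    fmap = [[[0.0] * width for _ in range(height)]]
    fmap[0][y][x] = 1.0
    return fmap


# Demo: a feature moves 2 pixels to the right between frames.
frame_t = impulse(8, 8, 2, 2)
frame_t1 = impulse(8, 8, 2, 4)
vols = multi_scale_correlation(frame_t, frame_t1)
# Index 5 is the (0, +dilation) displacement; compare dilation 1 with dilation 2.
print(vols[0][5][2][2], vols[1][5][2][2])  # prints "0.0 1.0"
```

In this toy example the 2-pixel shift is invisible to the dilation-1 volume but appears in the dilation-2 volume, which illustrates why sampling several displacement scales can help cover both subtle and large motions. A real network would compute these volumes on CNN feature maps, typically with vectorised tensor operations rather than Python loops.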

Research Area(s)

  • Adaptive fusion, Convolutional neural networks, Facial expression recognition, Motion estimation

Bibliographic Note

Publisher Copyright: © 2023 Elsevier Ltd