SqueezExpNet : Dual-stage convolutional neural network for accurate facial expression recognition with attention mechanism

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) · Publication in refereed journal · peer-review

3 Scopus Citations

Original language: English
Article number: 110451
Journal / Publication: Knowledge-Based Systems
Online published: 25 Mar 2023
Publication status: Published - 7 Jun 2023


Facial expression recognition (FER) using a deep convolutional neural network (DCNN) is important and challenging. Although substantial effort has been made to increase FER accuracy through DCNNs, previous studies still do not generalise sufficiently well for real-world applications. Traditional FER studies are mainly limited to controlled, lab-posed frontal facial images, which lack the challenges of motion blur, head poses, occlusions, face deformations and lighting under uncontrolled conditions. In this work, we propose a SqueezExpNet architecture that can take advantage of local and global facial information for a highly accurate FER system that handles environmental variations. Our network is divided into two stages: a geometrical attention stage with a SqueezeNet-like architecture that obtains local highlight information, and a spatial texture stage comprising several squeeze and expand layers that exploits high-level global features. In particular, we create a weighted mask from 3D face landmarks and apply element-wise multiplication with the spatial feature in the first stage to draw attention to important local facial regions. Next, we input the face spatial image and its augmentations into the second stage of the network. Finally, as the classifier, a recurrent neural network is designed to combine the highlighted information from the dual stages rather than simply using the SoftMax function, thereby helping to overcome uncertainties. Experiments covering basic and compound FER tasks were performed using three leading facial expression datasets. Our strategy outperformed existing DCNN methods and achieved state-of-the-art results. The developed architecture, adopted research methodology and reported findings may find potential applications of real-time FER in surveillance, health and feedback systems. © 2023 Elsevier B.V.
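The attention step described above, multiplying spatial features element-wise by a weighted landmark mask, can be illustrated with a minimal NumPy sketch. The Gaussian-bump weighting and the helper names (`landmark_weight_mask`, `apply_attention`) are assumptions for illustration; the paper's exact 3D-landmark weighting scheme is not specified in the abstract.

```python
import numpy as np

def landmark_weight_mask(landmarks, shape, sigma=3.0):
    """Build a soft attention mask from projected face landmarks.

    Each landmark contributes a Gaussian bump; the mask is normalised
    to [0, 1] so salient facial regions carry the highest weight.
    (Hypothetical weighting -- one plausible reading of the abstract.)
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape, dtype=np.float64)
    for lx, ly in landmarks:
        mask += np.exp(-((xs - lx) ** 2 + (ys - ly) ** 2) / (2 * sigma ** 2))
    return mask / mask.max()

def apply_attention(feature_map, mask):
    """Element-wise multiply every channel of a (C, H, W) feature map
    by the (H, W) weighted mask, as in the geometrical attention stage."""
    return feature_map * mask[None, :, :]

# Toy demo: a 4-channel 16x16 feature map and three dummy landmarks.
feat = np.ones((4, 16, 16))
mask = landmark_weight_mask([(4, 4), (8, 8), (12, 12)], (16, 16))
out = apply_attention(feat, mask)
```

Because the mask peaks at 1.0 near the landmarks and decays elsewhere, the multiplication suppresses activations far from the marked facial regions while leaving landmark neighbourhoods nearly unchanged.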

Research Area(s)

  • Attention mechanism, Basic expressions, Compound expressions, Deep convolutional neural network, Facial expression recognition, Weighted mask