Combining CNN and transformers for full-reference and no-reference image quality assessment

Chao Zeng, Sam Kwong*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

16 Citations (Scopus)

Abstract

Most deep learning approaches to image quality assessment regress quality scores from deep features extracted by convolutional neural networks (CNNs). However, existing methods usually neglect non-local information. Motivated by the recent success of transformers in modeling contextual information, we propose a hybrid framework that uses a vision transformer backbone to extract features and a CNN decoder for quality estimation. We propose a feature extraction scheme shared by both the full-reference (FR) and no-reference (NR) settings, and devise a two-branch structured attentive quality predictor for quality prediction. Evaluation experiments on various IQA datasets, including LIVE, CSIQ, TID2013, LIVE Challenge, KADID-10k, and KonIQ-10k, show that our proposed models achieve outstanding performance in both FR and NR settings. © 2023 Published by Elsevier B.V.
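The abstract describes the architecture only at a high level. The following is a minimal, self-contained PyTorch sketch of the general idea: a shared vision-transformer encoder feeding a CNN decoder, with an optional reference image enabling the FR setting. All module names, dimensions, and the simple feature-difference fusion are illustrative assumptions, not the authors' implementation; in particular, the paper's two-branch structured attentive quality predictor is not reproduced here.

import torch
import torch.nn as nn

class ViTEncoder(nn.Module):
    """Shared transformer backbone: patch embedding + self-attention encoder."""
    def __init__(self, in_ch=3, dim=256, patch=16, depth=4, heads=8):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        tokens = self.patch_embed(x)              # (B, dim, H/p, W/p)
        b, c, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)   # (B, N, dim) token sequence
        seq = self.encoder(seq)                   # non-local context via self-attention
        return seq.transpose(1, 2).reshape(b, c, h, w)

class CNNQualityDecoder(nn.Module):
    """CNN decoder regressing a scalar quality score from feature maps."""
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim // 2, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim // 2, dim // 4, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(dim // 4, 1)

    def forward(self, feat):
        feat = self.conv(feat)
        pooled = feat.mean(dim=(2, 3))            # global average pooling
        return self.head(pooled).squeeze(-1)      # (B,) quality scores

class HybridIQA(nn.Module):
    """FR mode compares distorted vs. reference features; NR uses distorted only."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = ViTEncoder(dim=dim)        # shared across FR and NR settings
        self.decoder = CNNQualityDecoder(dim=dim)

    def forward(self, distorted, reference=None):
        f_dist = self.encoder(distorted)
        if reference is not None:                 # full-reference branch
            f_ref = self.encoder(reference)
            f_dist = f_dist - f_ref               # feature-difference fusion (an assumption)
        return self.decoder(f_dist)

if __name__ == "__main__":
    model = HybridIQA()
    dist = torch.randn(2, 3, 224, 224)
    ref = torch.randn(2, 3, 224, 224)
    print(model(dist, ref).shape)  # FR: torch.Size([2])
    print(model(dist).shape)       # NR: torch.Size([2])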
Original language: English
Article number: 126437
Journal: Neurocomputing
Volume: 549
Online published: 13 Jun 2023
DOIs
Publication status: Published - 7 Sept 2023

Research Keywords

  • Convolutional neural network
  • Image quality assessment
  • Non-local information
  • Transformers
