TY - JOUR
T1 - Perception-Oriented U-Shaped Transformer Network for 360-Degree No-Reference Image Quality Assessment
AU - Zhou, Mingliang
AU - Chen, Lei
AU - Wei, Xuekai
AU - Liao, Xingran
AU - Mao, Qin
AU - Wang, Heqiang
AU - Pu, Huayan
AU - Luo, Jun
AU - Xiang, Tao
AU - Fang, Bin
PY - 2023/6
AB - 360-degree images convey a strong sense of realism and three-dimensionality, enabling a wide range of immersive interactions. Owing to their novel rendering and display technologies, 360-degree images exhibit more complex perceptual characteristics than conventional images, and it is challenging to perform comprehensive image quality assessment (IQA) learning by simply stacking multichannel neural network architectures for pre-/postprocessing, compression, and rendering tasks. To thoroughly learn both the global and local features of 360-degree images while reducing the complexity of multichannel neural network models and simplifying the training process, this paper proposes a joint architecture that combines user perception with an efficient transformer dedicated to 360-degree no-reference (NR) IQA. The input to the proposed method is a 360-degree cube map projection (CMP) image. The proposed 360-degree NR IQA method includes a saliency map-based non-overlapping self-attention selection module and a U-shaped transformer (U-former)-based feature extraction module to account for perceptual region importance and projection distortion. The transformer-based architecture and a weighted averaging technique are jointly utilized to predict local perceptual quality. Experimental results obtained on widely used databases show that the proposed model outperforms state-of-the-art methods in NR 360-degree image quality evaluation. A cross-database evaluation and an ablation study further demonstrate the robustness and generalization ability of the proposed model.
KW - image quality assessment
KW - no-reference image quality assessment
KW - 360-degree image
KW - U-shaped transformer
KW - omnidirectional image
KW - saliency
KW - CNN
UR - http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=LinksAMR&SrcApp=PARTNER_APP&DestLinkType=FullRecord&DestApp=WOS&KeyUT=000910529000001
DO - 10.1109/TBC.2022.3231101
M3 - RGC 21 - Publication in refereed journal
SN - 0018-9316
VL - 69
SP - 396
EP - 405
JF - IEEE Transactions on Broadcasting
IS - 2
ER -