Abstract
Distortions from the spatial and temporal domains have been identified as the dominant factors governing visual quality. Although both have been studied independently in deep learning-based user-generated content (UGC) video quality assessment (VQA), via frame-wise distortion estimation and temporal quality aggregation, much less work has been devoted to integrating them with deep representations. In this paper, we propose a SpatioTemporal Interactive VQA (STI-VQA) model built on the philosophy that video distortion can be inferred from the integration of spatial characteristics and temporal motion along the flow of time. In particular, at each timestamp, both the spatial distortion captured by feature statistics and the local motion captured by feature differences are extracted and fed to a transformer network for motion-aware interaction learning. Meanwhile, the flow of spatial distortion information from shallow layers to deep layers is constructed adaptively during temporal aggregation. The transformer network has a distinct advantage in modeling long-range dependencies, leading to superior performance on UGC videos. Experimental results on five UGC video benchmarks demonstrate the effectiveness and efficiency of our STI-VQA model, and the source code will be available online at https://github.com/h4nwei/STI-VQA.
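The abstract describes building, for each timestamp, a token from spatial feature statistics (distortion) and frame-to-frame feature differences (local motion) before transformer aggregation. The following is a minimal NumPy sketch of that token construction only; all names, shapes, and the choice of mean/std statistics are illustrative assumptions, not the paper's implementation (see the linked repository for the actual code).

```python
import numpy as np

def frame_tokens(features):
    """Build per-timestamp tokens from a deep feature volume.

    features: array of shape (T, C, H, W) -- features for T frames.
    Returns an array of shape (T, 3 * C): channel-wise mean and std
    (spatial distortion statistics) concatenated with the mean absolute
    feature difference to the previous frame (local motion proxy).
    """
    T, C, H, W = features.shape
    mu = features.mean(axis=(2, 3))           # (T, C) spatial means
    sigma = features.std(axis=(2, 3))         # (T, C) spatial stds
    diff = np.zeros_like(features)
    diff[1:] = features[1:] - features[:-1]   # frame-wise feature difference
    motion = np.abs(diff).mean(axis=(2, 3))   # (T, C) motion magnitude
    # One token per timestamp; a transformer would then model
    # long-range dependencies across the T tokens.
    return np.concatenate([mu, sigma, motion], axis=1)

feats = np.random.rand(8, 16, 7, 7).astype(np.float32)
tokens = frame_tokens(feats)
print(tokens.shape)  # (8, 48)
```

In this sketch the first frame has no predecessor, so its motion component is zero; how the paper handles the boundary frame (and which statistics it actually pools) is not specified in the abstract.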
| Original language | English |
|---|---|
| Pages (from-to) | 1031-1042 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 33 |
| Issue number | 3 |
| Online published | 21 Sept 2022 |
| DOIs | |
| Publication status | Published - Mar 2023 |
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62022002; in part by the Shenzhen Virtual University Park, the Science, Technology and Innovation Committee of Shenzhen Municipality, under Project 2021Szvup128; and in part by the Hong Kong Research Grants Council General Research Fund (GRF) under Grant 11203220.
Research Keywords
- Distortion
- Feature extraction
- No-reference video quality assessment
- Quality assessment
- Spatiotemporal phenomena
- Three-dimensional displays
- Transformers
- user-generated content
- Video recording
- vision transformer
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'Learning Spatiotemporal Interactions for User-Generated Video Quality Assessment'.
Projects
GRF: Towards Smart Visual Sensor Data Representation with Intelligent Sensing in the Internet of Video Things
WANG, S. (Principal Investigator / Project Coordinator), Huang, T. (Co-Investigator) & XUE, C. J. (Co-Investigator)
1/01/21 → 23/06/25
Project: Research