Frame-wise Detection of Double HEVC Compression by Learning Deep Spatio-temporal Representations in Compression Domain

Research output: Journal Publications and Reviews (RGC: 21, 22, 62); 21_Publication in refereed journal; peer-review

6 Scopus Citations

Author(s)

  • Peisong He
  • Hongxia Wang
  • Xinghao Jiang
  • Ruimei Zhang


Detail(s)

Original language: English
Pages (from-to): 3179-3192
Journal / Publication: IEEE Transactions on Multimedia
Volume: 23
Publication status: Online published, 2 Sep 2020

Abstract

Detecting double compression is regarded as a primary step in analyzing the integrity of digital videos and is therefore of prominent importance in video forensics. However, current methods are vulnerable to the severe lossy quantization applied during recompression, which makes it challenging to obtain reliable frame-wise detection results, especially for videos coded with the High Efficiency Video Coding (HEVC) standard. To address these issues, this paper proposes a hybrid neural network that reveals abnormal frames in double-compressed HEVC videos by learning robust spatio-temporal representations from coding information in the compression domain. A statistical analysis of Coding Units (CUs) shows that HEVC video streams contain rich coding information that can be leveraged to identify the abnormal traces left by double compression. Two types of coding information maps are exploited: the CU Size Map (CSM) and the CU Prediction mode Map (CPM). In contrast with the conventional paradigm, which relies on pixel-level representations of decoded frames, the CSMs and CPMs of a short video clip are taken as the input, with the aim of achieving high robustness against low-quality recompression. In our hybrid neural network, an attention-based two-stream residual network learns hierarchical representations from the CSM and CPM, which are then jointly optimized by an attention-based fusion module. Finally, the temporal variation is modeled by a Long Short-Term Memory (LSTM) network to obtain frame-wise detection results. We have conducted extensive experiments covering various video content and coding parameters, such as bitrates and Group of Pictures (GOP) sizes. Experimental results show that our approach achieves state-of-the-art performance compared with conventional methods, especially when videos are recompressed at low bitrates.
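The abstract does not state how the CSM and CPM are encoded. As a rough illustration only, the sketch below rasterizes the CUs of one frame into the two maps at 8x8-block granularity (the smallest HEVC CU size) and groups consecutive frames into short overlapping clips, the kind of input the two-stream network is described as consuming. The `CodingUnit` record, the integer mode codes (0 = intra, 1 = inter, 2 = skip), the block granularity, and the `build_maps` and `sliding_clips` helpers are hypothetical assumptions, not the authors' implementation; extracting CU data from a real HEVC bitstream is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, TypeVar

MIN_CU = 8  # smallest HEVC coding unit is 8x8 luma samples
VALID_SIZES = (8, 16, 32, 64)

# Hypothetical integer codes for CU prediction modes (an assumption,
# not the paper's encoding).
MODE_CODES = {"intra": 0, "inter": 1, "skip": 2}

Grid = List[List[int]]
T = TypeVar("T")


@dataclass
class CodingUnit:
    """Hypothetical per-CU record: top-left luma position, size, mode."""
    x: int
    y: int
    size: int
    mode: str


def build_maps(cus: Sequence[CodingUnit], width: int, height: int) -> Tuple[Grid, Grid]:
    """Rasterize one frame's CUs into a CU Size Map and a CU Prediction mode Map.

    Each cell covers one 8x8 block; cells not covered by any CU keep the
    sentinel values 0 (CSM) and -1 (CPM).
    """
    cols = -(-width // MIN_CU)   # ceiling division
    rows = -(-height // MIN_CU)
    csm = [[0] * cols for _ in range(rows)]
    cpm = [[-1] * cols for _ in range(rows)]
    for cu in cus:
        if cu.size not in VALID_SIZES:
            raise ValueError(f"invalid CU size {cu.size}")
        code = MODE_CODES[cu.mode]  # raises KeyError for unknown modes
        span = cu.size // MIN_CU    # number of 8x8 blocks per CU side
        r0, c0 = cu.y // MIN_CU, cu.x // MIN_CU
        for r in range(r0, min(r0 + span, rows)):
            for c in range(c0, min(c0 + span, cols)):
                csm[r][c] = cu.size
                cpm[r][c] = code
    return csm, cpm


def sliding_clips(frames: Sequence[T], clip_len: int) -> List[List[T]]:
    """Group per-frame maps into overlapping clips of clip_len consecutive frames."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(frames[i:i + clip_len]) for i in range(len(frames) - clip_len + 1)]


if __name__ == "__main__":
    # A toy 32x16 frame: one 16x16 inter CU plus four 8x8 CUs.
    frame_cus = [
        CodingUnit(0, 0, 16, "inter"),
        CodingUnit(16, 0, 8, "skip"),
        CodingUnit(24, 0, 8, "intra"),
        CodingUnit(16, 8, 8, "inter"),
        CodingUnit(24, 8, 8, "inter"),
    ]
    csm, cpm = build_maps(frame_cus, 32, 16)
    print(csm)  # [[16, 16, 8, 8], [16, 16, 8, 8]]
    print(cpm)  # [[1, 1, 2, 0], [1, 1, 1, 1]]
```

Keeping the CSM and CPM as separate grids mirrors the two-stream design, in which each map would feed its own residual stream before attention-based fusion; a clip of such map pairs would then be the unit passed to the LSTM stage.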

Research Area(s)

  • coding information map, double HEVC compression, hybrid neural network, spatio-temporal representation, video forensics
