Task-Driven Video Compression for Humans and Machines: Framework Design and Optimization

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

4 Scopus Citations

Original language: English
Journal / Publication: IEEE Transactions on Multimedia
Online published: 30 Dec 2022
Publication status: Online published - 30 Dec 2022


Learned video compression has progressed rapidly in recent years. Despite their compression efficiency, existing methods are oriented toward either signal fidelity or semantic fidelity, which limits their ability to meet the requirements of both human and machine vision. To address this problem, a task-driven video compression framework is proposed that flexibly supports vision tasks for both human vision and machine vision. Specifically, to improve compression performance, the backbone of the framework is optimized with three novel modules: multi-scale motion estimation, multi-frame feature fusion, and reference-based in-loop filters. Then, building on this efficient compression backbone, a task-driven optimization approach is designed to achieve a trade-off between signal-fidelity-oriented compression and semantic-fidelity-oriented compression. Moreover, a post-filter module is added to the framework to further improve the performance of the human vision branch. Finally, rate-distortion performance, rate-accuracy performance, and subjective quality are used as evaluation metrics, and experimental results show the superiority of the proposed framework for both human vision and machine vision. The source code of this work can be found at https://mic.tongji.edu.cn.
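
The abstract does not state the objective used for the task-driven optimization. As a rough illustration only, the trade-off between signal fidelity and semantic fidelity could be sketched as a weighted rate-distortion objective. The function name `task_driven_loss`, the weights `lam` and `alpha`, and the form of the loss are all assumptions made for this sketch and are not taken from the paper.

```python
def task_driven_loss(rate_bpp, signal_distortion, task_loss, lam=256.0, alpha=0.5):
    """Hypothetical joint objective (an assumption, not the paper's formula).

    rate_bpp          -- estimated bitrate, e.g. in bits per pixel
    signal_distortion -- pixel-level distortion, e.g. MSE (human vision branch)
    task_loss         -- loss of a downstream vision task (machine vision branch)
    lam               -- rate-distortion trade-off weight
    alpha             -- 1.0 favours signal fidelity, 0.0 favours semantic fidelity
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Blend the two fidelity terms, then add the weighted blend to the rate.
    distortion = alpha * signal_distortion + (1.0 - alpha) * task_loss
    return rate_bpp + lam * distortion


if __name__ == "__main__":
    # Sweep alpha to move from a machine-oriented to a human-oriented objective.
    for a in (0.0, 0.5, 1.0):
        print(a, task_driven_loss(0.1, 0.002, 0.5, lam=100.0, alpha=a))
```

Under this assumed form, a single parameter moves the codec between the two operating points, which is one simple way to picture the trade-off the abstract describes.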

Research Area(s)

  • action recognition, Feature extraction, Image coding, Machine vision, multi-task optimization, neural network, Neural networks, Semantics, Task analysis, video coding for machine, Video compression