Abstract
In this paper, we propose a novel framework for Interactive Face Video Coding (IFVC), which allows humans to interact with the intrinsic visual representations instead of the signals. The proposed solution enjoys several distinct advantages, including ultra-compact representation, low-delay interaction, and vivid expression and head-pose animation. In particular, we propose the Internal Dimension Increase (IDI) based representation, which greatly enhances the fidelity and flexibility of appearance rendering while maintaining a reasonable representation cost. By leveraging strong statistical regularities, the visual signals can be effectively projected into controllable semantics in three-dimensional space (e.g., mouth motion, eye blinking, head rotation, head translation and head location), which are compressed and transmitted. The editable bitstream, which naturally supports interactivity at the semantic level, can be used to synthesize the face frames via the strong inference ability of the deep generative model. Experimental results demonstrate the superior performance and application prospects of the proposed IFVC scheme. In particular, the proposed scheme not only outperforms the state-of-the-art video coding standard Versatile Video Coding (VVC) and the latest generative compression schemes in terms of rate-distortion performance for face videos, but also enables interactive coding without introducing additional manipulation processes. Furthermore, the proposed framework is expected to shed light on the future design of digital human communication in the metaverse. © 1992-2012 IEEE.
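The idea of an editable, semantic-level bitstream can be illustrated with a toy sketch. This is not the IFVC codec: the parameter set, the `FaceSemantics` class, the 8-bit quantizer and the `edit_yaw` helper are all hypothetical, and the deep generative model that would synthesize frames from the decoded semantics is not modeled. The sketch only shows how a handful of per-frame semantic values could be quantized into a few bytes and edited before decoding, without touching pixels.

```python
import struct
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FaceSemantics:
    """Hypothetical per-frame semantics, each normalized to [-1, 1]."""
    mouth: float
    blink: float
    yaw: float
    pitch: float
    roll: float
    tx: float
    ty: float


FIELDS = ("mouth", "blink", "yaw", "pitch", "roll", "tx", "ty")
LEVELS = 127  # signed 8-bit quantization; an arbitrary choice for this sketch


def encode(s: FaceSemantics) -> bytes:
    """Clamp each value to [-1, 1], quantize it, and pack one byte per field."""
    q = [round(max(-1.0, min(1.0, getattr(s, f))) * LEVELS) for f in FIELDS]
    return struct.pack(f"{len(FIELDS)}b", *q)


def decode(payload: bytes) -> FaceSemantics:
    """Unpack the bytes and map them back to normalized values."""
    q = struct.unpack(f"{len(FIELDS)}b", payload)
    return FaceSemantics(*(v / LEVELS for v in q))


def edit_yaw(payload: bytes, yaw: float) -> bytes:
    """Change head yaw at the semantic level and re-encode the frame."""
    return encode(replace(decode(payload), yaw=yaw))


frame = FaceSemantics(mouth=0.3, blink=-1.0, yaw=0.1, pitch=0.0,
                      roll=0.0, tx=0.05, ty=-0.02)
payload = encode(frame)                      # 7 bytes for this frame
edited = decode(edit_yaw(payload, 0.5))      # same frame, new head yaw
```

In this toy layout each frame costs seven bytes, and an edit changes only the field it targets, which is the property the abstract attributes to interaction at the semantic level.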
Original language | English |
---|---|
Pages (from-to) | 2910-2925 |
Journal | IEEE Transactions on Image Processing |
Volume | 34 |
Online published | 12 May 2025 |
Publication status | Published - 2025 |
Funding
This work was supported in part by Shenzhen Science and Technology Program under Project JCYJ20220530140816037 and in part by the Research Grants Council (RGC) General Research Fund under Grant 11200323 and Grant 11203220.
Research Keywords
- controllable embedding
- face video
- interactive video coding
Fingerprint
Dive into the research topics of 'Interactive Face Video Coding: A Generative Compression Framework'. Together they form a unique fingerprint.
- GRF: Semantic Visual Data Compression for Vehicular Communications in Intelligent Driving Systems
  WANG, S. (Principal Investigator / Project Coordinator) & WU, D. (Co-Investigator)
  1/01/24 → …
  Project: Research
- GRF: Towards Smart Visual Sensor Data Representation with Intelligent Sensing in the Internet of Video Things
  WANG, S. (Principal Investigator / Project Coordinator), Huang, T. (Co-Investigator) & XUE, C. J. (Co-Investigator)
  1/01/21 → 23/06/25
  Project: Research