Research progress of six degree of freedom (6DoF) video technology

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

  • 王旭
  • 刘琼
  • 彭宗举
  • 元辉
  • 赵铁松
  • 秦熠
  • 吴科君
  • 刘文予
  • 杨铀

Original language: Chinese (Simplified)
Pages (from-to): 1863-1890
Journal / Publication: 中国图象图形学报
Issue number: 6
Online published: 14 Apr 2023
Publication status: Published - Jun 2023


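With the rise of the metaverse concept, a new generation of interactive media technology represented by six degree of freedom (6DoF) video has attracted wide attention from industry and academia. 6DoF video belongs to the field of multimedia communication. Through computational reconstruction, it offers users media interaction and content changes along multiple dimensions, including viewing angle, lighting, focal length, and field of view, so that users far away can feel present at the scene and each user can have a personalized experience. These properties overlap strongly with the technical characteristics of the metaverse: perception, computing, reconstruction, collaboration, and interaction. The technical system covered by 6DoF video can therefore serve as an alternative technical framework for realizing the metaverse. This paper poses 40 questions on 10 aspects of 6DoF video and summarizes the end-to-end 6DoF video technology chain as three macro stages: generation, distribution, and presentation. It then reviews research progress in China and abroad for these three stages, covering content acquisition and preprocessing, coding compression and transmission optimization, and interaction and presentation. For content acquisition and preprocessing, it covers joint multiview acquisition, joint multiview and depth acquisition, and depth map and point cloud preprocessing. For video compression and transmission, it covers multiview video coding, multiview plus depth video coding, light field image compression, focal stack image compression, point cloud coding compression, and 6DoF video transmission optimization. For interaction and display, it covers post-decoding filtering enhancement and virtual view synthesis. Finally, the paper discusses future trends in light of the current challenges in the field. © 中国图象图形学报. All rights reserved.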
Six degree of freedom (6DoF) video is characterized by interaction between video content and users, and it follows six kinds of user motion: three translational motions (including horizontal and vertical movement) and three rotational motions (pitch, yaw, and roll). In this manner, users can change multiple audio-visual dimensions, including viewing perspective, lighting condition or direction, focal length or focus point, and field of view, through computational reconstruction or synthesis of the content. 6DoF video changes the conventional way of watching video, in which user-video interaction is limited to switching among channels and the relations between video contents are restricted as well. It can offer users an immersive experience because the content they see stays consistent with their motion. For this reason, academia and industry regard 6DoF video as an epoch-making type of video. At the same time, metaverse-driven 6DoF video has been recognized as a new generation of interactive media technology and as one of the key technologies for the metaverse and related domains. All these features make the user experience deeply immersive and diversified. This mutually beneficial relationship stems from technical features shared with the metaverse, such as perception, computing, reconstruction, collaboration, and interaction. 6DoF video originates from the framework of a typical multimedia communication system and follows its basic video pipeline: capture, content processing, compression, transmission, decoding, and display. To realize intelligent human-terminal interaction, it goes beyond the traditional 3D video communication system, and its requirements on interaction range and intelligence remain complicated.
Therefore, such new techniques support this new type of video to a certain extent. Our proposed technical framework for the 6DoF multimedia communication system is organized into three stages: generation, distribution, and visualization. Forty scientific and technical challenges in this domain are identified and categorized into 10 directions. We review the literature along each of these 10 directions, covering content acquisition and preprocessing, coding compression and transmission optimization, and interaction and presentation. For content generation, the analysis focuses on three kinds of content: 1) multiview video, 2) multiview video plus depth, and 3) point clouds. Data acquisition systems fall into two types, multiview systems and multiview plus depth systems, and each yields different types of content. Initially, multiview color videos were captured without any auxiliary information to describe the 3D structure of the scene, which makes subsequent data processing challenging. The multiview plus depth system was then proposed to handle this problem, and its data can be classified into two types: i) color plus depth and ii) point cloud. The volume and heterogeneity of the data are, to some extent, a major challenge for these data representations. Compression techniques are then discussed according to the type of video content. Popular compression techniques for multiview video, multiview video plus depth, light fields, and point clouds are covered, including their origin, mechanisms, performance, and applicable standards. Transmission techniques for 6DoF video, which apply once the video bitstream is obtained, are presented as well, including bit allocation, interaction-oriented transmission, and standards and protocols.
Quality evaluation and view synthesis for user-terminal interaction are analyzed as well. With these techniques, a "capture to display" 6DoF video system can become user-friendly. Pixel-based methods are still being studied and optimized, but their computational cost remains a challenge. Recent learning-based methods pay more attention to terminal-oriented applications, especially view synthesis. To meet the requirements of practical applications, the 40 scientific and technical challenges mentioned above still need to be resolved.

Research Area(s)

  • metaverse, six degree of freedom (6DoF) video, content capturing, coding compression, view synthesis

Citation Format(s)

6DoF视频技术研究进展. / 王旭; 刘琼; 彭宗举 et al.
In: 中国图象图形学报, Vol. 28, No. 6, 06.2023, p. 1863-1890.
