Semantic Visual Data Compression for Vehicular Communications in Intelligent Driving Systems
Description

Visual data play a vital role in perception, planning and control for assisted and autonomous driving. To unlock the value of visual data, highly efficient data representation in vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), and vehicle-to-human (V2H) communications is of prime importance for advanced driving systems. However, the sheer volume of continuously generated, highly unstructured visual data poses major challenges to vehicle-to-everything (V2X) communications.

In intelligent driving systems, the receivers of visual data at the destination have recently undergone a radical shift from human vision to intelligent computing systems. Yet, in contrast to the intensive research on image/video compression, one question has largely escaped research attention: how to achieve an efficient visual representation for ultra-reliable, low-latency, highly flexible and privacy-enhanced V2X communication, so that the visual information conveyed can be directly understood by an intelligent agent at the destination. In this project, we tackle this question from the semantic source coding perspective and aim to develop semantic visual coding as an enabling technology for intelligent driving systems.

We thus propose a semantic source coding paradigm for visual data, paving the way towards the upcoming 6G V2X communications that enable connected intelligence. To support efficient V2V, V2I and V2H communications simultaneously with one-off compression, the framework is intrinsically hierarchical, based on the fact that a visual scene can be sketched by different layers that are closely correlated. Three layers, namely semantic, feature and signal, are dedicated to V2V, V2I and V2H communications, respectively. Moreover, the redundancies among the three layers are removed with conditional coding, in pursuit of representation scalability and efficiency.
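As a purely illustrative aid, the three-layer hierarchy with conditional coding described above can be sketched on a toy one-dimensional signal. This is not the project's codec: a global mean stands in for the semantic layer, block means for the feature layer, and raw samples for the signal layer, and all names (`LayeredBitstream`, `encode`, `decode_semantic`, `decode_feature`, `decode_signal`) are hypothetical. The only point it makes is that each layer stores just the residual relative to the layer above it, so one encoding pass serves three kinds of receiver.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayeredBitstream:
    """Toy stand-in for a three-layer scalable representation (assumption)."""
    semantic: int        # coarsest layer: one global value
    feature: List[int]   # block-mean residuals, conditioned on the semantic layer
    signal: List[int]    # sample residuals, conditioned on the feature layer


def encode(samples: List[int], block: int = 4) -> LayeredBitstream:
    """Single encoding pass that produces all three layers."""
    semantic = sum(samples) // len(samples)
    blocks = [samples[i:i + block] for i in range(0, len(samples), block)]
    block_means = [sum(b) // len(b) for b in blocks]
    # Conditional coding: store only what the coarser layer does not explain.
    feature = [m - semantic for m in block_means]
    signal = [s - block_means[i // block] for i, s in enumerate(samples)]
    return LayeredBitstream(semantic, feature, signal)


def decode_semantic(bs: LayeredBitstream) -> int:
    """Receiver that needs only the coarsest layer (V2V in the text)."""
    return bs.semantic


def decode_feature(bs: LayeredBitstream) -> List[int]:
    """Receiver that needs the semantic and feature layers (V2I in the text)."""
    return [bs.semantic + r for r in bs.feature]


def decode_signal(bs: LayeredBitstream, block: int = 4) -> List[int]:
    """Receiver that reconstructs the full signal from all layers (V2H)."""
    means = decode_feature(bs)
    return [means[i // block] + r for i, r in enumerate(bs.signal)]


if __name__ == "__main__":
    data = [10, 12, 11, 13, 50, 52, 51, 53]
    stream = encode(data)
    print(decode_semantic(stream), decode_feature(stream), decode_signal(stream))
```

In this toy the residuals are small integers that an entropy coder could compress well; a real system would replace the means with learned semantic and feature representations, which is outside the scope of this sketch.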
The proposed solution offers three distinct advantages: 1) efficient: the cross-modal graph semantic representation and contextual coding enable an extremely compact and privacy-assured representation, ensuring efficiency and reliability in V2X communications; 2) green: the redundant step of decoding the signal and then running analytics can be avoided by directly sending the semantic information and visual features for machine understanding; 3) flexible: the encoding is one-pass, generating multiple layers for multiple receivers instead of encoding independently upon each request. Our preliminary results show that a hierarchical visual representation with three abstraction levels can achieve promising compression performance. The proposed methodology can feasibly be extended beyond image and video to omnidirectional and point cloud data. The research results will also advance the frontier of semantic source coding in other areas, such as smart cities, remote medicine and robotics.
|Effective start/end date||1/01/24 → …|