Abstract
Semantic scene understanding is a fundamental capability for autonomous vehicles. Under challenging lighting conditions, such as nighttime and oncoming headlights, semantic scene understanding performance using only RGB images is usually degraded. Thermal images provide complementary information to RGB images, so many recent semantic segmentation networks have been proposed that use RGB-Thermal (RGB-T) images. However, most existing networks focus only on improving segmentation accuracy for single image frames, neglecting the consistency of information between consecutive frames. To address this issue, we propose a temporally consistent framework for RGB-T semantic segmentation, which introduces a virtual view image generation module to synthesize a virtual image for the next moment and a consistency loss function to enforce segmentation consistency. We also propose an evaluation metric that measures both the accuracy and the consistency of semantic segmentation. Experimental results show that our framework outperforms state-of-the-art methods. © 2024 IEEE.
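The abstract does not specify the exact form of the consistency loss, so the following is only a minimal illustrative sketch of the general idea: penalize pixel-wise disagreement between the segmentation of the current frame and the segmentation of the synthesized next-moment (virtual) view. The function name `temporal_consistency_loss` and the KL-divergence formulation are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(logits_t: torch.Tensor,
                              logits_t1_virtual: torch.Tensor) -> torch.Tensor:
    """Sketch of a temporal consistency loss (assumed form, not the paper's).

    logits_t          -- (B, C, H, W) segmentation logits for frame t
    logits_t1_virtual -- (B, C, H, W) segmentation logits for the virtual
                         view synthesized for frame t+1
    """
    # Convert logits to per-pixel class distributions.
    p_t = F.softmax(logits_t, dim=1)
    log_p_t1 = F.log_softmax(logits_t1_virtual, dim=1)
    # KL divergence pushes the prediction on the virtual next-frame view
    # to agree pixel-wise with the current-frame prediction.
    return F.kl_div(log_p_t1, p_t, reduction="batchmean")
```

In a setup like this, the consistency term would be added to the usual per-frame cross-entropy loss with a weighting coefficient, so accuracy on individual frames and agreement across consecutive frames are optimized jointly.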
| Original language | English |
|---|---|
| Pages (from-to) | 9757-9764 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 9 |
| Issue number | 11 |
| Online published | 10 Sept 2024 |
| DOIs | |
| Publication status | Published - Nov 2024 |
Funding
This work was supported in part by the Hong Kong Innovation and Technology Fund under Grant ITS/145/21, and in part by the City University of Hong Kong under Grant 9610675.
Research Keywords
- Autonomous Vehicles
- Multi-modal Fusion
- RGB-Thermal
- Semantic Segmentation
- Temporal Consistency