Abstract
Night-time traffic perception is crucial for intelligent transportation system (ITS) applications such as autonomous driving, collision avoidance, and driver assistance. This thesis focuses on mainstream techniques in night-time traffic perception, including low-light image enhancement, vehicle detection, and scene segmentation.
Considering the shortcomings of low-light traffic images compared with daytime images, such as low brightness and contrast, high noise levels, and poor visibility, Chapter 2 introduces a self-supervised network (SSN). Notably, SSN can be trained using only low-light traffic images, which matters because paired low-light and normal-light images are frequently difficult to obtain in real-world traffic scenarios. To address low visual quality, noise, and artifacts, we design three branches within SSN: a denoising network for reducing noise and artifacts, an enhancement network for dynamically adjusting brightness and color contrast, and an artifact removal network for further improving image quality and mitigating compression effects. Furthermore, to make SSN trainable without paired images, we propose several carefully designed loss functions. Extensive experiments validate SSN’s effectiveness compared with other low-light enhancement methods. SSN’s benefits are further demonstrated through improved performance on ITS tasks such as vehicle detection, evaluated on low-light images enhanced by SSN and by other methods.
Chapter 3 focuses on the task of vehicle detection under low-light conditions. To address the prevalent issues of existing deep learning-based object detection models in night-time vehicle detection, such as missed detection and misclassification, we introduce a hierarchical contextual information (HCI) framework for precise vehicle detection. HCI comprises an estimation branch, a segmentation branch, and a detection branch, and can be employed as a plug-and-play component to enhance existing deep learning-based object detection models. It is designed to extract hierarchical contextual clues and effectively integrate them for precisely detecting vehicles under challenging night-time conditions. The modules operate at different levels (image level, pixel level, and object level), so that their results are complementary and mutually beneficial for accurately recognizing and localizing night-time vehicles. Experiments on a large-scale night-time vehicle detection dataset demonstrate the flexibility and generalization capabilities of our HCI framework.
To further improve the precision and robustness of night-time vehicle detection, Chapter 4 proposes a detection scheme that leverages the structure of visual pathways along with a contrastive learning strategy (VPCL) for vehicle detection in complex environments. VPCL can effectively boost the detection of challenging night-time vehicles, such as small ones or those that belong to tail classes. In particular, we propose a bio-inspired backbone network that incorporates a double-opponent block based on the color-opponent mechanism in human vision. This design can boost the perception of primary patterns, such as textures, boundaries, and contrast. In addition, to mitigate the interference caused by various light sources, a novel contrastive learning approach is developed based on the properties of night-time vehicles. Furthermore, to address the imbalance among different vehicle categories, we design a tree structure-based detection head that performs classification in two stages, effectively distinguishing between vehicles in head and tail categories. Experimental results on three night-time vehicle detection datasets show that VPCL outperforms other night-time vehicle detection methods, including the method presented in Chapter 3.
In Chapter 5, we focus on semantic segmentation for urban scenes under night-time conditions. To address the problem of insufficient semantic information caused by poor illumination, we explore the fusion of multi-modal information, namely visible (RGB) images and thermal infrared (TIR) images, for precise segmentation. Specifically, we design a dual-graph reasoning-based fusion network (DGRFNet) that integrates RGB and TIR data from a multi-view perspective. By transforming RGB and TIR features from coordinate space into graph space, we can model the semantic relationships among different objects and therefore effectively fuse multi-modal features. In addition, we implement a feature interaction module and multi-perception decoders to enhance the network's capability for feature representation, which in turn facilitates the learning of subtle details. Experimental analysis demonstrates the advantage of DGRFNet over other state-of-the-art RGB-thermal segmentation approaches for the semantic segmentation of urban scenes under night-time conditions.
In this thesis, we systematically investigate and discuss the challenges of night-time traffic perception and then propose corresponding solutions based on the characteristics of night-time images. In the future, we will design more adaptive and efficient techniques to boost night-time traffic perception in complex and dynamically changing environments.
| Date of Award | 21 Jul 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | L H Leanne CHAN (Supervisor) |