TY - JOUR
T1 - Boosting Night-time Scene Parsing with Learnable Frequency
AU - Xie, Zhifeng
AU - Wang, Sen
AU - Xu, Ke
AU - Zhang, Zhizhong
AU - Tan, Xin
AU - Xie, Yuan
AU - Ma, Lizhuang
PY - 2023
Y1 - 2023
N2 - Night-Time Scene Parsing (NTSP) is essential to many vision applications, especially autonomous driving. Most existing methods are designed for day-time scene parsing and rely on modeling pixel intensity-based spatial contextual cues under even illumination. Hence, these methods do not perform well in night-time scenes, as such spatial contextual cues are buried in the over-/under-exposed regions. In this paper, we first conduct an image frequency-based statistical experiment to interpret the discrepancies between day-time and night-time scenes. We find that image frequency distributions differ significantly between day-time and night-time scenes, and that understanding such distributions is critical to the NTSP problem. Based on this, we propose to exploit image frequency distributions for night-time scene parsing. First, we propose a Learnable Frequency Encoder (LFE) that models the relationship between different frequency coefficients to measure all frequency components dynamically. Second, we propose a Spatial Frequency Fusion (SFF) module that fuses spatial and frequency information to guide the extraction of spatial context features. Extensive experiments show that our method performs favorably against state-of-the-art methods on the NightCity, NightCity+ and BDD100K-night datasets. In addition, we demonstrate that our method can be applied to existing day-time scene parsing methods and boosts their performance on night-time scenes. The code is available at https://github.com/wangsen99/FDLNet. © 2023 IEEE.
AB - Night-Time Scene Parsing (NTSP) is essential to many vision applications, especially autonomous driving. Most existing methods are designed for day-time scene parsing and rely on modeling pixel intensity-based spatial contextual cues under even illumination. Hence, these methods do not perform well in night-time scenes, as such spatial contextual cues are buried in the over-/under-exposed regions. In this paper, we first conduct an image frequency-based statistical experiment to interpret the discrepancies between day-time and night-time scenes. We find that image frequency distributions differ significantly between day-time and night-time scenes, and that understanding such distributions is critical to the NTSP problem. Based on this, we propose to exploit image frequency distributions for night-time scene parsing. First, we propose a Learnable Frequency Encoder (LFE) that models the relationship between different frequency coefficients to measure all frequency components dynamically. Second, we propose a Spatial Frequency Fusion (SFF) module that fuses spatial and frequency information to guide the extraction of spatial context features. Extensive experiments show that our method performs favorably against state-of-the-art methods on the NightCity, NightCity+ and BDD100K-night datasets. In addition, we demonstrate that our method can be applied to existing day-time scene parsing methods and boosts their performance on night-time scenes. The code is available at https://github.com/wangsen99/FDLNet. © 2023 IEEE.
KW - Context modeling
KW - Frequency Analysis
KW - Frequency conversion
KW - Image coding
KW - Image segmentation
KW - Night-time Vision
KW - Scene Parsing
KW - Spectrogram
KW - Time-frequency analysis
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85153799508&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85153799508&origin=recordpage
U2 - 10.1109/TIP.2023.3267044
DO - 10.1109/TIP.2023.3267044
M3 - RGC 21 - Publication in refereed journal
C2 - 37071518
SN - 1057-7149
VL - 32
SP - 2386
EP - 2398
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -