EISNet: A Multi-Modal Fusion Network for Semantic Segmentation with Events and Images
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
| Original language | English |
| --- | --- |
| Journal / Publication | IEEE Transactions on Multimedia |
| Publication status | Online published - 21 Mar 2024 |
Abstract
Bio-inspired event cameras record a scene as sparse and asynchronous "events" by detecting per-pixel brightness changes. Such cameras show great potential in challenging scene understanding tasks, benefiting from the imaging advantages of high dynamic range and high temporal resolution. Considering the complementarity between event and standard cameras, we propose a multi-modal fusion network (EISNet) to improve the semantic segmentation performance. The key challenges of this topic lie in (i) how to encode event data to represent accurate scene information and (ii) how to fuse multi-modal complementary features by considering the characteristics of two modalities. To solve the first challenge, we propose an Activity-Aware Event Integration Module (AEIM) to convert event data into frame-based representations with high-confidence details via scene activity modeling. To tackle the second challenge, we introduce the Modality Recalibration and Fusion Module (MRFM) to recalibrate modal-specific representations and then aggregate multi-modal features at multiple stages. MRFM learns to generate modal-oriented masks to guide the merging of complementary features, achieving adaptive fusion. Based on these two core designs, our proposed EISNet adopts an encoder-decoder transformer architecture for accurate semantic segmentation using events and images. Experimental results show that our model outperforms state-of-the-art methods by a large margin on event-based semantic segmentation datasets. The code is publicly available at https://github.com/bochenxie/EISNet.
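The abstract describes two generic operations: converting sparse events into a frame-based representation, and fusing the two modalities with learned masks. The sketch below illustrates both ideas in simplified form; it is not the paper's AEIM or MRFM (which involve scene activity modeling and learned modal-oriented masks), only a minimal stand-in with a temporal-bin accumulator and a hand-crafted sigmoid gate.

```python
import numpy as np

def events_to_frame(events, height, width, num_bins=3):
    """Accumulate sparse events (x, y, t, polarity) into a dense
    (num_bins, H, W) tensor, one channel per temporal bin.
    A generic event-to-frame conversion, not the paper's AEIM."""
    frame = np.zeros((num_bins, height, width), dtype=np.float32)
    if not events:
        return frame
    times = [e[2] for e in events]
    t0, t1 = min(times), max(times)
    span = max(t1 - t0, 1e-9)  # avoid division by zero
    for x, y, t, p in events:
        b = min(int((t - t0) / span * num_bins), num_bins - 1)
        frame[b, y, x] += 1.0 if p > 0 else -1.0
    return frame

def masked_fusion(feat_img, feat_evt):
    """Gated fusion of two modal feature maps: a sigmoid of the feature
    difference acts as a per-pixel mask, a hand-crafted stand-in for the
    learned modal-oriented masks described in the abstract."""
    mask = 1.0 / (1.0 + np.exp(-(feat_img - feat_evt)))  # values in (0, 1)
    return mask * feat_img + (1.0 - mask) * feat_evt

# Usage: three events on a 3x3 sensor, split into two temporal bins.
events = [(0, 0, 0.0, 1), (1, 1, 0.5, -1), (2, 2, 1.0, 1)]
frame = events_to_frame(events, height=3, width=3, num_bins=2)
```

When the two modal feature maps agree, the gate sits at 0.5 and the fusion reduces to their average; where one modality dominates, the gate shifts toward it, which is the adaptive behavior the abstract attributes to MRFM's masks.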
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
Research Area(s)
- Event camera, Multi-modal fusion, Attention mechanism, Semantic segmentation
Citation Format(s)
EISNet: A Multi-Modal Fusion Network for Semantic Segmentation with Events and Images. / Xie, Bochen; Deng, Yongjian; Shao, Zhanpeng et al.
In: IEEE Transactions on Multimedia, 21.03.2024.