SAM-Event-Adapter: Adapting Segment Anything Model for Event-RGB Semantic Segmentation

Bowen Yao, Yongjian Deng*, Yuhan Liu, Hao Chen, Youfu Li, Zhen Yang

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

6 Citations (Scopus)

Abstract

Semantic segmentation, a fundamental visual task ubiquitously employed in sectors ranging from transportation and robotics to healthcare, has long captivated the research community. In the wake of rapid advances in large-model research, a foundation model for semantic segmentation, termed the Segment Anything Model (SAM), has been introduced. This model substantially alleviates the poor generalizability of previous segmentation models and removes the need to retrain the whole model for each new dataset. Nonetheless, segmentation models built on SAM remain constrained by the inherent limitations of RGB sensors, particularly in scenarios with complex lighting conditions and high-speed motion. Motivated by these observations, a natural recourse is to adapt SAM to additional visual modalities without compromising its robust generalizability. To this end, we introduce a lightweight SAM-Event-Adapter (SE-Adapter) module, which incorporates event camera data into a cross-modal learning architecture based on SAM with only a small increase in tunable parameters. Capitalizing on the high dynamic range and temporal resolution afforded by event cameras, our proposed multi-modal Event-RGB learning architecture effectively improves semantic segmentation performance. In addition, we propose a novel paradigm for representing event data in a patch format compatible with transformer-based models, employing multi-spatiotemporal-scale encoding to efficiently extract motion and semantic correlations from event representations. Extensive empirical evaluations on the DSEC-Semantic and DDD17 datasets validate the effectiveness and rationality of our proposed approach. © 2024 IEEE.
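
The abstract's two technical ideas, patch-format event encoding at multiple spatiotemporal scales and a lightweight adapter that injects event features into a frozen SAM backbone, can be illustrated with a brief sketch. The PyTorch snippet below is a minimal, hypothetical illustration: the names (events_to_voxel_patches, SEAdapter), tensor shapes, bin counts, and the additive fusion rule are assumptions made for exposition and do not reproduce the authors' released implementation.

import torch
import torch.nn as nn


def events_to_voxel_patches(events, hw=(64, 64), bins=(2, 4, 8), patch=16):
    """Accumulate (x, y, t, p) events into voxel grids at several temporal
    scales, then flatten each grid into transformer-style patch tokens."""
    H, W = hw
    tokens = []
    for b in bins:
        grid = torch.zeros(b, H, W)
        x = events[:, 0].long().clamp(0, W - 1)
        y = events[:, 1].long().clamp(0, H - 1)
        t = (events[:, 2] * (b - 1)).long().clamp(0, b - 1)
        p = events[:, 3]                                # signed polarity (+1 / -1)
        grid.index_put_((t, y, x), p, accumulate=True)  # per-bin event counts
        # (b, H, W) -> (num_patches, b * patch * patch)
        patches = grid.unfold(1, patch, patch).unfold(2, patch, patch)
        patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, b * patch * patch)
        tokens.append(patches)
    return torch.cat(tokens, dim=-1)                    # concatenate temporal scales


class SEAdapter(nn.Module):
    """Bottleneck adapter: frozen RGB tokens are modulated by event tokens
    through a low-rank projection, so only the adapter weights are trained."""

    def __init__(self, dim, event_dim, bottleneck=64):
        super().__init__()
        self.event_proj = nn.Linear(event_dim, dim)
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)                  # adapter starts as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, rgb_tokens, event_tokens):
        fused = rgb_tokens + self.event_proj(event_tokens)
        return rgb_tokens + self.up(self.act(self.down(fused)))


if __name__ == "__main__":
    ev = torch.rand(1000, 4)                            # synthetic events: x, y, t, p
    ev[:, 0] = ev[:, 0] * 64                            # x in pixels
    ev[:, 1] = ev[:, 1] * 64                            # y in pixels
    ev[:, 3] = (ev[:, 3] > 0.5).float() * 2 - 1         # polarity in {-1, +1}
    event_tokens = events_to_voxel_patches(ev)          # (16, 3584)
    rgb_tokens = torch.randn(16, 256)                   # stand-in for SAM ViT tokens
    adapter = SEAdapter(dim=256, event_dim=event_tokens.shape[-1])
    print(adapter(rgb_tokens, event_tokens).shape)      # torch.Size([16, 256])

The zero-initialized up-projection makes the adapter behave as an identity mapping at the start of training, a common trick in parameter-efficient tuning that keeps the frozen backbone's behaviour intact until the event branch has learned useful features.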
Original language: English
Title of host publication: 2024 IEEE International Conference on Robotics and Automation (ICRA)
Publisher: IEEE
Pages: 9093-9100
ISBN (Electronic): 979-8-3503-8457-4
DOIs
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Robotics and Automation (ICRA 2024): CONNECT+ - Yokohama, Japan
Duration: 13 May 2024 - 17 May 2024
https://2024.ieee-icra.org/

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
ISSN (Print): 1050-4729

Conference

Conference: 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)
Abbreviated title: ICRA2024
Place: Japan
City: Yokohama
Period: 13/05/24 - 17/05/24
Internet address: https://2024.ieee-icra.org/

Funding

This work is partially supported by the National Key R&D Program of China (No. 2022YFB3103100), the National Natural Science Foundation of China (62203024, 92167102, 61873220, 62102083, 62173286, 61875068, 62177018), the Natural Science Foundation of Jiangsu Province (BK20210222), the R&D Program of Beijing Municipal Education Commission (KM202310005027), and the Research Grants Council of Hong Kong (CityU 11213420).

RGC Funding Information

  • RGC-funded
