Abstract
Semantic segmentation, a fundamental visual task ubiquitously employed in sectors ranging from transportation and robotics to healthcare, has long captivated the research community. In the wake of rapid advances in large-model research, a foundation model for semantic segmentation, termed the Segment Anything Model (SAM), has been introduced. This model largely resolves the poor generalizability of previous segmentation models and removes the need to retrain the whole model on each new dataset. Nonetheless, segmentation models built on SAM remain constrained by the inherent limitations of RGB sensors, particularly in scenarios characterized by complex lighting conditions and high-speed motion. Motivated by these observations, a natural recourse is to adapt SAM to additional visual modalities without compromising its robust generalizability. To this end, we introduce a lightweight SAM-Event-Adapter (SE-Adapter) module, which incorporates event camera data into a SAM-based cross-modal learning architecture with only a small increment of tunable parameters. Capitalizing on the high dynamic range and temporal resolution afforded by event cameras, our proposed multi-modal Event-RGB learning architecture effectively improves semantic segmentation performance. In addition, we propose a novel paradigm for representing event data in a patch format compatible with transformer-based models, employing multi-spatiotemporal-scale encoding to efficiently extract motion and semantic correlations from event representations. Extensive empirical evaluations on the DSEC-Semantic and DDD17 datasets validate the effectiveness and rationality of our proposed approach. © 2024 IEEE.
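The abstract mentions representing event data in a transformer-compatible patch format, but the paper's exact encoding is not reproduced on this page. A common scheme in event-vision work, and a plausible starting point, is to accumulate the asynchronous event stream into a spatio-temporal voxel grid and then flatten it into ViT-style patch tokens. The sketch below illustrates that idea only; all names, resolutions, and parameters are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def events_to_patches(events, H=480, W=640, bins=4, patch=16):
    """Accumulate an event stream into a spatio-temporal voxel grid,
    then flatten it into ViT-style patch tokens.

    events: (N, 4) array of (x, y, t, p) rows, polarity p in {-1, +1}.
    Returns: (num_patches, bins * patch * patch) token matrix.
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    # Normalise timestamps into [0, bins) and assign each event a time bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (bins - 1e-6)
    b = t_norm.astype(int)
    # Signed polarity accumulation into the (bins, H, W) voxel grid.
    grid = np.zeros((bins, H, W), dtype=np.float32)
    np.add.at(grid, (b, y, x), p)
    # Patchify: (bins, H, W) -> (gh * gw, bins * patch * patch),
    # treating the time bins as channels of each spatial patch.
    gh, gw = H // patch, W // patch
    tokens = (grid.reshape(bins, gh, patch, gw, patch)
                  .transpose(1, 3, 0, 2, 4)
                  .reshape(gh * gw, -1))
    return tokens
```

With the default (hypothetical) 640x480 resolution, 4 time bins, and 16x16 patches, this yields 1200 tokens of dimension 1024, which could then be projected into a transformer's embedding space. The "multi-spatiotemporal scale" idea in the abstract would correspond to repeating such an encoding at several `bins`/`patch` settings.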
| Original language | English |
|---|---|
| Title of host publication | 2024 IEEE International Conference on Robotics and Automation (ICRA) |
| Publisher | IEEE |
| Pages | 9093-9100 |
| ISBN (Electronic) | 979-8-3503-8457-4 |
| DOIs | |
| Publication status | Published - 2024 |
| Event | 2024 IEEE International Conference on Robotics and Automation (ICRA 2024): CONNECT+ - Yokohama, Japan Duration: 13 May 2024 → 17 May 2024 https://2024.ieee-icra.org/ |
Publication series
| Name | Proceedings - IEEE International Conference on Robotics and Automation |
|---|---|
| ISSN (Print) | 1050-4729 |
Conference
| Conference | 2024 IEEE International Conference on Robotics and Automation (ICRA 2024) |
|---|---|
| Abbreviated title | ICRA2024 |
| Place | Japan |
| City | Yokohama |
| Period | 13/05/24 → 17/05/24 |
| Internet address | https://2024.ieee-icra.org/ |
Funding
This work is partially supported by National Key R&D Program of China (No. 2022YFB3103100), the National Natural Science Foundation of China (62203024, 92167102, 61873220, 62102083, 62173286, 61875068, 62177018), the Natural Science Foundation of Jiangsu Province (BK20210222), the R&D Program of Beijing Municipal Education Commission (KM202310005027), the Research Grants Council of Hong Kong (CityU 11213420).
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'SAM-Event-Adapter: Adapting Segment Anything Model for Event-RGB Semantic Segmentation'. Together they form a unique fingerprint.

Projects
- 1 Finished
GRF: Gaze Tracking and its Integration with Human-Robot Cooperation
LI, Y. F. (Principal Investigator / Project Coordinator) & CHEN, H. (Co-Investigator)
1/01/21 → 24/06/25
Project: Research