ECSNet: Spatio-Temporal Feature Learning for Event Camera

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review


  • Zhiwen Chen
  • Jinjian Wu
  • Leida Li
  • Weisheng Dong
  • Guangming Shi

Original language: English
Number of pages: 12
Journal / Publication: IEEE Transactions on Circuits and Systems for Video Technology
Publication status: Online published - 29 Aug 2022


Neuromorphic event cameras can efficiently sense the latent geometric structures and motion cues of a scene by generating asynchronous and sparse event signals. Because event signals have an irregular layout, leveraging their rich spatio-temporal information for recognition tasks remains a significant challenge. Existing methods tend to treat events as dense image-like or point-series representations. However, they either severely destroy the sparsity of event data or fail to encode robust spatial cues. To fully exploit the inherent sparsity of events while reconciling their spatio-temporal information, we introduce a compact event representation, namely the 2D-1T event cloud sequence (2D-1T ECS). We couple this representation with a novel lightweight spatio-temporal learning framework (ECSNet) that accommodates both object classification and action recognition tasks. The core of our framework is a hierarchical spatial relation module. Equipped with a specially designed surface-event-based sampling unit and a local event normalization unit to enhance inter-event relation encoding, this module learns robust geometric features from the 2D event clouds. We also propose a motion attention module for efficiently capturing the long-term temporal context that evolves with the 1T cloud sequence. Experiments show that our framework achieves performance on par with, or even better than, the state of the art. Importantly, our approach works well with the sparsity of event data without any sophisticated operations, leading to low computational cost and fast inference.
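
As a rough illustration of the representation named in the abstract, the sketch below turns a raw event stream into a sequence of 2D event clouds. The abstract does not say how the sequence is built, so everything here is an assumption made for illustration: the `(x, y, t, p)` event layout with microsecond timestamps, the equal-duration time windows, the dropping of polarity, and the hypothetical function name `to_event_cloud_sequence`. It is not the authors' implementation.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp in microseconds, polarity).
Event = Tuple[int, int, int, int]
Point2D = Tuple[int, int]


def to_event_cloud_sequence(events: List[Event], num_slices: int) -> List[List[Point2D]]:
    """Split an event stream into `num_slices` equal-duration windows.

    Inside each window the timestamp is dropped, so every window becomes a
    sparse 2D point cloud; the order of the windows carries the 1D temporal
    axis. No dense image grid is ever allocated.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")

    clouds: List[List[Point2D]] = [[] for _ in range(num_slices)]
    if not events:
        return clouds

    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = t_max - t_min

    for x, y, t, _polarity in events:
        if span == 0:
            idx = 0  # all events share one timestamp
        else:
            # Map the timestamp to a window index; the last event is clamped
            # into the final window.
            idx = min(int((t - t_min) / span * num_slices), num_slices - 1)
        clouds[idx].append((x, y))
    return clouds


events = [(3, 4, 0, 1), (5, 1, 20, -1), (3, 5, 50, 1), (7, 7, 100, 1)]
sequence = to_event_cloud_sequence(events, num_slices=2)
print(sequence)  # [[(3, 4), (5, 1)], [(3, 5), (7, 7)]]
```

The total number of stored points equals the number of events, which is the sense in which such a representation preserves sparsity compared with a dense image-like encoding.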

Research Area(s)

  • Event Camera, Spatio-temporal Feature Learning, Object Classification, Action Recognition

Bibliographic Note

Research Unit(s) information for this publication is provided by the author(s) concerned.