Monocular 3D Object Detection with Motion Feature Distillation
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
HU, Henan; LI, Muyu; ZHU, Ming et al.
Detail(s)
Original language | English
---|---
Pages (from-to) | 82933-82945
Journal / Publication | IEEE Access
Volume | 11
Online published | 1 Aug 2023
Publication status | Published - 2023
Link(s)
DOI | DOI
---|---
Attachment(s) | Publisher's Copyright Statement
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85166747570&origin=recordpage
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(07ade758-bc78-406d-a564-5d9fd084d52b).html
Abstract
In the context of autonomous driving, environmental perception within a 360-degree field of view is extremely important. It can be achieved by detecting three-dimensional (3D) objects in the surrounding scene from inputs acquired by sensors such as LiDAR or an RGB camera. The resulting 3D perception is commonly represented in the bird's-eye-view (BEV) of the sensor. RGB cameras have the advantages of low cost and long-range acquisition, but because RGB images are two-dimensional (2D), the BEV generated from them suffers from low accuracy owing to limitations such as the lack of temporal correlation. To address these problems, we propose a monocular 3D object detection method based on long short-term feature fusion and motion feature distillation. Long short-term temporal features are extracted at different feature map resolutions. Motion features and depth information are combined and encoded by an encoder built on a Transformer cross-correlation module, and then integrated into the BEV space of the fused long short-term temporal features. A decoder with motion feature distillation subsequently localizes objects in 3D space. By combining BEV feature representations from different time steps, supplemented with embedded motion features and depth information, the proposed method significantly improves the accuracy of monocular 3D object detection, as demonstrated by experimental results on the nuScenes dataset. Our method outperforms state-of-the-art approaches, surpassing the previous best by 6.7% on mAP and 8.3% on mATE.
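To make the architecture described in the abstract concrete, the minimal PyTorch sketch below illustrates two of its key ingredients: a Transformer cross-attention block in which flattened BEV queries attend to a combined motion-and-depth feature sequence, and a feature-level distillation loss. This is an illustrative reading of the abstract, not the authors' released code; the module names, tensor shapes, and the MSE distillation objective are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionDepthCrossEncoder(nn.Module):
    # Transformer cross-attention ("cross-correlation") block: flattened BEV
    # features act as queries; the combined motion + depth sequence supplies
    # keys and values. Dimensions and layer layout are assumptions.
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # merge motion and depth channels
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, bev, motion, depth):
        # bev:    (B, N_bev, C) fused long short-term temporal BEV features
        # motion: (B, N, C)     encoded motion features
        # depth:  (B, N, C)     encoded depth features
        kv = self.fuse(torch.cat([motion, depth], dim=-1))
        out, _ = self.attn(query=bev, key=kv, value=kv)
        return self.norm(bev + out)          # residual connection + norm

def motion_distill_loss(student_feat, teacher_feat):
    # Feature-level distillation: the detection decoder (student) is pushed
    # toward a motion-feature teacher; MSE is one common choice of objective.
    return F.mse_loss(student_feat, teacher_feat.detach())

if __name__ == "__main__":
    B, N_bev, N, C = 2, 400, 100, 256        # toy sizes for a smoke test
    enc = MotionDepthCrossEncoder(dim=C)
    fused = enc(torch.randn(B, N_bev, C),
                torch.randn(B, N, C),
                torch.randn(B, N, C))
    print(fused.shape)                       # torch.Size([2, 400, 256])
    print(motion_distill_loss(fused, torch.randn_like(fused)).item())
```

In this reading, the residual cross-attention lets motion and depth cues reweight the temporal BEV representation before decoding, while the distillation term transfers motion knowledge into the decoder at training time only, so inference cost is unchanged.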
Research Area(s)
- 3D object detection, autonomous driving, bird’s-eye-view (BEV), Estimation, Feature extraction, Image resolution, knowledge distillation, Location awareness, monocular depth estimation, motion feature, Object detection, Solid modeling, Three-dimensional displays
Citation Format(s)
Monocular 3D Object Detection with Motion Feature Distillation. / HU, Henan; LI, Muyu; ZHU, Ming et al.
In: IEEE Access, Vol. 11, 2023, p. 82933-82945.
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review