Learning to Explore Intrinsic Saliency for Stereoscopic Video
Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45) › 32_Refereed conference paper (with ISBN/ISSN) › peer-review
Author(s)
Qiudan Zhang, Xu Wang, Shiqi Wang et al.
Detail(s)
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 9741-9750 |
ISBN (Electronic) | 978-1-7281-3293-8 |
ISBN (Print) | 978-1-7281-3294-5 |
Publication status | Published - Jun 2019 |
Publication series
Name | |
---|---|
ISSN (Print) | 1063-6919 |
ISSN (Electronic) | 2575-7075 |
Conference
Title | 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) |
---|---|
Place | United States |
City | Long Beach |
Period | 16 - 20 June 2019 |
Abstract
The human visual system excels at biasing stereoscopic visual signals through attention mechanisms. Traditional methods that rely on low-level features and depth-relevant information for stereoscopic video saliency prediction have fundamental limitations; in particular, modeling the interactions among multiple visual cues, including spatial, temporal, and depth information, is cumbersome because of their sophistication. In this paper, we argue that high-level features are crucial and resort to a deep learning framework to learn the saliency maps of stereoscopic videos. Driven by the spatio-temporal coherence of consecutive frames, the model first imitates the saliency mechanism by taking advantage of a 3D convolutional neural network. Subsequently, the saliency originating from intrinsic depth is derived from the correlations between the left and right views in a data-driven manner. Finally, a Convolutional Long Short-Term Memory (Conv-LSTM) based fusion network is developed to model the instantaneous interactions between the spatio-temporal and depth attributes, such that the ultimate stereoscopic saliency maps over time are produced. Moreover, we establish a new large-scale stereoscopic video saliency dataset (SVS) comprising 175 stereoscopic video sequences with fixation density annotations, aiming to comprehensively study the intrinsic attributes of stereoscopic video saliency detection. Extensive experiments show that the proposed model achieves superior performance compared with state-of-the-art methods on the newly built stereoscopic video dataset.
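To make the pipeline described above concrete, the sketch below gives one plausible PyTorch reading of the three components: a 3D-convolutional spatio-temporal branch, a horizontal left/right correlation volume standing in for the data-driven depth cue, and a Conv-LSTM cell fusing both per frame. Every layer width, the disparity range (`max_disp`), and the module names are illustrative assumptions; this is a minimal sketch, not the authors' actual architecture.

```python
# Minimal sketch of the three-branch design from the abstract.
# All sizes and the correlation scheme are illustrative assumptions.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: convolutional gates over spatial feature maps."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c


class StereoVideoSaliencyNet(nn.Module):
    """Hypothetical net: 3D-conv spatio-temporal cue + stereo correlation
    cue, fused over time by a ConvLSTM into per-frame saliency maps."""
    def __init__(self, feat_ch=32, max_disp=8):
        super().__init__()
        self.max_disp = max_disp
        # Spatio-temporal branch: 3D convolutions over a short clip.
        self.st_branch = nn.Sequential(
            nn.Conv3d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Shared 2D encoder applied to the left and right views.
        self.view_enc = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Fusion: ConvLSTM over concatenated spatio-temporal + depth cues.
        self.fuse = ConvLSTMCell(feat_ch + 2 * max_disp + 1, feat_ch)
        self.head = nn.Conv2d(feat_ch, 1, 1)

    def correlation(self, fl, fr):
        # Horizontal correlation volume between view features: a common
        # stand-in for "correlations between left and right views".
        # torch.roll wraps at borders -- a simplification for brevity.
        vols = []
        for d in range(-self.max_disp, self.max_disp + 1):
            fr_shift = torch.roll(fr, shifts=d, dims=-1)
            vols.append((fl * fr_shift).mean(dim=1, keepdim=True))
        return torch.cat(vols, dim=1)  # (B, 2*max_disp+1, H, W)

    def forward(self, left, right):
        # left, right: (B, T, 3, H, W) stereoscopic clips.
        B, T, C, H, W = left.shape
        st = self.st_branch(left.transpose(1, 2))  # (B, F, T, H, W)
        h = left.new_zeros(B, self.fuse.hid_ch, H, W)
        c = torch.zeros_like(h)
        maps = []
        for t in range(T):
            fl = self.view_enc(left[:, t])
            fr = self.view_enc(right[:, t])
            x = torch.cat([st[:, :, t], self.correlation(fl, fr)], dim=1)
            h, c = self.fuse(x, (h, c))
            maps.append(torch.sigmoid(self.head(h)))
        return torch.stack(maps, dim=1)  # (B, T, 1, H, W) saliency over time


if __name__ == "__main__":
    net = StereoVideoSaliencyNet()
    l = torch.randn(2, 4, 3, 64, 64)
    r = torch.randn(2, 4, 3, 64, 64)
    print(net(l, r).shape)  # torch.Size([2, 4, 1, 64, 64])
```

Note that the recurrent fusion is what lets depth and spatio-temporal cues interact per time step, mirroring the "instantaneous interactions" the abstract attributes to the Conv-LSTM fusion network.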
Research Area(s)
- 3D from Multiview and Sensors, Datasets and Evaluation, Deep Learning, RGBD sensors and analytics, Video Analytics
Citation Format(s)
Learning to Explore Intrinsic Saliency for Stereoscopic Video. / Zhang, Qiudan; Wang, Xu; Wang, Shiqi et al.
Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers, 2019. p. 9741-9750.