Learning to Explore Intrinsic Saliency for Stereoscopic Video

Research output: Chapters, Conference Papers, Creative and Literary Works › 32: Refereed conference paper (with ISBN/ISSN) › peer-review

2 Scopus Citations

Author(s)

  • Qiudan Zhang
  • Xu Wang
  • Shiqi Wang
  • Shikai Li
  • Sam Kwong
  • Jianmin Jiang


Detail(s)

Original language: English
Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers
Pages: 9741-9750
ISBN (Electronic): 978-1-7281-3293-8
ISBN (Print): 978-1-7281-3294-5
Publication status: Published - Jun 2019

Publication series

ISSN (Print): 1063-6919
ISSN (Electronic): 2575-7075

Conference

Title: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Place: United States
City: Long Beach
Period: 16 - 20 June 2019

Abstract

The human visual system excels at biasing stereoscopic visual signals through attention mechanisms. Traditional methods that rely on low-level features and depth-relevant information for stereoscopic video saliency prediction have fundamental limitations; for example, modeling the interactions among multiple visual cues, including spatial, temporal, and depth information, is cumbersome because of their complexity. In this paper, we argue that high-level features are crucial and resort to a deep learning framework to learn the saliency maps of stereoscopic videos. Driven by the spatio-temporal coherence of consecutive frames, the model first imitates the mechanism of saliency by taking advantage of a 3D convolutional neural network. Subsequently, the saliency originating from the intrinsic depth is derived from the correlations between left and right views in a data-driven manner. Finally, a Convolutional Long Short-Term Memory (Conv-LSTM) based fusion network is developed to model the instantaneous interactions between the spatio-temporal and depth attributes, such that the ultimate stereoscopic saliency maps over time are produced. Moreover, we establish a new large-scale stereoscopic video saliency dataset (SVS) comprising 175 stereoscopic video sequences and their fixation density annotations, aiming to comprehensively study the intrinsic attributes of stereoscopic video saliency detection. Extensive experiments show that the proposed model achieves superior performance compared with state-of-the-art methods on the newly built stereoscopic video dataset.
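
The abstract outlines a three-stage pipeline: a 3D CNN for spatio-temporal saliency, a correlation between left and right views for depth-induced saliency, and a Conv-LSTM fusion producing saliency maps over time. Below is a minimal PyTorch sketch of that pipeline; all module names, channel sizes, the per-location inner-product correlation, and the single-cell fusion are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class SpatioTemporalBranch(nn.Module):
    """Stage 1: 3D CNN over a short clip of consecutive frames."""
    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(feat_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, clip):          # clip: (B, C, T, H, W)
        return self.net(clip)         # (B, feat_ch, T, H, W)

class DepthBranch(nn.Module):
    """Stage 2: depth-related saliency from left/right view correlation."""
    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, left, right):   # each: (B, C, H, W)
        fl, fr = self.encode(left), self.encode(right)
        # Per-location inner product: an assumed stand-in for the paper's
        # learned, data-driven left/right correlation.
        corr = (fl * fr).sum(dim=1, keepdim=True)
        return torch.sigmoid(corr)    # (B, 1, H, W)

class ConvLSTMCell(nn.Module):
    """Stage 3 building block: a standard ConvLSTM cell."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class StereoSaliencyModel(nn.Module):
    """Fuses spatio-temporal and depth cues over time into saliency maps."""
    def __init__(self, feat_ch=32, hid_ch=32):
        super().__init__()
        self.st = SpatioTemporalBranch(feat_ch=feat_ch)
        self.depth = DepthBranch(feat_ch=feat_ch)
        self.fuse = ConvLSTMCell(feat_ch + 1, hid_ch)
        self.head = nn.Conv2d(hid_ch, 1, kernel_size=1)

    def forward(self, left_clip, right_clip):   # each: (B, C, T, H, W)
        b, _, t, h_dim, w_dim = left_clip.shape
        st_feat = self.st(left_clip)
        h = left_clip.new_zeros(b, self.fuse.hid_ch, h_dim, w_dim)
        c = torch.zeros_like(h)
        maps = []
        for step in range(t):
            d = self.depth(left_clip[:, :, step], right_clip[:, :, step])
            x = torch.cat([st_feat[:, :, step], d], dim=1)
            h, c = self.fuse(x, (h, c))
            maps.append(torch.sigmoid(self.head(h)))
        return torch.stack(maps, dim=2)          # (B, 1, T, H, W)

if __name__ == "__main__":
    clip = torch.randn(2, 3, 8, 64, 112)         # batch of 8-frame stereo clips
    print(StereoSaliencyModel()(clip, clip).shape)  # torch.Size([2, 1, 8, 64, 112])

The per-frame loop reflects how a Conv-LSTM can model instantaneous interactions between the spatio-temporal and depth attributes while carrying hidden state across time; the real model would use deeper backbones and a learned correlation rather than these minimal stand-ins.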

Research Area(s)

  • 3D from Multiview and Sensors, Datasets and Evaluation, Deep Learning, RGBD sensors and analytics, Video Analytics

Citation Format(s)

Learning to Explore Intrinsic Saliency for Stereoscopic Video. / Zhang, Qiudan; Wang, Xu; Wang, Shiqi; Li, Shikai; Kwong, Sam; Jiang, Jianmin.

Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers, 2019. p. 9741-9750.
