TY - JOUR
T1 - Learning to Explore Saliency for Stereoscopic Videos via Component-Based Interaction
AU - Zhang, Qiudan
AU - Wang, Xu
AU - Wang, Shiqi
AU - Sun, Zhenhao
AU - Kwong, Sam
AU - Jiang, Jianmin
PY - 2020
Y1 - 2020
AB - In this paper, we devise a saliency prediction model for stereoscopic videos that learns to explore saliency through component-based interactions among spatial, temporal, and depth cues. The model first takes advantage of the structure of a 3D residual network (3D-ResNet) to model saliency driven by the spatio-temporal coherence of consecutive frames. Subsequently, implicit-depth saliency is automatically derived from the displacement correlation between the left and right views using a deep convolutional network (ConvNet). Finally, a component-wise refinement network is devised to produce the final saliency maps over time by aggregating the saliency distributions obtained from the multiple components. To further facilitate research on stereoscopic video saliency, we create a new dataset of 175 stereoscopic video sequences with diverse content, together with their dense eye fixation annotations. Extensive experiments show that the proposed model achieves superior performance compared with state-of-the-art methods on all publicly available eye fixation datasets.
KW - Visual saliency
KW - stereoscopic video
KW - deep learning
UR - http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=LinksAMR&SrcApp=PARTNER_APP&DestLinkType=FullRecord&DestApp=WOS&KeyUT=000529943000014
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85084135184&origin=recordpage
UR - http://www.scopus.com/inward/record.url?scp=85084135184&partnerID=8YFLogxK
U2 - 10.1109/TIP.2020.2985531
DO - 10.1109/TIP.2020.2985531
M3 - RGC 21 - Publication in refereed journal
SN - 1057-7149
VL - 29
SP - 5722
EP - 5736
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -