Abstract
Salient Object Ranking (SOR) is the process of predicting the order of an observer's attention to objects when viewing a complex scene. Existing SOR methods primarily focus on ranking various scene objects simultaneously by exploring their spatial and semantic properties. However, ranking all salient objects simultaneously does not align with human viewing behavior, and may result in incorrect attention shift predictions. We observe that humans view a scene through a sequential and continuous process involving a cycle of foveating on objects of interest with our foveal vision while using peripheral vision to prepare for the next fixation location. For instance, when we see a flying kite, our foveal vision captures the kite itself, while our peripheral vision can help us locate the person controlling it such that we can smoothly shift our attention to that person next. By repeatedly carrying out this cycle, we can gain a thorough understanding of the entire scene. Based on this observation, we propose to model the dynamic interplay between foveal and peripheral vision to predict human attention shifts sequentially. To this end, we propose a novel SOR model, SeqRank, which reproduces foveal vision to extract high-acuity visual features for accurate salient instance segmentation while also modeling peripheral vision to select the object that is likely to grab the viewer’s attention next. By incorporating both types of vision, our model can mimic human viewing behavior better and provide a more faithful ranking among various scene objects. Most notably, our model improves the SA-SOR/MAE scores by +6.1%/-13.0% on IRSR, compared with the state-of-the-art. Extensive experiments show the superior performance of our model on the SOR benchmarks. Code is available at https://github.com/guanhuankang/SeqRank.
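The difference between simultaneous and sequential ranking can be illustrated with a toy sketch. This is not the SeqRank model: the function `sequential_rank`, the `distance_weight` parameter, the scoring rule (saliency minus a distance penalty), and the scene values are all assumptions made up for illustration. The first fixation goes to the most salient object, and each later fixation goes to the remaining object that scores best as seen from the current fixation.

```python
import math


def sequential_rank(objects, distance_weight=0.5):
    """Toy sequential ranking (hypothetical, not the authors' method).

    objects: dict mapping name -> (saliency, (x, y)).
    Returns object names in the predicted viewing order.
    """
    remaining = dict(objects)

    # First fixation: the most salient object overall.
    current = max(remaining, key=lambda name: remaining[name][0])
    order = [current]
    _, fixation = remaining.pop(current)

    while remaining:
        # "Peripheral" step: score each unseen object by its saliency,
        # penalised by its distance from the current fixation point.
        def peripheral_score(name):
            saliency, position = remaining[name]
            return saliency - distance_weight * math.dist(fixation, position)

        current = max(remaining, key=peripheral_score)
        order.append(current)
        # "Foveal" step: move the fixation to the chosen object.
        _, fixation = remaining.pop(current)

    return order


# Made-up scene in normalised image coordinates.
scene = {
    "kite": (0.90, (0.50, 0.20)),
    "person": (0.60, (0.60, 0.80)),
    "tree": (0.65, (0.05, 0.90)),
}

print(sequential_rank(scene))                       # distance-aware order
print(sequential_rank(scene, distance_weight=0.0))  # pure saliency order
```

With the distance penalty, the person flying the kite is visited before the slightly more salient but more distant tree; with `distance_weight=0.0` the loop reduces to sorting by saliency alone, which is the simultaneous ranking the abstract argues against.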
© 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Original language | English |
---|---|
Title of host publication | Proceedings of the 38th AAAI Conference on Artificial Intelligence |
Editors | Jennifer Dy, Sriraam Natarajan, Michael Wooldridge |
Publisher | AAAI Press |
Pages | 1941-1949 |
Number of pages | 9 |
ISBN (Print) | 1-57735-887-2, 978-1-57735-887-9 |
DOIs | |
Publication status | Published - 2024 |
Event | 38th Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI-24) - Vancouver Convention Center, Vancouver, Canada Duration: 20 Feb 2024 → 27 Feb 2024 https://aaai.org/aaai-conference/ https://ojs.aaai.org/index.php/AAAI/issue/archive |
Publication series
Name | Proceedings of the AAAI Conference on Artificial Intelligence |
---|---|
Number | 3 |
Volume | 38 |
ISSN (Print) | 2159-5399 |
ISSN (Electronic) | 2374-3468 |
Conference
Conference | 38th Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI-24) |
---|---|
Country/Territory | Canada |
City | Vancouver |
Period | 20/02/24 → 27/02/24 |
Internet address |
Bibliographical note
Information for this record is supplemented by the author(s) concerned.
Research Keywords
- Low level vision
- salient object detection (SOD)
- salient object ranking (SOR)