CSFwinformer: Cross-Space-Frequency Window Transformer for Mirror Detection

Zhifeng Xie, Sen Wang, Qiucheng Yu, Xin Tan*, Yuan Xie

*Corresponding author for this work

Research output: Journal Publications and Reviews, RGC 21 - Publication in refereed journal, peer-review

24 Citations (Scopus)

Abstract

Mirror detection is a challenging task because mirrors do not have a consistent visual appearance. Even the Segment Anything Model (SAM), despite its strong zero-shot performance, cannot accurately locate mirrors. Existing methods locate mirrors under assumed conditions, such as the correspondence between objects inside and outside the mirror, or the semantic association between the mirror and surrounding objects. However, these assumptions do not hold in all scenarios: the real counterparts of reflected objects may be absent from the scene, or meaningful semantic associations may be hard to extract in complex scenes. Humans, on the other hand, can easily recognize mirrors from the specular texture of their material. To mine mirror features in more general scenes, we propose a Cross-Space-Frequency Window Transformer (CSFwinformer) that extracts spatial and frequency features for texture analysis. Specifically, we design a Spatial-Frequency Window Alignment module (SFWA) that computes spatial-frequency feature affinities and learns the difference between mirror and non-mirror textures. We then propose Dilated Window Attention (DWA) to extract global features, compensating for the limitations of window alignment. In addition, we propose a Cross-Modality Context Contrast module (CMCC) that fuses cross-modality features with global features and lets information flow between different windows, making full use of cross-modality information. Extensive experiments show that our method performs favorably against state-of-the-art methods on three mirror detection benchmarks and significantly improves SAM's performance on mirror detection. The code is available at https://github.com/wangsen99/CSFwinformer. © 1992-2012 IEEE.
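The abstract names its modules without giving their equations. The following is a minimal NumPy sketch, under our own assumptions, of the two generic ideas it relies on: attention restricted to local windows, with spatial tokens attending to frequency (FFT amplitude) tokens of the same window, and a dilated window layout in which each window samples pixels across the whole map to gather global context. All function names (`window_partition`, `dilated_window_partition`, `window_amplitude`, `attention`) are hypothetical, and the FFT-amplitude descriptor and plain cross-attention are stand-ins chosen for illustration. This is not the authors' SFWA or DWA implementation; the linked repository is the reference for that.

```python
import numpy as np


def window_partition(x, w):
    """Split an (H, W, C) map into non-overlapping w x w windows of adjacent pixels.

    Assumes H and W are divisible by w. Returns (num_windows, w*w, C).
    """
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)


def dilated_window_partition(x, w):
    """Group pixels sampled at stride (H // w, W // w) into w x w windows.

    Each window spans the whole map, so attention inside it mixes global context.
    """
    H, W, C = x.shape
    x = x.reshape(w, H // w, w, W // w, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, w * w, C)


def window_amplitude(windows, w):
    """Per-window, per-channel 2-D FFT amplitude: a simple frequency descriptor of local texture."""
    n, _, c = windows.shape
    spec = np.fft.fft2(windows.reshape(n, w, w, c), axes=(1, 2))
    return np.abs(spec).reshape(n, w * w, c)


def attention(q, k, v):
    """Scaled dot-product attention computed independently inside each window."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))  # toy (H, W, C) feature map
w = 4

spatial = window_partition(feat, w)   # (4, 16, 16): local windows
freq = window_amplitude(spatial, w)   # (4, 16, 16): frequency tokens per window
# Spatial-frequency affinity (assumed form): spatial queries attend to frequency keys/values.
aligned = attention(spatial, freq, freq)

dilated = dilated_window_partition(feat, w)
global_ctx = attention(dilated, dilated, dilated)  # each window covers the whole map
print(aligned.shape, global_ctx.shape)  # (4, 16, 16) (4, 16, 16)
```

Both layouts give windows of the same size, so the per-window attention cost is identical; the difference is only which pixels share a window. Regular windows group adjacent pixels (local texture), while the dilated layout groups pixels spaced H // w rows and W // w columns apart, which is one common way to let window attention see the whole image.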

Original language: English
Pages (from-to): 1853-1867
Journal: IEEE Transactions on Image Processing
Volume: 33
Online published: 7 Mar 2024
DOIs
Publication status: Published - 2024

Research Keywords

  • cross-modality learning
  • frequency learning
  • mirror detection
  • texture analysis
