Abstract
Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It serves as a fundamental component of various downstream computer vision tasks, such as style transfer and exemplar-based colorization. Many existing methods infer semantic alignment from local features and their cosine similarities. However, they struggle with significant intra-class variation of objects, such as differences in appearance and size; in other words, contents with the same semantics can look substantially different. To address this issue, we propose a novel deep neural network whose core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer, which enhances semantic features with global awareness, and a probabilistic correlation module, which adaptively fuses multi-scale information based on learned confidence scores. We use a unified network architecture to achieve two types of semantic alignment, namely cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks and outperforms the state of the art in cross-domain semantic alignment.
© 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
Original language | English |
---|---|
Pages (from-to) | 897-910 |
Number of pages | 14 |
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Volume | 34 |
Issue number | 2 |
Online published | 21 Jun 2023 |
DOIs | |
Publication status | Published - Feb 2024 |