Any region can be perceived equally and effectively on rotation pretext task using full rotation and weighted-region mixture

Wei Dai, Tianyi Wu, Rui Liu, Min Wang, Jianqin Yin, Jun Liu*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

3 Citations (Scopus)

Abstract

In recent years, self-supervised learning has emerged as a powerful approach to learning visual representations without requiring extensive manual annotation. One popular technique involves using rotation transformations of images, which provide a clear visual signal for learning semantic representation. However, in this work, we revisit the pretext task of predicting image rotation in self-supervised learning and discover that it tends to marginalise the perception of features located near the centre of an image. To address this limitation, we propose a new self-supervised learning method, namely FullRot, which spotlights underrated regions by resizing the randomly selected and cropped regions of images. Moreover, FullRot increases the complexity of the rotation pretext task by applying the degree-free rotation to the region cropped into a circle. To encourage models to learn from different general parts of an image, we introduce a new data mixture technique called WRMix, which merges two random intra-image patches. By combining these innovative crop and rotation methods with the data mixture scheme, our approach, FullRot + WRMix, surpasses the state-of-the-art self-supervision methods in classification, segmentation, and object detection tasks on ten benchmark datasets with an improvement of up to +13.98% accuracy on STL-10, +8.56% accuracy on CIFAR-10, +10.20% accuracy on Sports-100, +15.86% accuracy on Mammals-45, +15.15% accuracy on PAD-UFES-20, +32.44% mIoU on VOC 2012, +7.62% mIoU on ISIC 2018, +9.70% mIoU on FloodArea, +25.16% AP50 on VOC 2007, and +58.69% AP50 on UTDAC 2020. The code is available at https://github.com/anthonyweidai/FullRot_WRMix.
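The two ideas in the abstract — a degree-free rotation applied to a region cropped into a circle, and an intra-image patch mixture — can be sketched as follows. This is an illustrative sketch only, not the authors' released code: the helper names, the nearest-neighbour resampling, and the uniform patch placement in the mixture are all assumptions (WRMix's actual region weighting is not reproduced here).

```python
import numpy as np

def circular_crop_rotate(img, angle_deg):
    """Sketch of a degree-free rotation sample (hypothetical helper).

    Crops the largest centred circle from a square image and rotates it
    by an arbitrary angle via nearest-neighbour inverse mapping, so the
    rotation leaves no corner artefacts that could give the angle away.
    """
    h, w = img.shape[:2]
    assert h == w, "expects a square crop"
    c = (h - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse rotation: for each output pixel, find its source pixel.
    theta = np.deg2rad(angle_deg)
    ys = c + (yy - c) * np.cos(theta) - (xx - c) * np.sin(theta)
    xs = c + (yy - c) * np.sin(theta) + (xx - c) * np.cos(theta)
    ysi = np.clip(np.rint(ys).astype(int), 0, h - 1)
    xsi = np.clip(np.rint(xs).astype(int), 0, w - 1)
    out = img[ysi, xsi]
    # Circular mask: zero out everything outside the inscribed circle.
    mask = (yy - c) ** 2 + (xx - c) ** 2 <= c ** 2
    if img.ndim == 3:
        mask = mask[..., None]
    return np.where(mask, out, 0)

def intra_image_patch_mix(img, rng):
    """Hypothetical intra-image mixture in the spirit of WRMix:
    paste one random patch of the image over another region of the
    same image (the paper's weighted-region scheme is not modelled)."""
    h, w = img.shape[:2]
    ph, pw = h // 4, w // 4
    y1, x1 = rng.integers(0, h - ph), rng.integers(0, w - pw)
    y2, x2 = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out = img.copy()
    out[y2:y2 + ph, x2:x2 + pw] = img[y1:y1 + ph, x1:x1 + pw]
    return out
```

In a rotation pretext task, a model would then be trained to regress or classify `angle_deg` from the masked output, with the circular crop ensuring any rotation angle is equally valid.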

© 2024 Elsevier Ltd. All rights reserved.
Original language: English
Article number: 106350
Journal: Neural Networks
Volume: 176
Online published: 30 Apr 2024
DOIs
Publication status: Published - Aug 2024

Funding

This work was supported by the Research Grants Council (RGC) of Hong Kong under Grants 11217922, 11212321, and ECS-21212720, Guangdong Province Basic and Applied Basic Research Fund Project 2019A1515110175, and the Science and Technology Innovation Committee of Shenzhen under Grant Type-C SGDX20210823104001011.

Research Keywords

  • Self-supervised learning
  • Full rotation
  • Data mixing
  • Vision impairment

Publisher's Copyright Statement

  • COPYRIGHT TERMS OF DEPOSITED POSTPRINT FILE: © 2024 Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/.
