A Two-level Rectification Attention Network for Scene Text Recognition

Research output: Journal Publications and Reviews > RGC 21 - Publication in refereed journal > peer-review

18 Scopus Citations

Author(s)

Detail(s)

Original language: English
Pages (from-to): 2404-2414
Journal / Publication: IEEE Transactions on Multimedia
Volume: 25
Online published: 27 Jan 2022
Publication status: Published - 2023

Abstract

Scene text recognition is a challenging task in the computer vision field due to the diversity of text styles and the complexity of image backgrounds. In recent decades, numerous text rectification and recognition methods have been proposed to solve these problems. However, most of these methods rectify texts at either the geometry level or the pixel level. The former is limited by geometric constraints, and the latter is prone to blurring the text. In this paper, we propose a two-level rectification attention network (TRAN) to rectify and recognize texts. This network consists of two parts: a two-level rectification network (TORN) and an attention-based recognition network (ABRN). Specifically, the TORN first rectifies texts at the geometry level and then performs a pixel-level adjustment, which not only eliminates the geometric constraints but also renders clear texts. The ABRN’s role is to recognize text in the rectified images. To improve the feature extraction ability of our model, we design a new channel-wise and kernel-wise attention unit, which enables the network to handle significant variations of character size and channel interdependencies. Furthermore, we propose a skip training strategy to make our model converge smoothly. We conduct experiments on various benchmarks, including regular and irregular datasets. The experimental results show that our method achieves state-of-the-art performance. © 2022 IEEE.

Research Area(s)

  • Character recognition, Geometry, Hidden Markov models, Image recognition, optical character recognition, scene text recognition, spatial transformer network, Task analysis, Text recognition, text rectification, Training

Bibliographic Note

Information for this record is supplemented by the author(s) concerned.

Citation Format(s)

A Two-level Rectification Attention Network for Scene Text Recognition. / Wu, Lintai; Xu, Yong; Hou, Junhui et al.
In: IEEE Transactions on Multimedia, Vol. 25, 2023, p. 2404-2414.
