Synthetic data with neural machine translation for automatic correction in arabic grammar

Research output: Journal Publications and Reviews, RGC 21 - Publication in refereed journal, peer-review

24 Scopus Citations

Author(s)

  • Aiman Solyman
  • Wang Zhenyu
  • Tao Qian
  • Arafat Abdulgader Mohammed Elhag
  • Zeinab Aleibeid

Detail(s)

Original language: English
Pages (from-to): 303-315
Number of pages: 13
Journal / Publication: Egyptian Informatics Journal
Volume: 22
Issue number: 3
Online published: 24 Dec 2020
Publication status: Published - Sept 2021
Externally published: Yes

Abstract

The automatic correction of grammar and spelling errors is important for students, second-language learners, and downstream Natural Language Processing (NLP) tasks such as part-of-speech tagging and text summarization. Neural Machine Translation (NMT) has recently become a well-established, high-performing approach to Grammar Error Correction (GEC). Arabic GEC (AGEC), however, is still an emerging area because of challenges such as the scarcity of training data and the complexity of the Arabic language. To address these issues, we introduce an unsupervised method that generates large-scale synthetic training data with a confusion function, increasing the size of the training set. We also introduce a supervised NMT model for AGEC called SCUT AGEC, a convolutional sequence-to-sequence model with nine encoder-decoder layers and an attention mechanism. We apply fine-tuning to further improve performance. Convolutional Neural Networks (CNNs) allow our model to perform feature extraction and classification jointly, and we show that they capture local-context features efficiently. Moreover, stacking convolutional layers makes it straightforward to capture long-term dependencies. Our proposed model is the first supervised AGEC system based on convolutional sequence-to-sequence learning, and it outperforms the current state-of-the-art neural AGEC models.
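The abstract describes generating synthetic training pairs with a confusion function but does not give its details. The following is a minimal sketch under stated assumptions, not the authors' implementation: the confusion sets, the four character-level edit operations, and the names `CONFUSION_SETS`, `corrupt_token`, and `make_synthetic_pair` are all hypothetical. A clean sentence is corrupted token by token to produce an (erroneous source, correct target) pair for sequence-to-sequence training.

```python
import random

# Hypothetical confusion sets: Arabic characters that writers often interchange
# (hamza forms on alef, taa marbuta vs. haa, alef maqsura vs. yaa).
CONFUSION_SETS = {
    "أ": ["ا", "إ", "آ"],
    "إ": ["ا", "أ"],
    "آ": ["ا", "أ"],
    "ة": ["ه"],
    "ه": ["ة"],
    "ى": ["ي"],
    "ي": ["ى"],
}


def corrupt_token(token, rng):
    """Apply one randomly chosen character-level edit to a non-empty token.

    A substitution only fires if the chosen character has a confusion set;
    otherwise the token is returned unchanged. Tokens never become empty.
    """
    chars = list(token)
    op = rng.choice(["substitute", "delete", "insert", "swap"])
    i = rng.randrange(len(chars))
    if op == "substitute" and chars[i] in CONFUSION_SETS:
        chars[i] = rng.choice(CONFUSION_SETS[chars[i]])
    elif op == "delete" and len(chars) > 1:
        del chars[i]
    elif op == "insert":
        chars.insert(i, chars[i])  # duplicate a character
    elif op == "swap" and len(chars) > 1:
        j = i + 1 if i + 1 < len(chars) else i - 1  # adjacent position
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def make_synthetic_pair(sentence, error_rate=0.1, seed=None):
    """Return (noisy_source, clean_target) for one clean sentence.

    Each whitespace-separated token is corrupted with probability error_rate.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    noisy = [corrupt_token(t, rng) if rng.random() < error_rate else t
             for t in tokens]
    return " ".join(noisy), sentence
```

Running `make_synthetic_pair` over a large monolingual corpus would yield parallel data in which the model learns to map the noisy side back to the clean side; the error rate and the choice of operations are tuning assumptions here.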
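The claim that stacking convolutional layers captures long-term dependencies can be made concrete with receptive-field arithmetic. For $L$ stacked stride-1, undilated 1-D convolutions of kernel width $k$, one output position sees $L(k-1)+1$ input positions. The sketch below assumes nine stacked layers on one side of the network and a hypothetical kernel width of 3; the abstract does not state the kernel width, and `receptive_field` is an illustrative helper, not part of SCUT AGEC.

```python
def receptive_field(num_layers: int, kernel_width: int) -> int:
    """Input positions visible to one output position after stacking
    num_layers stride-1, undilated 1-D convolutions of width kernel_width."""
    if num_layers < 1 or kernel_width < 1:
        raise ValueError("num_layers and kernel_width must be positive")
    # Each extra layer widens the window by (kernel_width - 1) positions.
    return num_layers * (kernel_width - 1) + 1


print(receptive_field(9, 3))  # 19
```

Under these assumptions each output position covers a 19-token window, whereas a single width-3 layer sees only 3 tokens, which illustrates why depth rather than wider kernels is used to reach distant context.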

Research Area(s)

  • Arabic grammar error correction, Convolutional neural networks, Natural language processing

Bibliographic Note

Publisher Copyright: © 2021

Citation Format(s)

Synthetic data with neural machine translation for automatic correction in arabic grammar. / Solyman, Aiman; Zhenyu, Wang; Qian, Tao et al.
In: Egyptian Informatics Journal, Vol. 22, No. 3, 09.2021, p. 303-315.
