Data Preparation for Deep Learning based Code Smell Detection: A Systematic Literature Review

Fengji Zhang, Zexian Zhang, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu*, Wenhua Hu

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

10 Citations (Scopus)

Abstract

Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability. Deep Learning (DL) techniques have emerged as a promising approach for CSD due to their superior performance. However, the effectiveness of DL-based CSD methods relies heavily on the quality of the training data. Despite its importance, little attention has been paid to analyzing the data preparation process.
This systematic literature review analyzes the data preparation techniques used in DL-based CSD methods. We identify 36 relevant papers published by December 2023 and provide a thorough analysis of the critical considerations in constructing CSD datasets, including data requirements, collection, labeling, and cleaning. We also summarize seven primary challenges and corresponding solutions in the literature.
Finally, we offer actionable recommendations for preparing and accessing high-quality CSD data, emphasizing the importance of data diversity, standardization, and accessibility. This survey provides valuable insights for researchers and practitioners to harness the full potential of DL techniques in CSD. © 2024 Elsevier Inc.
Original language: English
Article number: 112131
Journal: Journal of Systems and Software
Volume: 216
Online published: 12 Jun 2024
Publication status: Published - Oct 2024

Bibliographical note

Information for this record is supplemented by the author(s) concerned.

Funding

This work is partially supported by the National Natural Science Foundation of China (62202350), the Natural Science Foundation of Chongqing, China (cstc2021jcyj-msxmX1115), and the General Research Fund of the Research Grants Council of Hong Kong and the research funds of the City University of Hong Kong (6000796, 9229109, 9229098, 9220103, 9229029).

Research Keywords

  • Code Smell Detection
  • Deep Learning
  • Data Preparation
  • Systematic Literature Review

RGC Funding Information

  • RGC-funded
