Improving Code Readability Classification Using Deep Learning Techniques

基於深度學習技術的代碼可讀性分類研究

Student thesis: Doctoral Thesis

Detail(s)

Award date: 16 Aug 2018

Abstract

Code readability classification (the classification of a piece of source code as either readable or unreadable) has attracted increasing attention in academia and industry. To build accurate classification models, existing studies mainly proceed in two phases: 1) collect labeled data by conducting a large-scale survey in which multiple human annotators (preferably domain experts) rate code snippets by readability; 2) handcraft features, drawn from different aspects that intuitively seem to affect code readability, and then train a machine learning classifier on the data collected in the first phase.

Despite the encouraging results, several problems have been identified with existing approaches. In Phase 1, the code readability survey process has long been criticized as laborious and tedious (Problem 1), while in Phase 2, the widely used feature engineering method is labor-intensive and can capture only partial information about the source code (Problem 2). In this thesis, we address these two problems using recent techniques.

To address Problem 1, we introduce a new interactive method that draws on gamification techniques and implement it as GamiCRS (A Gamification System for Code Readability Survey). The focus is on incorporating game-based mechanisms that make the survey process more interesting and foster positive attitudes among participants, with the underlying goal of improving the quality of survey results. We propose a complete incentive and reward model that combines the intrinsic and extrinsic motivators we identified. To assess its efficacy, a field experiment is carried out to compare GamiCRS with its non-gamified counterparts. The empirical results show that applying GamiCRS has a positive effect.

To address Problem 2, we introduce a deep learning-based approach that can automatically learn complicated underlying features from the source code. Our approach is not another model based on feature engineering, but a constructive method of gaining a deep understanding of what constitutes readable code, which makes our study novel in this regard. Specifically, we treat source code as a matrix of symbols and leverage Convolutional Neural Networks (ConvNets) to learn features directly from the input data. Our approach requires no human intervention and can thus effectively avoid personal biases and the neglect of certain features.
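As a rough illustration of what "a matrix of symbols" could look like, the sketch below maps each character of a snippet to its Unicode code point and pads the result to a fixed shape. The function name `encode_snippet`, the matrix dimensions, and the zero padding value are all assumptions made for this example; the abstract does not specify the actual encoding.

```python
def encode_snippet(code, max_lines=4, max_cols=16, pad=0):
    """Map a code snippet to a fixed-size integer matrix.

    Rows correspond to source lines and columns to character positions.
    Each cell holds the character's code point, or `pad` where the
    snippet has no character. Lines and columns beyond the limits are
    truncated. (Hypothetical encoding, for illustration only.)
    """
    matrix = [[pad] * max_cols for _ in range(max_lines)]
    for r, line in enumerate(code.splitlines()[:max_lines]):
        for c, ch in enumerate(line[:max_cols]):
            matrix[r][c] = ord(ch)
    return matrix


if __name__ == "__main__":
    snippet = "int x = 1;\nreturn x;"
    for row in encode_snippet(snippet):
        print(row)
```

A fixed shape matters because a convolutional network expects inputs of uniform size; truncation and padding are one simple way to obtain it.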

First, we formulate a strategy for source code representation to enable deep learning-based program analyses. In particular, we parse the source code into a set of symbols at three granularities (Character-Level, Token-Level, and Node-Level Representation), and then convert the symbols into integer matrices, a format that ConvNets can readily analyze and manipulate. We construct three separate ConvNets with identical architectures, one for each granularity. The objective is to obtain multiple ConvNets that each perform well, but from different perspectives. To improve model generality and applicability, we then aggregate the three ConvNets with adjustable weights. We denote the resulting model as DeepCRM (A Deep Learning-Based Code Readability Model).
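The aggregation step can be pictured with a small sketch. It assumes that each of the three ConvNets outputs a probability that the snippet is readable, and that the outputs are combined by a normalized weighted average; the abstract says only that the weights are adjustable, so the combination rule, the function names `aggregate` and `classify`, and the 0.5 threshold are assumptions for illustration.

```python
def aggregate(probs, weights):
    """Combine per-model 'readable' probabilities by a weighted average.

    `probs` holds one probability per model (e.g. character-, token-,
    and node-level); `weights` holds one non-negative weight per model.
    The weights are normalized so the result stays in [0, 1].
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probs, weights)) / total


def classify(probs, weights, threshold=0.5):
    """Label a snippet from the aggregated probability (assumed threshold)."""
    score = aggregate(probs, weights)
    return "readable" if score >= threshold else "unreadable"


if __name__ == "__main__":
    # Hypothetical outputs of the three models for one snippet.
    print(classify([0.9, 0.6, 0.3], [1, 1, 1]))
    print(classify([0.2, 0.3, 0.9], [2, 2, 1]))
```

Under this scheme, raising one model's weight shifts the final decision toward that granularity's view of the snippet.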

To evaluate whether deep learning is beneficial, we compare our approach with five state-of-the-art code readability models, namely, Buse et al.'s Model, Posnett et al.'s Model, Dorn et al.'s Model, Scalabrino et al.'s Model, and A Comprehensive Model. The experimental results show that DeepCRM outperforms these previous approaches, with accuracy improvements ranging from 2.4% to 17.2%, confirming the efficacy of deep learning techniques for code readability classification. Although deep learning has achieved remarkable success in other areas, to the best of our knowledge, we are the first to observe its effectiveness in code readability classification. We hope that these promising results will encourage more in-depth research in this new field.