A Software Analytics Framework using Deep Learning on Generalized Data Representations

Project: Research



With the advent of data science and software analytic techniques, more and more people and organizations choose to build predictive models from software engineering data to aid software engineering practices and decisions, such as predicting defects, quality, and the effort required for projects. Over the past decades, several novel techniques and prediction models utilizing the latest statistical theories and machine learning techniques have been proposed to improve prediction accuracy. All of these techniques and models claim to have significantly improved prediction performance. Unfortunately, the results of this work are not fully reusable in industrial software practice, mainly because the proposed techniques have only been validated against publicly available static datasets collected in the 1980s, and there is no way to validate the quality of these datasets or of the features within them. As software is developed more rapidly than ever, the traditional data analytics approach of training on an existing dataset and then predicting effort and defects using different methods may be less relevant and unrealistic. In addition to challenges such as the cross- and within-company dataset and incomplete dataset problems, software practitioners have no way to make use of the advanced techniques proposed in these software engineering studies. The prediction platform should be closely tied to the software development environment and process, so that it can automatically understand and analyze the entire source code base, design constraints, complexity, and software development productivity. To accomplish these objectives, human software experts are usually required. But as the complexity of software increases, it is becoming extremely difficult to perform these tasks using human experts; being able to automatically derive useful information from raw software engineering data, such as source code and other attributes, is therefore essential to support advanced software analytics. Results derived in this way are thus more generalizable and practical for use in the software industry.

The principal goal of this project is to develop a new software analytics framework with generalized data representations, which is suitable for deep learning. This project seeks to address a range of challenging problems, such as automated feature engineering from raw software engineering data, in order to create deep-learning-based models for defect and effort prediction, as well as recommendations for better decisions. The outcome of this work is significant in that it endeavors to make the next generation of software analytic techniques generalizable and useful in industry.


Project number: 9042499
Grant type: GRF
Effective start/end date: 1/09/17 to 25/02/21