A Software Analytics Framework using Deep Learning on Generalized Data Representations

Project: Research, GRF




With the advent of data science and software analytics techniques, more and more people and organizations choose to build predictive models from software engineering data to aid software engineering practices and decisions, such as predicting defects, quality, and the effort required for projects. Over the past decades, several novel techniques and prediction models utilizing the latest statistical theories and machine learning techniques have been proposed to improve prediction accuracy. All of these techniques and models claim to have significantly improved prediction performance. Unfortunately, the results of this work are not fully reusable in industrial software practice, mainly because the proposed techniques have only been validated against some publicly available static datasets collected in the 1980s, and there is no way to validate the quality of these datasets or the features within them. As software is developed more rapidly than ever, the traditional data analytics approach of training on an existing dataset and then predicting quantities such as effort and defects using different methods may be less relevant and less realistic. Together with challenges such as the cross- versus within-company dataset problem and the incomplete dataset problem, this leaves software practitioners with no practical way to make use of the advanced techniques proposed in these software engineering studies. A prediction platform should be closely tied to the software development environment and process so that it can automatically understand and analyze the entire source code base, design constraints, complexity, and software development productivity. To accomplish these objectives, human software experts are usually required.
However, as the complexity of software increases, it is becoming extremely difficult to perform these tasks using human experts. Being able to automatically derive useful information from raw software engineering data, such as source code and other attributes, is therefore essential to support advanced software analytics. Results obtained in this way are more generalizable and practical for use in the software industry. The principal goal of this project is to develop a new software analytics framework with generalized data representations suitable for deep learning. The project seeks to address a range of challenging problems, such as automated feature engineering from raw software engineering data, in order to create deep-learning-based models for defect and effort prediction as well as recommendations for better decisions. The outcome of this work is significant in that it endeavors to render the next generation of software analytics techniques generalizable and useful in industry.


Effective start/end date: 1/09/17 → …