A hybrid evolutionary algorithm for attribute selection in data mining

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal

79 Scopus Citations

Author(s)

Tan, K.C.; Teoh, E.J.; Yu, Q.; Goh, K.C.

Detail(s)

Original language: English
Pages (from-to): 8616-8630
Journal / Publication: Expert Systems with Applications
Volume: 36
Issue number: 4
Online published: 31 Oct 2008
Publication status: Published - May 2009
Externally published: Yes

Abstract

Real-life datasets are often interspersed with noise, making the subsequent data mining process difficult. The task of the classifier can be simplified by eliminating attributes deemed redundant for classification, as retaining only pertinent attributes reduces the size of the dataset and allows more comprehensible analysis of the extracted patterns or rules. In this article, a new hybrid approach comprising two conventional machine learning algorithms is proposed to carry out attribute selection. Genetic algorithms (GAs) and support vector machines (SVMs) are integrated using a wrapper approach. Specifically, the GA component searches for the best attribute set by applying the principles of an evolutionary process. The SVM then classifies the patterns in the reduced datasets corresponding to the attribute subsets represented by the GA chromosomes. The proposed GA-SVM hybrid is validated using datasets obtained from the UCI machine learning repository. Simulation results demonstrate that the GA-SVM hybrid produces good classification accuracy and a higher level of consistency, comparable to other established algorithms. In addition, the hybrid is improved by using a correlation measure between attributes as a fitness measure to replace the weaker members of the population with newly formed chromosomes. This injects greater diversity and increases the overall fitness of the population. The improved mechanism is validated on the same datasets used in the first stage. The results justify the improvements in classification accuracy and demonstrate its potential to be a good classifier for future data mining purposes.
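
The wrapper idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the operators or parameters, so tournament selection, one-point crossover, bit-flip mutation, elitism, and all default values here are assumptions. Each chromosome is a bit mask over the attributes, and the classifier is hidden behind a hypothetical `score_subset` callback. In the paper's setting that callback would return the accuracy of an SVM trained on the selected attributes; the `toy_score` function below is only a stand-in so that the sketch runs with the standard library alone.

```python
import random


def evolve_attribute_subset(n_attrs, score_subset, pop_size=20, generations=30,
                            crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """GA search over bit masks; bit i == 1 means attribute i is kept.

    score_subset(mask) is a hypothetical callback that returns the fitness
    of a subset (for a wrapper, the accuracy of a classifier trained on it).
    """
    rng = random.Random(seed)

    def repair(mask):
        # Never hand an empty attribute subset to the classifier.
        if not any(mask):
            mask[rng.randrange(n_attrs)] = 1
        return mask

    def tournament(scored):
        # Binary tournament: pick two members, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [repair([rng.randint(0, 1) for _ in range(n_attrs)])
                  for _ in range(pop_size)]
    best_score, best_mask = float("-inf"), None

    for _ in range(generations):
        scored = [(score_subset(tuple(m)), m) for m in population]
        gen_score, gen_mask = max(scored, key=lambda s: s[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, list(gen_mask)

        next_pop = [list(best_mask)]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_attrs > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_attrs)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(repair(child))
        population = next_pop

    return best_mask, best_score


def toy_score(mask):
    # Stand-in for classifier accuracy (an assumption for this sketch):
    # attributes 0 and 3 are "informative", and every kept attribute
    # costs a small penalty, so smaller subsets are preferred.
    informative = (0, 3)
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = evolve_attribute_subset(8, toy_score)
    print("selected attributes:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

To approximate the setting described in the abstract, `toy_score` would be swapped for a function that trains an SVM on the columns selected by `mask` and returns a validation accuracy. The correlation-based replacement of weaker population members mentioned in the abstract is not shown, since the abstract does not specify how it works.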

Research Area(s)

  • Attribute selection, Data mining, Evolutionary algorithms, Pattern classification, Support vector machines

Citation Format(s)

A hybrid evolutionary algorithm for attribute selection in data mining. / Tan, K.C.; Teoh, E.J.; Yu, Q.; Goh, K.C.

In: Expert Systems with Applications, Vol. 36, No. 4, 05.2009, p. 8616-8630.