
Benchmarking binary classification models on data sets with different degrees of imbalance

  • Ligang Zhou
  • Kin Keung Lai

    Research output: Journal Publications and Reviews (RGC 22 - Publication in policy or professional journal)

    Abstract

    In practice, many problems are binary classification problems, such as credit risk assessment or medical testing to determine whether a patient has a certain disease. However, different problems have different characteristics, and these characteristics can make a problem more or less difficult. One important characteristic is the degree of imbalance between the two classes in a data set. Are the commonly used binary classification methods still feasible for data sets with different degrees of imbalance? In this study, various binary classification models, including traditional statistical methods and newer methods from artificial intelligence (such as linear regression, discriminant analysis, decision trees, neural networks, and support vector machines), are reviewed. Their performance, measured by classification accuracy and the area under the Receiver Operating Characteristic (ROC) curve, is tested and compared on fourteen data sets with different degrees of imbalance. The results help in selecting appropriate methods for problems with different degrees of imbalance. © 2009 Higher Education Press and Springer-Verlag GmbH.
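    The two performance measures named in the abstract can behave very differently on imbalanced data. The minimal sketch below is an illustration only, not the authors' experimental code: it computes accuracy and the ROC AUC (as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half) for a degenerate majority-class predictor. The record does not state how the paper quantifies the degree of imbalance, so `imbalance_ratio` (majority count divided by minority count) and the toy data are hypothetical stand-ins.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def imbalance_ratio(y_true: Sequence[int]) -> float:
    """Hypothetical imbalance measure: majority-class count / minority-class count."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return max(n_pos, n_neg) / min(n_pos, n_neg)


# Hypothetical toy data: 95 negatives and 5 positives.
y = [0] * 95 + [1] * 5
# A degenerate "classifier" that always predicts the majority class
# and assigns every example the same score.
majority_pred = [0] * len(y)
constant_scores = [0.0] * len(y)

print(imbalance_ratio(y))           # 19.0
print(accuracy(y, majority_pred))   # 0.95
print(roc_auc(y, constant_scores))  # 0.5
```

    On this toy set the majority-class predictor reaches 95% accuracy without identifying a single positive case, while its AUC of 0.5 is no better than random ranking. This gap is one reason to report both measures when data sets differ in their degree of imbalance.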
    Original language: English
    Pages (from-to): 205-216
    Journal: Frontiers of Computer Science in China
    Volume: 3
    Issue number: 2
    Publication status: Published - Jun 2009

    Research Keywords

    • Area under Receiver Operating Characteristic (ROC) curve
    • Binary classification
    • Classification accuracy
    • Degrees of imbalance
