Global-and-Local Aware Data Generation for the Class Imbalance Problem

Wentao Wang, Suhang Wang, Wenqi Fan, Zitao Liu, Jiliang Tang

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

17 Citations (Scopus)

Abstract

In many real-world classification applications such as fake news detection, the training data can be extremely imbalanced, which brings challenges to existing classifiers as the majority classes dominate the loss functions of classifiers. Oversampling techniques such as SMOTE are effective approaches to tackle the class imbalance problem by producing more synthetic minority samples. Despite their success, the majority of existing oversampling methods only consider local data distributions when generating minority samples, which can result in noisy minority samples that do not fit global data distributions or interleave with majority classes. Hence, in this paper, we study the class imbalance problem by simultaneously exploring local and global data information since: (i) the local data distribution could give detailed information for generating minority samples; and (ii) the global data distribution could provide guidance to avoid generating outliers or samples that interleave with majority classes. Specifically, we propose a novel framework GL-GAN, which leverages the SMOTE method to explore local distribution in a learned latent space and employs GAN to capture the global information, so that synthetic minority samples can be generated under even extremely imbalanced scenarios. Experimental results on diverse real data sets demonstrate the effectiveness of our GL-GAN framework in producing realistic and discriminative minority samples for improving the classification performance of various classifiers on imbalanced training data. Our code is available at https://github.com/wentao-repo/GL-GAN.
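
The local step named in the abstract, SMOTE-style interpolation between nearby minority samples, can be sketched as follows. This is a minimal illustration in pure Python, not the authors' implementation: the function name `smote_like_samples`, its parameters, and the toy points are assumptions made for this example. The toy points stand in for minority samples already mapped into a latent space, and the encoder and GAN components of GL-GAN are not shown.

```python
import math
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        lam = rng.random()
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy 2-D "latent codes" for the minority class (assumed data).
minority_latent = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_samples(minority_latent, n_new=5)
```

Because each synthetic point lies on a segment between two existing minority points, it depends only on the local neighbourhood. This is the limitation the abstract attributes to purely local oversampling, which GL-GAN addresses by also using a GAN to capture the global distribution.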
Original language: English
Title of host publication: Proceedings of the 2020 SIAM International Conference on Data Mining
Publisher: Society for Industrial and Applied Mathematics
Pages: 307-315
Number of pages: 9
ISBN (Electronic): 9781611976236
Publication status: Published - May 2020
Event: 2020 SIAM International Conference on Data Mining (SDM20) - Cincinnati, United States
Duration: 7 May 2020 to 9 May 2020
https://www.siam.org/conferences/cm/conference/sdm20

Publication series

Name: Proceedings of the ... SIAM International Conference on Data Mining

Conference

Conference: 2020 SIAM International Conference on Data Mining (SDM20)
Abbreviated title: SDM2020
Country/Territory: United States
City: Cincinnati
Period: 7/05/20 to 9/05/20
Research Keywords

  • Adversarial learning
  • Imbalanced data
