TY - EJOUR
AU - Luo, Zhengbo
AU - Parvïn, Hamïd
AU - Garg, Harish
AU - Qasem, Sultan Noman
AU - Pho, Kim-Hung
AU - Mansor, Zulkefli
TI - Dealing with Imbalanced Dataset Leveraging Boundary Samples Discovered by Support Vector Data Description
T2 - Computers, Materials & Continua
PY - 2021
VL - 66
IS - 3
SN - 1546-2226
AB - Imbalanced datasets (IDs), i.e., datasets in which one class (usually of two) contains considerably fewer samples than the other(s), arise in many real-world applications such as health-care and disease-diagnosis systems, anomaly detection, fraud detection, and stream-based malware detection. These datasets cause several problems in classification, including under-training of the minority class(es), over-training of the majority class(es), and bias towards the majority class(es). Consequently, IDs have attracted the attention of researchers across many fields, and several solutions have been proposed. The main aim of this study is to deal with IDs by resampling the borderline samples discovered by Support Vector Data Description (SVDD). There are naturally two kinds of resampling: under-sampling (U-S) and over-sampling (O-S). The main drawback of O-S is over-fitting, while the main drawback of U-S is significant information loss. To avoid both drawbacks, we focus on the samples most likely to be misclassified, namely the borderline data points lying on the border(s) between the majority class(es) and minority class(es). First, we find the borderline examples by SVDD; then, data resampling is applied to them; at the next step, the base classifier is trained on the newly created dataset. Finally, we compare our method with other state-of-the-art methods in terms of Area Under the Curve (AUC), F-measure, and G-mean, and show that it achieves better results in our experimental study.
KW - Imbalanced learning
KW - classification
KW - borderline examples
DO - 10.32604/cmc.2021.012547
ER - 