Open Access

ARTICLE

MCBC-SMOTE: A Majority Clustering Model for Classification of Imbalanced Data

Jyoti Arora1, Meena Tushir2, Keshav Sharma1, Lalit Mohan1, Aman Singh3,*, Abdullah Alharbi4, Wael Alosaimi4

1 Department of Information Technology, MSIT, GGSIPU, New Delhi, 110058, India
2 Department of Electrical and Electronic Engineering, MSIT, GGSIPU, New Delhi, 110058, India
3 School of Computer Science and Engineering, Lovely Professional University, 144411, Punjab, India
4 Department of Information Technology, College of Computers and Information Technology, Taif University, 11099, Taif 21944, Saudi Arabia

* Corresponding Author: Aman Singh.

Computers, Materials & Continua 2022, 73(3), 4801-4817. https://doi.org/10.32604/cmc.2022.025960

Abstract

Datasets with an imbalanced class distribution are difficult to handle with standard classification algorithms, and class imbalance remains a challenging research problem in supervised learning. Many machine learning techniques are designed to operate on balanced datasets; therefore, various under-sampling, over-sampling and hybrid strategies have been proposed to deal with imbalanced datasets. Highly skewed datasets, however, still pose problems of generalization and of noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for the classification of imbalanced datasets, known as MCBC-SMOTE (Majority Clustering for Balanced Classification-SMOTE). The model converts the binary classification problem into a multi-class problem. In the proposed algorithm, the number of clusters for the majority class is calculated using the elbow method, and the minority class is over-sampled as an average of the clustered majority classes to generate a symmetrical class distribution. The proposed technique is cost-effective, reduces noise generation and removes the imbalances present both between and within classes. Evaluations on diverse real datasets show better classification results than existing state-of-the-art methods on several performance metrics.
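The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the elbow is taken as the largest second difference of the k-means inertia curve, "an average of clustered majority classes" is read as the mean cluster size, and new minority points come from SMOTE-style interpolation between minority neighbours. The names `kmeans`, `elbow_k`, `smote_like` and `mcbc_sketch` are hypothetical, and the toy data is synthetic.

```python
import math
import random
from collections import Counter


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on tuples; returns (labels, inertia)."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    inertia = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return labels, inertia


def elbow_k(points, k_max, rng):
    """Assumed elbow rule: k with the largest second difference of inertia."""
    inertias = [kmeans(points, k, rng)[1] for k in range(1, k_max + 1)]
    bends = [inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
             for i in range(1, k_max - 1)]
    return bends.index(max(bends)) + 2  # bends[0] corresponds to k = 2


def smote_like(minority, n_new, rng, k_neighbors=3):
    """SMOTE-style interpolation between a sample and a near minority neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k_neighbors]
        mate = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic


def mcbc_sketch(majority, minority, k_max=6, seed=0):
    """Relabel majority clusters as classes 0..k-1 and the minority as class k."""
    rng = random.Random(seed)
    k = elbow_k(majority, k_max, rng)
    labels, _ = kmeans(majority, k, rng)
    # Assumed target: the average size of the non-empty majority clusters.
    target = round(len(majority) / len(Counter(labels)))
    extra = max(0, target - len(minority))
    minority_all = minority + smote_like(minority, extra, rng)
    return majority + minority_all, labels + [k] * len(minority_all)


# Synthetic toy data: three majority blobs (120 points) and 8 minority points.
rng = random.Random(1)
majority = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(40)]
minority = [(rng.gauss(5, 0.3), rng.gauss(0, 0.3)) for _ in range(8)]
X, y = mcbc_sketch(majority, minority)
print(Counter(y))
```

The relabelled `X, y` can then be passed to any multi-class classifier, with every majority-cluster label mapped back to the single majority class at prediction time. In practice a library implementation of k-means and SMOTE would replace the hand-written helpers here.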


Cite This Article

J. Arora, M. Tushir, K. Sharma, L. Mohan, A. Singh et al., "MCBC-SMOTE: A majority clustering model for classification of imbalanced data," Computers, Materials & Continua, vol. 73, no. 3, pp. 4801–4817, 2022.



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.