    MCBC-SMOTE: A Majority Clustering Model for Classification of Imbalanced Data

    Jyoti Arora1, Meena Tushir2, Keshav Sharma1, Lalit Mohan1, Aman Singh3,*, Abdullah Alharbi4, Wael Alosaimi4

    CMC-Computers, Materials & Continua, Vol.73, No.3, pp. 4801-4817, 2022, DOI:10.32604/cmc.2022.025960

    Abstract Datasets with an imbalanced class distribution are difficult to handle with standard classification algorithms. In supervised learning, dealing with class imbalance is still considered a challenging research problem. Various machine learning techniques are designed to operate on balanced datasets; therefore, different under-sampling, over-sampling, and hybrid strategies have been proposed in the state of the art to deal with imbalanced datasets, but highly skewed datasets still pose problems of generalization and noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for classification of imbalanced datasets known…

  • Open Access


    Water Quality Index Using Modified Random Forest Technique: Assessing Novel Input Features

    Wen Yee Wong1, Ayman Khallel Ibrahim Al-Ani1, Khairunnisa Hasikin1,*, Anis Salwa Mohd Khairuddin2, Sarah Abdul Razak3, Hanee Farzana Hizaddin4, Mohd Istajib Mokhtar5, Muhammad Mokhzaini Azizan6

    CMES-Computer Modeling in Engineering & Sciences, Vol.132, No.3, pp. 1011-1038, 2022, DOI:10.32604/cmes.2022.019244

    Abstract Water quality analysis is essential to understand the ecological status of aquatic life. Conventional water quality index (WQI) assessment methods are limited to features such as water acidity or basicity (pH), dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH3-N), and suspended solids (SS). These features are often insufficient to represent the water quality of a heavy metal–polluted river. Therefore, this paper aims to explore and analyze novel input features in order to formulate an improved WQI. In this work, prospective insights on the feasibility of alternative water quality input variables as new discriminant features…

  • Open Access


    An Imbalanced Dataset and Class Overlapping Classification Model for Big Data

    Mini Prince1,*, P. M. Joe Prathap2

    Computer Systems Science and Engineering, Vol.44, No.2, pp. 1009-1024, 2023, DOI:10.32604/csse.2023.024277

    Abstract Most modern technologies, such as social media, smart cities, and the internet of things (IoT), rely on big data. When big data is used in real-world applications, two data challenges arise: class overlap and class imbalance. When dealing with large datasets, most traditional classifiers get stuck in local optima. As a result, it’s necessary to look into new methods for dealing with large data collections. Several solutions have been proposed for overcoming this issue. The rapid growth of the available data threatens to limit the usefulness of many traditional methods. Methods such as oversampling and…

  • Open Access


    Hyper-Parameter Optimization of Semi-Supervised GANs Based-Sine Cosine Algorithm for Multimedia Datasets

    Anas Al-Ragehi1, Said Jadid Abdulkadir1,2,*, Amgad Muneer1,2, Safwan Sadeq3, Qasem Al-Tashi4,5

    CMC-Computers, Materials & Continua, Vol.73, No.1, pp. 2169-2186, 2022, DOI:10.32604/cmc.2022.027885

    Abstract Generative Adversarial Networks (GANs) are neural networks that allow models to learn deep representations without requiring a large amount of training data. Semi-Supervised GAN Classifiers are a recent innovation in GANs, where GANs are used to classify generated images into real and fake and multiple classes, similar to a general multi-class classifier. However, GANs have a sophisticated design that can be challenging to train. This is because obtaining the proper set of parameters for all models (generator, discriminator, and classifier) is complex. As a result, training a single GAN model for different datasets may not produce satisfactory results. Therefore, this study…

  • Open Access


    SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification

    Mohd Anul Haq*

    CMC-Computers, Materials & Continua, Vol.71, No.1, pp. 1403-1425, 2022, DOI:10.32604/cmc.2022.021968

    Abstract Rapid industrialization and urbanization are quickly deteriorating ambient air quality, especially in developing nations. Air pollutants pose a high risk to human health and degrade the environment as well. Earlier studies have used machine learning (ML) and statistical modeling to classify and forecast air pollution. However, these methods suffer from the complexity of air pollution datasets, resulting in a lack of efficient classification and forecasting of air pollution. ML-based models suffer from improper data pre-processing, class imbalance issues, data splitting, and hyperparameter tuning. There is a gap in the existing ML-based studies on air pollution due to improper data…

  • Open Access


    Improving Routine Immunization Coverage Through Optimally Designed Predictive Models

    Fareeha Sameen1, Abdul Momin Kazi2, Majida Kazmi1,*, Munir A Abbasi3, Saad Ahmed Qazi1,4, Lampros K Stergioulas3,5

    CMC-Computers, Materials & Continua, Vol.70, No.1, pp. 375-395, 2022, DOI:10.32604/cmc.2022.019167

    Abstract Routine immunization (RI) of children is the most effective and timely public health intervention for decreasing child mortality rates around the globe. Pakistan, a low- and middle-income country (LMIC), has one of the highest child mortality rates in the world, mainly due to vaccine-preventable diseases (VPDs). To improve RI coverage, there is a critical need to identify potential RI defaulters at an early stage, so that appropriate interventions can be targeted toward the population identified as at risk of missing scheduled vaccine uptakes. In this paper, a machine learning (ML) based predictive model has been proposed to…

  • Open Access


    Multi-Class Sentiment Analysis of Social Media Data with Machine Learning Algorithms

    Galimkair Mutanov, Vladislav Karyukin*, Zhanl Mamykova

    CMC-Computers, Materials & Continua, Vol.69, No.1, pp. 913-930, 2021, DOI:10.32604/cmc.2021.017827

    Abstract The volume of social media data on the Internet is constantly growing. This has created a substantial research field for data analysts. The diversity of articles, posts, and comments on news websites and social networks is astonishing. Nevertheless, most researchers focus on posts on Twitter that have a specific format and length restriction. The majority of them are written in the English language. As relatively few works have paid attention to sentiment analysis in the Russian and Kazakh languages, this article thoroughly analyzes news posts in the Kazakhstan media space. The amassed datasets include texts labeled according to three sentiment…

  • Open Access


    Oversampling Methods Combined Clustering and Data Cleaning for Imbalanced Network Data

    Yang Yang1,*, Qian Zhao1, Linna Ruan2, Zhipeng Gao1, Yonghua Huo3, Xuesong Qiu1

    Intelligent Automation & Soft Computing, Vol.26, No.5, pp. 1139-1155, 2020, DOI:10.32604/iasc.2020.011705

    Abstract In network anomaly detection, network traffic data are often imbalanced; that is, certain classes of network traffic data have a large sample volume while other classes have few samples, resulting in reduced anomaly detection performance on minority-class samples. For imbalanced data, researchers have proposed the use of oversampling techniques to balance data sets; in particular, an oversampling method called SMOTE provides a simple and effective solution for balancing data sets. However, current oversampling methods suffer from the generation of noisy samples and poor information quality. Hence, this study proposes an oversampling method for imbalanced…

  • Open Access


    Improving Performance Prediction on Education Data with Noise and Class Imbalance

    Akram M. Radwana,b, Zehra Cataltepea,c

    Intelligent Automation & Soft Computing, Vol.24, No.4, pp. 777-783, 2018, DOI:10.1080/10798587.2017.1337673

    Abstract This paper proposes to apply machine learning techniques to predict students’ performance on two real-world educational data-sets. The first data-set is used to predict the response of students with autism while they learn a specific task, whereas the second one is used to predict students’ failure at a secondary school. The two data-sets suffer from two major problems that can negatively impact the ability of classification models to predict the correct label: class imbalance and class noise. A series of experiments have been carried out to improve the quality of training data, and hence improve prediction results. In this paper,…

  • Open Access


    Credit Card Fraud Detection Based on Machine Learning

    Yong Fang1, Yunyun Zhang2, Cheng Huang1,*

    CMC-Computers, Materials & Continua, Vol.61, No.1, pp. 185-195, 2019, DOI:10.32604/cmc.2019.06144

    Abstract In recent years, the rapid development of e-commerce has exposed great vulnerabilities in online transactions for fraudsters to exploit. Credit card transactions play a salient role in today’s online transactions because of their obvious advantages, including discounts and earning credit card points. Credit card fraud has therefore become a target of concern. To deal with the situation, credit card fraud detection based on machine learning has been studied recently. Yet, it is difficult to detect fraudulent transactions due to data imbalance (normal and fraudulent transactions), for which the SMOTE algorithm is proposed in order to resolve data imbalance. The assessment of…

