Search Results (20)
  • Open Access

    ARTICLE

    A Stacked Ensemble Deep Learning Approach for Imbalanced Multi-Class Water Quality Index Prediction

    Wen Yee Wong1, Khairunnisa Hasikin1,*, Anis Salwa Mohd Khairuddin2, Sarah Abdul Razak3, Hanee Farzana Hizaddin4, Mohd Istajib Mokhtar5, Muhammad Mokhzaini Azizan6

    CMC-Computers, Materials & Continua, Vol.76, No.2, pp. 1361-1384, 2023, DOI:10.32604/cmc.2023.038045

    Abstract A common difficulty in building prediction models with real-world environmental datasets is the skewed distribution of classes. There are significantly more samples for day-to-day classes, while rare events such as polluted classes are uncommon. Consequently, the limited availability of minority outcomes lowers the classifier’s overall reliability. This study assesses the capability of machine learning (ML) algorithms in tackling imbalanced water quality data based on the metrics of precision, recall, and F1 score. It aims to correct the accuracy bias toward the majority class. Hence, the performance of 10 ML algorithms is compared. The classifiers included are AdaBoost, Support Vector…

  • Open Access

    ARTICLE

    Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction

    Yap Bee Wah1,5,*, Azlan Ismail1,2, Nur Niswah Naslina Azid3, Jafreezal Jaafar4, Izzatdin Abdul Aziz4, Mohd Hilmi Hasan4, Jasni Mohamad Zain1,2

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 4821-4841, 2023, DOI:10.32604/cmc.2023.034470

    Abstract Prediction of machine failure is challenging as the dataset is often imbalanced with a low failure rate. The common approach to handle classification involving imbalanced data is to balance the data using a sampling approach such as random undersampling, random oversampling, or Synthetic Minority Oversampling Technique (SMOTE) algorithms. This paper compared the classification performance of three popular classifiers (Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine) in predicting machine failure in the Oil and Gas industry. The original machine failure dataset consists of 20,473 hourly records and is imbalanced, with 19,945 (97%) ‘non-failure’ and 528 (3%) ‘failure’ records. The…

  • Open Access

    ARTICLE

    Fault Diagnosis of Power Transformer Based on Improved ACGAN Under Imbalanced Data

    Tusongjiang. Kari1, Lin Du1, Aisikaer. Rouzi2, Xiaojing Ma1,*, Zhichao Liu1, Bo Li1

    CMC-Computers, Materials & Continua, Vol.75, No.2, pp. 4573-4592, 2023, DOI:10.32604/cmc.2023.037954

    Abstract The imbalance of dissolved gas analysis (DGA) data will lead to over-fitting, weak generalization and poor recognition performance for fault diagnosis models based on deep learning. To handle this problem, a novel transformer fault diagnosis method based on improved auxiliary classifier generative adversarial network (ACGAN) under imbalanced data is proposed in this paper, which meets both the requirements of balancing DGA data and supplying accurate diagnosis results. The generator combines one-dimensional convolutional neural networks (1D-CNN) and long short-term memory (LSTM), which can deeply extract the features from DGA samples and be greatly beneficial to ACGAN’s data balancing and fault diagnosis.…

  • Open Access

    ARTICLE

    Imbalanced Data Classification Using SVM Based on Improved Simulated Annealing Featuring Synthetic Data Generation and Reduction

    Hussein Ibrahim Hussein1, Said Amirul Anwar2,*, Muhammad Imran Ahmad2

    CMC-Computers, Materials & Continua, Vol.75, No.1, pp. 547-564, 2023, DOI:10.32604/cmc.2023.036025

    Abstract Imbalanced data classification is one of the major problems in machine learning. An imbalanced dataset typically has significant differences in the number of data samples between its classes. In most cases, the performance of a machine learning algorithm such as Support Vector Machine (SVM) is affected when dealing with an imbalanced dataset. The classification accuracy is mostly skewed toward the majority class and poor results are exhibited in the prediction of minority-class samples. In this paper, a hybrid approach combining a data pre-processing technique and the SVM algorithm based on improved Simulated Annealing (SA) was proposed. Firstly, the data pre-processing technique which…

  • Open Access

    ARTICLE

    LexDeep: Hybrid Lexicon and Deep Learning Sentiment Analysis Using Twitter for Unemployment-Related Discussions During COVID-19

    Azlinah Mohamed1,3,*, Zuhaira Muhammad Zain2, Hadil Shaiba2,*, Nazik Alturki2, Ghadah Aldehim2, Sapiah Sakri2, Saiful Farik Mat Yatin1, Jasni Mohamad Zain1

    CMC-Computers, Materials & Continua, Vol.75, No.1, pp. 1577-1601, 2023, DOI:10.32604/cmc.2023.034746

    Abstract The COVID-19 pandemic has spread globally, resulting in financial instability in many countries and reductions in the per capita gross domestic product. Sentiment analysis is a cost-effective method for acquiring sentiments based on household income loss, as expressed on social media. However, limited research has been conducted in this domain using the LexDeep approach. This study aimed to explore social trend analytics using LexDeep, which is a hybrid sentiment analysis technique, on Twitter to capture the risk of household income loss during the COVID-19 pandemic. First, tweet data were collected using Twint with relevant keywords before (9 March 2019 to…

  • Open Access

    ARTICLE

    An Effective Classifier Model for Imbalanced Network Attack Data

    Gürcan Çetin*

    CMC-Computers, Materials & Continua, Vol.73, No.3, pp. 4519-4539, 2022, DOI:10.32604/cmc.2022.031734

    Abstract Recently, machine learning algorithms have been used in the detection and classification of network attacks. The performance of the algorithms has been evaluated by using benchmark network intrusion datasets such as DARPA98, KDD’99, NSL-KDD, UNSW-NB15, and Caida DDoS. However, these datasets have two major challenges: imbalanced data and high-dimensional data. Obtaining high accuracy for all attack types in the dataset allows for high accuracy in imbalanced datasets. On the other hand, having a large number of features increases the runtime load on the algorithms. A novel model is proposed in this paper to overcome these two concerns. The number of…

  • Open Access

    ARTICLE

    MCBC-SMOTE: A Majority Clustering Model for Classification of Imbalanced Data

    Jyoti Arora1, Meena Tushir2, Keshav Sharma1, Lalit Mohan1, Aman Singh3,*, Abdullah Alharbi4, Wael Alosaimi4

    CMC-Computers, Materials & Continua, Vol.73, No.3, pp. 4801-4817, 2022, DOI:10.32604/cmc.2022.025960

    Abstract Datasets with an imbalanced class distribution are difficult to handle with standard classification algorithms. In supervised learning, dealing with class imbalance is still considered a challenging research problem. Various machine learning techniques are designed to operate on balanced datasets; therefore, in the state of the art, different under-sampling, over-sampling and hybrid strategies have been proposed to deal with the problem of imbalanced datasets, but highly skewed datasets still pose the problem of generalization and noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for classification of imbalanced datasets known…

  • Open Access

    ARTICLE

    An Imbalanced Dataset and Class Overlapping Classification Model for Big Data

    Mini Prince1,*, P. M. Joe Prathap2

    Computer Systems Science and Engineering, Vol.44, No.2, pp. 1009-1024, 2023, DOI:10.32604/csse.2023.024277

    Abstract Most modern technologies, such as social media, smart cities, and the internet of things (IoT), rely on big data. When big data is used in real-world applications, two data challenges, class overlap and class imbalance, arise. When dealing with large datasets, most traditional classifiers are stuck in the local optimum problem. As a result, it’s necessary to look into new methods for dealing with large data collections. Several solutions have been proposed for overcoming this issue. The rapid growth of the available data threatens to limit the usefulness of many traditional methods. Methods such as oversampling and…

  • Open Access

    ARTICLE

    Hybrid Deep Learning Based Attack Detection for Imbalanced Data Classification

    Rasha Almarshdi1,2,*, Laila Nassef1, Etimad Fadel1, Nahed Alowidi1

    Intelligent Automation & Soft Computing, Vol.35, No.1, pp. 297-320, 2023, DOI:10.32604/iasc.2023.026799

    Abstract The Internet of Things (IoT) is the most widespread and fastest growing technology today. Due to the increasing number of IoT devices connected to the Internet, the IoT is the technology most exposed to security attacks. IoT devices are not designed with security in mind because they are resource-constrained devices. Therefore, having an accurate IoT security system to detect security attacks is challenging. Intrusion Detection Systems (IDSs) using machine learning and deep learning techniques can detect security attacks accurately. This paper develops an IDS architecture based on Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) deep learning algorithms. We implement our model…

  • Open Access

    ARTICLE

    Imbalanced Classification in Diabetics Using Ensembled Machine Learning

    M. Sandeep Kumar1, Mohammad Zubair Khan2,*, Sukumar Rajendran1, Ayman Noor3, A. Stephen Dass1, J. Prabhu1

    CMC-Computers, Materials & Continua, Vol.72, No.3, pp. 4397-4409, 2022, DOI:10.32604/cmc.2022.025865

    Abstract Diabetes is one of the world’s most common diseases and is caused by continued high levels of blood sugar. The risk of diabetes can be lowered if it is found at an early stage. In recent days, several machine learning models were developed to predict the presence of diabetes at an early stage. In this paper, we propose an embedded-based machine learning model that combines the split-vote method and instance duplication to leverage an imbalanced dataset called PIMA Indian to improve the prediction of diabetes. The proposed method uses both the concept of over-sampling and under-sampling along with model weighting…

Displaying 1-10 on page 1 of 20.