Search Results (9)
  • Open Access


    Overfitting in Machine Learning: A Comparative Analysis of Decision Trees and Random Forests

    Erblin Halabaku, Eliot Bytyçi*

    Intelligent Automation & Soft Computing, Vol.39, No.6, pp. 987-1006, 2024, DOI:10.32604/iasc.2024.059429 - 30 December 2024

    Abstract In an era of abundant data, machine learning has emerged as a pivotal tool for deciphering and managing the excess of information. This paper presents a comprehensive analysis of machine learning algorithms, focusing on the structure and efficacy of random forests in mitigating overfitting, a prevalent issue in decision tree models. It also introduces a novel approach to enhancing decision tree performance through an optimized pruning method called Adaptive Cross-Validated Alpha CCP (ACV-CCP). This method refines traditional cost complexity pruning by streamlining the selection of the alpha parameter, leveraging cross-validation within the pruning process to achieve…

  • Open Access


    Improving Thyroid Disorder Diagnosis via Ensemble Stacking and Bidirectional Feature Selection

    Muhammad Armghan Latif, Zohaib Mushtaq, Saad Arif, Sara Rehman, Muhammad Farrukh Qureshi, Nagwan Abdel Samee, Maali Alabdulhafith*, Yeong Hyeon Gu, Mohammed A. Al-masni

    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 4225-4241, 2024, DOI:10.32604/cmc.2024.047621 - 26 March 2024

    Abstract Thyroid disorders represent a significant global health challenge with hypothyroidism and hyperthyroidism as two common conditions arising from dysfunction in the thyroid gland. Accurate and timely diagnosis of these disorders is crucial for effective treatment and patient care. This research introduces a comprehensive approach to improve the accuracy of thyroid disorder diagnosis through the integration of ensemble stacking and advanced feature selection techniques. Sequential forward feature selection, sequential backward feature elimination, and bidirectional feature elimination are investigated in this study. In ensemble learning, random forest, adaptive boosting, and bagging classifiers are employed. The effectiveness of…

  • Open Access


    A Sea Ice Recognition Algorithm in Bohai Based on Random Forest

    Tao Li, Di Wu, Rui Han, Jinyue Xia, Yongjun Ren*

    CMC-Computers, Materials & Continua, Vol.73, No.2, pp. 3721-3739, 2022, DOI:10.32604/cmc.2022.029619 - 16 June 2022

    Abstract As an important maritime hub, Bohai Sea Bay provides great convenience for shipping but suffers from sea ice disasters of varying severity every winter, which greatly affect the socio-economic development of the region. Therefore, this paper uses data from FY-4A (a weather satellite) to study sea ice in the Bohai Sea. After processing the data for land removal and cloud detection, it combines a multi-channel threshold method with an adaptive threshold algorithm to recognize Bohai Sea ice under clear-sky conditions. The random forests classification algorithm is introduced in sea ice identification, which can…

  • Open Access


    Ensemble Nonlinear Support Vector Machine Approach for Predicting Chronic Kidney Diseases

    S. Prakash*, P. Vishnu Raja, A. Baseera, D. Mansoor Hussain, V. R. Balaji, K. Venkatachalam

    Computer Systems Science and Engineering, Vol.42, No.3, pp. 1273-1287, 2022, DOI:10.32604/csse.2022.021784 - 08 February 2022

    Abstract Urban living in large modern cities exerts considerable adverse effects on health and thus increases the risk of contracting several chronic kidney diseases (CKD). The prediction of CKDs has become a major task in urbanized countries. The primary objective of this work is to introduce and develop predictive analytics for predicting CKDs. However, prediction of huge samples is becoming increasingly difficult. Meanwhile, MapReduce provides a feasible framework for programming predictive algorithms with map and reduce functions. The relatively simple programming interface helps solve problems in the scalability and efficiency of predictive learning algorithms. In the…

  • Open Access


    Estimating Daily Dew Point Temperature Based on Local and Cross-Station Meteorological Data Using CatBoost Algorithm

    Fuqi Yao, Jinwei Sun, Jianhua Dong*

    CMES-Computer Modeling in Engineering & Sciences, Vol.130, No.2, pp. 671-700, 2022, DOI:10.32604/cmes.2022.018450 - 13 December 2021

    Abstract Accurate estimation of dew point temperature (Tdew) plays a very important role in the fields of water resource management, agricultural engineering, climatology and energy utilization. However, there are few studies on the applicability of local Tdew algorithms at regional scales. This study evaluated the performance of a new machine learning algorithm, i.e., gradient boosting on decision trees with categorical features support (CatBoost) to estimate daily Tdew using limited local and cross-station meteorological data. The random forests (RF) algorithm was also assessed for comparison. Daily meteorological data from 2016 to 2019, including maximum, minimum and average temperature (Tmax, Tmin…

  • Open Access


    Random Forests Algorithm Based Duplicate Detection in On-Site Programming Big Data Environment

    Qianqian Li, Meng Li, Lei Guo*, Zhen Zhang

    Journal of Information Hiding and Privacy Protection, Vol.2, No.4, pp. 199-205, 2020, DOI:10.32604/jihpp.2020.016299 - 07 January 2021

    Abstract On-site programming big data refers to the massive data generated in the process of software development, characterized by its real-time nature, complexity, and high processing difficulty. Therefore, data cleaning is essential for on-site programming big data. Duplicate data detection is an important step in data cleaning, which can save storage resources and enhance data consistency. Because of the shortcomings of the traditional Sorted Neighborhood Method (SNM) and the difficulty of high-dimensional data detection, an optimized algorithm based on random forests with a dynamic and adaptive window size is proposed. The efficiency of the algorithm can be…

  • Open Access


    MOOC Learner’s Final Grade Prediction Based on an Improved Random Forests Method

    Yuqing Yang, Peng Fu*, Xiaojiang Yang, Hong Hong, Dequn Zhou

    CMC-Computers, Materials & Continua, Vol.65, No.3, pp. 2413-2423, 2020, DOI:10.32604/cmc.2020.011881 - 16 September 2020

    Abstract The Massive Open Online Course (MOOC) has become a popular way of online learning used across the world by millions of people. Meanwhile, a vast amount of information has been collected from MOOC learners and institutions. Based on this educational data, much research has been conducted on predicting the MOOC learner’s final grade. However, there are still two problems in this research field. The first problem is how to select the most appropriate features to improve the prediction accuracy, and the second problem is how to use or modify the data…

  • Open Access


    Classification Algorithm Optimization Based on Triple-GAN

    Kun Fang, Jianquan Ouyang*

    Journal on Artificial Intelligence, Vol.2, No.1, pp. 1-15, 2020, DOI:10.32604/jai.2020.09738 - 15 July 2020

    Abstract The Generative Adversarial Network (GAN) has shown great development prospects in image generation and semi-supervised learning and has evolved into Triple-GAN. However, two problems in Triple-GAN still need to be solved. First, with the distribution structure based on the KL divergence, gradients vanish easily and training becomes unstable. Second, because Triple-GAN tags the samples manually, the manual marking workload is too large and the marking is uneven. This article builds an improved Triple-GAN model (Improved Triple-GAN), which uses Random Forests to classify real samples, automates tagging of leaf nodes, and uses Least Squares…

  • Open Access


    A Privacy-Preserving Algorithm for Clinical Decision-Support Systems Using Random Forest

    Alia Alabdulkarim, Mznah Al-Rodhaan, Yuan Tian*, Abdullah Al-Dhelaan

    CMC-Computers, Materials & Continua, Vol.58, No.3, pp. 585-601, 2019, DOI:10.32604/cmc.2019.05637

    Abstract Clinical decision-support systems are technology-based tools that help healthcare providers enhance the quality of their services to satisfy their patients and earn their trust. These systems are used to improve physicians’ diagnostic processes in terms of speed and accuracy. Using data-mining techniques, a clinical decision-support system builds a classification model from a hospital’s dataset for diagnosing new patients using their symptoms. In this work, we propose a privacy-preserving clinical decision-support system that uses a privacy-preserving random forest algorithm to diagnose new symptoms without disclosing patients’ information and exposing them to cyber and network attacks. Solving…
