Open Access

ARTICLE

A Comparative Analysis of Machine Learning Algorithms for Spam and Phishing URL Classification

Tran Minh Bao1, Kumar Shashvat2, Nguyen Gia Nhu3,*, Dac-Nhuong Le4
1 HCMC University of Industry and Trade, Ho Chi Minh, 10000, Vietnam
2 Amity School of Engineering & Technology, AMITY University, Bengaluru, 226028, India
3 School of Computer Science, Duy Tan University, Danang, 55000, Vietnam
4 Haiphong University, Haiphong, 05000, Vietnam
* Corresponding Author: Nguyen Gia Nhu.
(This article belongs to the Special Issue: Artificial Intelligence Methods and Techniques to Cybersecurity)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.075161

Received 26 October 2025; Accepted 16 December 2025; Published online 09 January 2026

Abstract

The rapid growth of harmful web pages, including spam and phishing URLs, poses a greater threat to global cybersecurity than ever before. These URLs are commonly used to trick people into divulging confidential details or to stealthily deploy malware. To address this issue, we assess the effectiveness of popular machine learning and neural network models in identifying such harmful links. We employ two datasets: the PhiUSIIL dataset, which is specifically designed for phishing URL detection, and a second dataset developed to uncover spam links by examining the wording and structure of each URL. We train and evaluate four classification models, namely Random Forest, Support Vector Machine (SVM), Naive Bayes, and Artificial Neural Networks (ANN), under two feature engineering approaches: statistical text-based analysis and heuristic-based structural features. Random Forest and ANN were consistently the best-performing models. Accuracy reached 99.99% on the PhiUSIIL phishing dataset and 99.62% on the spam dataset. These results surpass previously reported findings and demonstrate the efficacy of machine learning and neural networks in detecting malicious URLs. This work reinforces the strength of these widely used models and sets a high bar for subsequent research and development in the field. More broadly, it provides direction for building smarter, faster, and more precise tools that can spot online threats as they emerge.
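To make the two feature engineering approaches named in the abstract more concrete, the following is a minimal sketch of what each might look like for a single URL. The function names `structural_features` and `char_ngrams`, and the specific features chosen, are illustrative assumptions and are not taken from the article; the actual feature sets used by the authors may differ.

```python
from collections import Counter
from urllib.parse import urlparse
import re


def structural_features(url: str) -> dict:
    """Heuristic structural features of a URL.

    Hypothetical feature set for illustration only; not the paper's.
    """
    # urlparse only fills in the host when a scheme is present,
    # so prepend one for bare inputs such as "example.com/path".
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(ch in "@-_%=&?" for ch in url),
        # A raw IPv4 address in place of a domain name is a common heuristic flag.
        "has_ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "uses_https": parsed.scheme == "https",
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


def char_ngrams(url: str, n: int = 3) -> Counter:
    """Character n-gram counts, a simple stand-in for statistical text-based features."""
    return Counter(url[i:i + n] for i in range(len(url) - n + 1))


if __name__ == "__main__":
    example = "http://192.168.0.1/login/verify?id=1"
    print(structural_features(example))
    print(char_ngrams(example, n=3).most_common(5))
```

Feature dictionaries or n-gram counts of this kind would then be converted to numeric vectors and passed to a classifier such as Random Forest, SVM, Naive Bayes, or an ANN.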

Keywords

Web security; phishing; malicious URL; DOM analysis; transformer; GNN; evaluation; adversarial ML; LLM safety