Open Access

ARTICLE

A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning

Nimra Bari1, Tahir Saleem2, Munam Shah3, Abdulmohsen Algarni4, Asma Patel5,*, Insaf Ullah6,*

1 Faculty of Computer Science and Engineering, GIK Institute of Engineering Sciences and Technology, Topi, 23640, Pakistan
2 Department of Computing, Hamdard University Islamabad Campus, Islamabad, 44000, Pakistan
3 Department of Computer Networks and Communication, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa, 31982, Saudi Arabia
4 Department of Computer Science, King Khalid University, Abha, 61421, Saudi Arabia
5 Department of Operations and Information Management, Aston Business School, Aston University, Birmingham, B4 7ET, UK
6 Institute for Analytics and Data Science, University of Essex, Colchester, CO4 3SQ, UK

* Corresponding Authors: Asma Patel. Email: email; Insaf Ullah. Email: email

Computer Modeling in Engineering & Sciences 2025, 145(1), 1167-1187. https://doi.org/10.32604/cmes.2025.070311

Abstract

Phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. Existing anti-phishing solutions include heuristic detection, visual similarity detection, blacklists and whitelists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is still a work in progress. Furthermore, although most anti-phishing solutions achieve high accuracy on a given dataset, their methods suffer from a large number of false positives and are ineffective against zero-hour attacks. A high False Positive Rate (FPR) is a serious problem, because phishing sites that are accepted as genuine can cause visitors to lose a lot of money. Feature selection is critical when developing phishing detection strategies: good feature selection improves accuracy, whereas duplicate features add noise to the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks, including constant feature removal, duplicate feature removal, quasi-constant feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different ML classifiers (Random Forest, Artificial Neural Network (ANN), AdaBoost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting, and Support Vector Machine (SVM)) and with two types of ensemble models, stacking and majority voting, to achieve a low false positive rate. The stacked ensemble of gradient boosting, random forest, and support vector machine classifiers achieves 1.31% FPR and 98.17% accuracy on Dataset 1, 3.47% FPR and 96.47% accuracy on Dataset 2, and 2.81% FPR and 97.61% accuracy on Dataset 3.
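The filter chain listed in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names (`filter_features`, `pearson`, `mutual_info`, `anova_f`), the thresholds (a 0.99 dominant-value share for quasi-constant features, |r| > 0.9 for correlated features), the toy data, and the rule for combining the two rankings (keep a feature only if it is in the top k by both mutual information and ANOVA F) are all hypothetical. The mutual-information estimate treats each distinct value as its own category, so it suits binary or discrete URL features; continuous features would need binning or a nearest-neighbour estimator.

```python
from collections import Counter
from math import log, sqrt


def pearson(a, b):
    """Pearson correlation of two columns; 0.0 if either has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return 0.0 if va == 0 or vb == 0 else cov / sqrt(va * vb)


def mutual_info(col, y):
    """Mutual information (nats) between a discrete feature and the labels."""
    n = len(y)
    joint, px, py = Counter(zip(col, y)), Counter(col), Counter(y)
    return sum(c / n * log(c * n / (px[v] * py[label]))
               for (v, label), c in joint.items())


def anova_f(col, y):
    """One-way ANOVA F statistic of a feature across the class groups."""
    groups = {}
    for v, label in zip(col, y):
        groups.setdefault(label, []).append(v)
    n, k = len(col), len(groups)
    grand = sum(col) / n
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    ss_between = sum(len(vals) * (means[g] - grand) ** 2 for g, vals in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vals in groups.items() for v in vals)
    if k < 2 or n <= k or ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def filter_features(X, y, quasi_thresh=0.99, corr_thresh=0.9, k=None):
    """X maps a feature name to its column (one numeric value per URL).

    Thresholds and the final combination rule are hypothetical choices.
    """
    n = len(y)
    kept = []
    for name, col in X.items():
        counts = Counter(col)
        if len(counts) == 1:                              # 1. constant
            continue
        if any(X[other] == col for other in kept):        # 2. duplicate of a kept feature
            continue
        if max(counts.values()) / n > quasi_thresh:       # 3. quasi-constant
            continue
        if any(abs(pearson(X[other], col)) > corr_thresh for other in kept):
            continue                                      # 4. highly correlated
        kept.append(name)
    if k is None:
        return kept
    # 5-6. Hypothetical combination rule: keep a feature only if it is in the
    # top k by mutual information AND in the top k by ANOVA F.
    top_mi = set(sorted(kept, key=lambda f: mutual_info(X[f], y), reverse=True)[:k])
    top_f = set(sorted(kept, key=lambda f: anova_f(X[f], y), reverse=True)[:k])
    return [f for f in kept if f in top_mi and f in top_f]


# Hypothetical toy data: 10 URLs, label 1 = phishing.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
urls_X = {
    "const":        [1] * 10,
    "quasi":        [0] * 9 + [1],
    "url_len":      [20, 22, 25, 21, 23, 60, 65, 58, 70, 62],
    "url_len_copy": [20, 22, 25, 21, 23, 60, 65, 58, 70, 62],
    "url_len_x2":   [40, 44, 50, 42, 46, 120, 130, 116, 140, 124],
    "has_at":       [0, 0, 0, 1, 0, 1, 1, 0, 1, 1],
    "noise":        [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
}
print(filter_features(urls_X, labels, quasi_thresh=0.85, k=2))  # ['url_len', 'has_at']
```

On the toy data, `const` is dropped as constant, `url_len_copy` as a duplicate, `quasi` as quasi-constant (at the 0.85 share used in the demo), `url_len_x2` as perfectly correlated with `url_len`, and the ranking step then discards `noise`. On real data, scikit-learn's `VarianceThreshold`, `mutual_info_classif`, and `f_classif` (with `SelectKBest`) cover the variance- and ranking-based steps.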
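A stacking ensemble trains several base classifiers, feeds their out-of-fold predictions to a meta-learner, and lets the meta-learner make the final decision. The pure-Python sketch below shows that mechanism under stated assumptions: the paper's base learners (gradient boosting, random forest, SVM) are replaced by three lightweight hypothetical stand-ins (a decision stump, a nearest-centroid rule, and 3-nearest-neighbours) so that it runs on the standard library alone, and the logistic-regression meta-learner, the 3-fold split, and the toy data are also assumptions. FPR is computed as FP / (FP + TN) relative to a `positive` label; label 1 is assumed to mean phishing, and the parameter allows the opposite convention.

```python
from math import exp

# Hypothetical lightweight base learners standing in for the paper's
# gradient boosting, random forest, and SVM (stdlib only).


def fit_stump(X, y):
    """Single-feature threshold rule with the lowest training error."""
    best = None
    for f in range(len(X[0])):
        vals = sorted({row[f] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            for sign in (1, -1):
                err = sum((sign * (row[f] - t) > 0) != (label == 1)
                          for row, label in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    if best is None:  # every feature is constant: predict the majority class
        majority = float(2 * sum(y) >= len(y))
        return lambda row: majority
    _, f, t, sign = best
    return lambda row: float(sign * (row[f] - t) > 0)


def fit_centroid(X, y):
    """Predict the class whose mean vector is nearer (needs both classes)."""
    cents = {}
    for c in (0, 1):
        rows = [r for r, label in zip(X, y) if label == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        d = {c: sum((a - b) ** 2 for a, b in zip(row, m)) for c, m in cents.items()}
        return float(d[1] < d[0])
    return predict


def fit_knn(X, y, k=3):
    """Share of the k nearest training rows that carry label 1."""
    def predict(row):
        near = sorted(range(len(X)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(row, X[i])))[:k]
        return sum(y[i] for i in near) / len(near)
    return predict


BASE_LEARNERS = (fit_stump, fit_centroid, fit_knn)


def out_of_fold_meta(X, y, folds=3):
    """Meta-features: each row is scored only by base models that never saw it."""
    meta = [None] * len(X)
    for fold in range(folds):
        train = [i for i in range(len(X)) if i % folds != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in BASE_LEARNERS]
        for i in range(fold, len(X), folds):
            meta[i] = [m(X[i]) for m in models]
    return meta


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        p = [1 / (1 + exp(-(sum(wj * zj for wj, zj in zip(w, z)) + b))) for z in Z]
        w = [wj - lr * sum((pi - yi) * z[j] for pi, yi, z in zip(p, y, Z)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(pi - yi for pi, yi in zip(p, y)) / n
    return lambda z: float(sum(wj * zj for wj, zj in zip(w, z)) + b > 0)


def fit_stacking(X, y, folds=3):
    meta_model = fit_logistic(out_of_fold_meta(X, y, folds), y)
    base_models = [fit(X, y) for fit in BASE_LEARNERS]  # refit on all rows
    return lambda row: meta_model([m(row) for m in base_models])


def accuracy_and_fpr(y_true, y_pred, positive=1):
    """Accuracy, and FPR = FP / (FP + TN) for the given positive label."""
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    fp = sum(p == positive for p in neg)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return acc, (fp / len(neg) if neg else 0.0)


# Hypothetical toy data: two well-separated clusters, label 1 = phishing.
X_train = [[0.0, 0.1], [3.0, 3.1], [0.2, 0.0], [3.2, 2.9], [0.1, 0.3], [2.8, 3.3],
           [0.3, 0.2], [3.1, 3.0], [0.0, 0.4], [2.9, 2.8], [0.4, 0.1], [3.3, 3.2]]
y_train = [0, 1] * 6
X_test, y_test = [[0.2, 0.2], [3.0, 3.0], [0.1, 0.0], [3.1, 3.2]], [0, 1, 0, 1]
model = fit_stacking(X_train, y_train)
acc, fpr = accuracy_and_fpr(y_test, [model(r) for r in X_test])
print(f"accuracy={acc:.2f}, FPR={fpr:.2f}")  # accuracy=1.00, FPR=0.00
```

With real features, scikit-learn's `StackingClassifier(estimators=[...], final_estimator=...)` wrapping `GradientBoostingClassifier`, `RandomForestClassifier`, and `SVC` would be the more usual way to build the ensemble the abstract describes; it likewise builds the meta-features from cross-validated predictions.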

Keywords

Phishing detection; feature selection; stacking ensemble; machine learning; phishing URL

Cite This Article

APA Style
Bari, N., Saleem, T., Shah, M., Algarni, A., Patel, A. et al. (2025). A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning. Computer Modeling in Engineering & Sciences, 145(1), 1167–1187. https://doi.org/10.32604/cmes.2025.070311
Vancouver Style
Bari N, Saleem T, Shah M, Algarni A, Patel A, Ullah I. A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning. Comput Model Eng Sci. 2025;145(1):1167–1187. https://doi.org/10.32604/cmes.2025.070311
IEEE Style
N. Bari, T. Saleem, M. Shah, A. Algarni, A. Patel, and I. Ullah, “A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning,” Comput. Model. Eng. Sci., vol. 145, no. 1, pp. 1167–1187, 2025. https://doi.org/10.32604/cmes.2025.070311



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.