Open Access
ARTICLE
Explainable Machine Learning for Phishing Detection: Bridging Technical Efficacy and Legal Accountability in Cyberspace Security
1 School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications, No. 2, Chongwen Road, Nan’an District, Chongqing, 400065, China
2 School of Mechanical and Vehicle Engineering, Chongqing University, Shapingba District, Chongqing, 400044, China
3 School of Information and Communication Engineering, Chongqing University of Posts and Telecommunications, No. 2, Chongwen Road, Nan’an District, Chongqing, 400065, China
4 School of Transportation and Civil Engineering, Nantong University, No. 9, Seyuan Road, Nantong, 226019, China
5 School of Artificial Intelligence and Computer Science, Nantong University, No. 9, Seyuan Road, Nantong, 226019, China
* Corresponding Author: MD Hamid Borkot Tulla. Email:
Journal of Cyber Security 2025, 7, 675-691. https://doi.org/10.32604/jcs.2025.074737
Received 16 October 2025; Accepted 28 November 2025; Issue published 24 December 2025
Abstract
Phishing is one of the most widespread cybercrimes because it exploits both technical and human vulnerabilities to steal sensitive information. Traditional blacklist- and heuristic-based defenses fail to detect emerging attack patterns, so intelligent and transparent detection systems are needed. This paper proposes an explainable machine learning framework that integrates predictive performance with regulatory accountability. Four models, Decision Tree, XGBoost, Logistic Regression, and Random Forest, were trained and tested on a balanced dataset of 10,000 URLs (5000 phishing and 5000 legitimate samples), each characterized by 48 lexical and content-based features. Among them, Random Forest achieved the best balance between interpretability and accuracy, at 98.55%. Model explainability is provided through SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), offering both global and local transparency into model decisions. SHAP identifies key phishing indicators, including PctExtHyperlinks, PctExtNullSelfRedirectHyperlinksRT, and FrequentDomainNameMismatch, while LIME provides instance-level interpretability for individual URL classifications. These interpretability results are then mapped onto corresponding legal frameworks, including the EU General Data Protection Regulation (GDPR), China's Personal Information Protection Law (PIPL), the EU Artificial Intelligence Act (AI Act), and the U.S. Federal Trade Commission Act (FTC Act), linking algorithmic reasoning to the principles of fairness, transparency, and accountability. The results demonstrate that explainable ensemble models can achieve high accuracy while also ensuring legal compliance. Future work will extend this approach to multimodal phishing detection and validate it across jurisdictions.
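The pipeline the abstract describes can be outlined in a minimal sketch, not the authors' exact code: the file name phishing_urls.csv, the CLASS_LABEL column, the 80/20 split, and all hyperparameters are assumptions for illustration, while the feature names (e.g., PctExtHyperlinks, FrequentDomainNameMismatch) come from the 48-feature dataset the paper uses.

```python
# Minimal sketch of the abstract's pipeline, not the authors' exact code.
# Assumptions: the 48-feature dataset is a CSV named "phishing_urls.csv"
# with a binary "CLASS_LABEL" column (1 = phishing); the 80/20 split and
# hyperparameters are illustrative defaults.
import pandas as pd
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

df = pd.read_csv("phishing_urls.csv")          # 10,000 URLs, 48 features
X, y = df.drop(columns=["CLASS_LABEL"]), df["CLASS_LABEL"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Train and score the four classifiers compared in the paper.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: {accuracy_score(y_te, model.predict(X_te)):.4f}")

rf = models["Random Forest"]

# Global transparency: mean |SHAP value| ranks indicators such as
# PctExtHyperlinks and FrequentDomainNameMismatch across the test set.
shap_values = shap.TreeExplainer(rf).shap_values(X_te)
shap.summary_plot(shap_values, X_te, plot_type="bar")

# Local transparency: LIME explains a single URL's classification.
lime_explainer = LimeTabularExplainer(
    X_tr.values, feature_names=list(X.columns),
    class_names=["legitimate", "phishing"], discretize_continuous=True)
explanation = lime_explainer.explain_instance(
    X_te.iloc[0].values, rf.predict_proba, num_features=10)
print(explanation.as_list())
```

Here shap.summary_plot with plot_type="bar" yields the global feature ranking, while explanation.as_list() returns the weighted feature conditions behind one prediction, mirroring the global/local split of transparency the abstract describes.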
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.