Open Access

ARTICLE

A Hybrid CNN-XGBoost Framework for Phishing Email Detection Using Statistical and Semantic Features

Lin-Hui Liu1, Dong-Jie Liu1,*, Yin-Yan Zhang1, Xiao-Bo Jin2, Xiu-Cheng Wu3, Guang-Gang Geng1

1 College of Cyber Security, Jinan University, Guangzhou, China
2 Department of Intelligent Science, Xi’an Jiaotong-Liverpool University, Suzhou, China
3 Coremail Technology Co. Ltd., Guangzhou, China

* Corresponding Author: Dong-Jie Liu. Email: email

Computers, Materials & Continua 2026, 87(2), 58 https://doi.org/10.32604/cmc.2026.074253

Abstract

Phishing email detection is a critical research challenge in cybersecurity. This paper proposes a novel Double-S (statistical-semantic) feature model built on the three core entities involved in email communication: the sender, the recipient, and the email content. We apply strategic game theory to analyze the offensive strategies of phishing attackers and the defensive strategies of protectors, and extract statistical features from these entities accordingly. We also use the Qwen large language model to mine implicit semantic features (e.g., emotional manipulation and social engineering tactics) from email content. Integrating the statistical and semantic features yields a robust representation of phishing emails. For detection, we introduce a hybrid model that combines a convolutional neural network (CNN) module with the XGBoost (Extreme Gradient Boosting) classifier, effectively capturing local correlations in high-dimensional features. Experimental results on real-world phishing email datasets demonstrate the superiority of our approach, which achieves an F1-score of 0.9587, a precision of 0.9591, and a recall of 0.9583, representing improvements of 1.3%–10.6% over state-of-the-art methods.
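
As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes the three reported values refer to the same class and averaging scheme, which the abstract does not state.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.9591
reported_recall = 0.9583
reported_f1 = 0.9587

# Assumption: all three values use the same class and averaging scheme.
computed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}")  # computed F1 = 0.9587
```

Under that assumption, the recomputed value agrees with the reported F1-score to four decimal places.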

Keywords

Phishing email detection; strategic game theory; Double-S feature model; Qwen large language model; XGBoost; convolutional neural network

Cite This Article

APA Style
Liu, L., Liu, D., Zhang, Y., Jin, X., Wu, X. et al. (2026). A Hybrid CNN-XGBoost Framework for Phishing Email Detection Using Statistical and Semantic Features. Computers, Materials & Continua, 87(2), 58. https://doi.org/10.32604/cmc.2026.074253
Vancouver Style
Liu L, Liu D, Zhang Y, Jin X, Wu X, Geng G. A Hybrid CNN-XGBoost Framework for Phishing Email Detection Using Statistical and Semantic Features. Comput Mater Contin. 2026;87(2):58. https://doi.org/10.32604/cmc.2026.074253
IEEE Style
L. Liu, D. Liu, Y. Zhang, X. Jin, X. Wu, and G. Geng, “A Hybrid CNN-XGBoost Framework for Phishing Email Detection Using Statistical and Semantic Features,” Comput. Mater. Contin., vol. 87, no. 2, pp. 58, 2026. https://doi.org/10.32604/cmc.2026.074253



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.