Open Access

ARTICLE

Advanced AI-Driven Cybersecurity Solutions: Intelligent Threat Detection, Explainability, and Adversarial Resilience

Kirubavathi Ganapathiyappan1,*, Kiruba Marimuthu Eswaramoorthy1, Abi Thangamuthu Shanthamani1, Aksaya Venugopal1, Asita Pon Bhavya Iyyappan1, Thilaga Manickam1, Ateeq Ur Rehman2,*, Habib Hamam3,4,5,6
1 Department of Mathematics, Amrita School of Physical Sciences, Amrita Vishwa Vidyapeetham, Coimbatore, 641112, India
2 School of Computing, Gachon University, Seongnam-si, 13120, Republic of Korea
3 Faculty of Engineering, Uni de Moncton, Moncton, NB E1A3E9, Canada
4 School of Electrical Engineering, University of Johannesburg, Johannesburg, 2006, South Africa
5 International Institute of Technology and Management (IITG), Av. Grandes Ecoles, Libreville, 1989, Gabon
6 College of Computer Science and Eng., University of Ha’il, Ha’il, 55476, Saudi Arabia
* Corresponding Authors: Kirubavathi Ganapathiyappan; Ateeq Ur Rehman
(This article belongs to the Special Issue: Advances in Machine Learning and Artificial Intelligence for Intrusion Detection Systems)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.070067

Received 07 July 2025; Accepted 26 September 2025; Published online 27 October 2025

Abstract

The growing use of Portable Document Format (PDF) files in sectors such as education, government, and business has made them a major target for cyberattacks. Attackers exploit the flexibility and layered structure of PDFs to inject malicious content, often using advanced obfuscation techniques to evade traditional signature-based security systems. These conventional methods are no longer adequate, especially against sophisticated threats such as zero-day exploits and polymorphic malware. In response, this study introduces a machine learning-based detection framework designed to counter such threats. Central to the proposed solution is a stacked ensemble learning model that combines four classifiers: Random Forest (RF), Extreme Gradient Boosting (XGB), LightGBM (LGBM), and CatBoost (CB). These models operate in parallel as base learners, each capturing different aspects of the data. Their outputs are then combined by a Gradient Boosting Classifier (GBC), which serves as a meta-learner to improve prediction accuracy. To keep the model efficient, Principal Component Analysis (PCA) is applied to reduce feature dimensionality while preserving the information needed for malware classification. The model is trained and validated on the CIC-Evasive PDFMalware2022 dataset, which includes a wide range of malicious and benign PDF samples. The framework achieves 97.10% accuracy and a 97.39% F1-score, surpassing several existing techniques. To enhance trust and interpretability, the system incorporates Local Interpretable Model-agnostic Explanations (LIME), which provide user-friendly insights into the rationale behind each prediction.
This research emphasizes how the integration of ensemble learning, feature reduction, and explainable AI can lead to a practical and scalable solution for detecting complex PDF-based threats. The proposed framework lays the foundation for the next generation of intelligent, resilient cybersecurity systems that can address ever-evolving attack strategies.

Keywords

PDF malware; ensemble learning; stacking model; cybersecurity; adversarial assaults