Towards Robust Malware Detection with a Multiclass Dataset for Intelligent Learning

Amjad Hussain¹,*, Ayesha Saadia²,*, Chihhsiong Shih³, Nazish Nawaz², Amir H. Gandomi⁴,*, Khursheed Aurangzeb⁵
1 Department of Cyber Security, Main Campus, Air University, Islamabad, Pakistan
2 Department of Computer Science, Main Campus, Air University, Islamabad, Pakistan
3 Department of Computer Science, Tunghai University, Taichung City, Taiwan
4 Faculty of Engineering & Information Technology, University of Technology, Sydney, NSW, Australia
5 Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
* Corresponding Authors: Amjad Hussain. Email: 220324@ email; Ayesha Saadia. Email: email; Amir H. Gandomi. Email: email

Computer Modeling in Engineering & Sciences https://doi.org/10.32604/cmes.2026.078451

Received 31 December 2025; Accepted 19 March 2026; Published online 14 May 2026

Abstract

Malware has evolved from the early Creeper virus into highly sophisticated, organized cyber threats. Modern malware combines encryption, obfuscation, zero-day exploits, autonomous propagation, and AI-assisted techniques to conduct stealthy and persistent attacks. Identifying the exact family of a sample is essential for defending against and mitigating the latest attacks. Researchers have introduced many techniques to counter malware, but most work addresses binary detection, and comparatively little addresses multi-class classification. This research proposes a hybrid technique that combines a sandbox with AI models to extract hidden patterns and classify a sample's category and family with high accuracy. A dataset, AU-PEMAL-2025, is prepared, comprising 10,839 records from 26 malware families. Five ML and three DL models are trained on the new dataset to validate its effectiveness. The ML classifiers achieved highest accuracies of 0.9945, 0.9788, and 0.9485 for detection, category classification, and family classification, respectively, while the DL models achieved 0.9932, 0.9591, and 0.9286 on the same three tasks with minimal loss. Our findings indicate that the proposed approach can efficiently detect obfuscated malware variants and safeguard organizations from unseen malware threats.