Open Access
ARTICLE
Backdoor Malware Detection in Industrial IoT Using Machine Learning
1 Department of Computer Science, CECOS University of IT and Emerging Sciences, Peshawar, 25000, Pakistan
2 Department of Environmental Sciences, Informatics and Statistics, Ca’ Foscari University of Venice, Via Torino, Venice, 155, Italy
3 Center for Cybersecurity, Bruno Kessler Foundation, Trento, 38123, Italy
4 Faculty of Computer Science, National University of Computer and Emerging Sciences (NUCES-FAST), Islamabad, 44000, Pakistan
* Corresponding Author: Tahir Ahmad. Email:
Computers, Materials & Continua 2024, 81(3), 4691-4705. https://doi.org/10.32604/cmc.2024.057648
Received 23 August 2024; Accepted 19 November 2024; Issue published 19 December 2024
Abstract
With the continued adoption of Industrial Internet of Things (Industrial IoT) technologies, security concerns have grown considerably, especially regarding the protection of critical infrastructure. A primary concern is that backdoors can provide unauthorized access, disrupt operations, and compromise sensitive data. Backdoors pose a significant threat to the integrity and security of Industrial IoT setups by exploiting vulnerabilities and bypassing standard authentication processes. Hence, their detection is of paramount importance. This paper investigates the capability of Machine Learning (ML) models to identify backdoor malware and evaluates how the performance of these models is affected by balancing the dataset via resampling techniques, including Synthetic Minority Oversampling Technique (SMOTE), Synthetic Data Vault (SDV), and Conditional Tabular Generative Adversarial Network (CTGAN), and by feature reduction techniques such as the Pearson correlation coefficient. Experimental evaluation on the CCCS-CIC-AndMal-2020 dataset demonstrates that the Random Forest (RF) classifier produced the best model, with 99.98% accuracy, when using a balanced dataset created by SMOTE. Additionally, training and testing time was reduced by approximately 50% when switching from the full feature set to a reduced feature set, without significant performance loss.
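To make the two preprocessing ideas named in the abstract concrete, the following is a minimal, standard-library-only sketch of Pearson-correlation feature reduction and SMOTE-style oversampling. It is an illustration under stated assumptions, not the authors' implementation: the function names `pearson`, `drop_correlated`, and `smote`, the 0.9 correlation threshold, the neighbour count `k`, and the toy feature vectors are all hypothetical and do not come from the paper or from the CCCS-CIC-AndMal-2020 dataset.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(rows, threshold=0.9):
    """Return indices of the features kept after dropping every feature
    whose |r| with an already-kept feature exceeds the threshold.
    The threshold value is an assumption chosen for illustration."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[i])) <= threshold for i in kept):
            kept.append(j)
    return kept


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples, each interpolated between
    a random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up toy data: feature 1 is exactly twice feature 0 (redundant),
# feature 2 varies independently.
benign = [[1.0, 2.0, 5.0], [2.0, 4.0, 1.0], [3.0, 6.0, 4.0],
          [4.0, 8.0, 2.0], [5.0, 10.0, 3.0], [6.0, 12.0, 6.0]]
backdoor = [[9.0, 18.0, 0.5], [10.0, 20.0, 1.5], [11.0, 22.0, 0.8]]

# Feature reduction: identify which feature indices survive the filter.
kept = drop_correlated(benign + backdoor, threshold=0.9)

# Balancing: oversample the minority class up to the majority class size.
synthetic = smote(backdoor, n_new=len(benign) - len(backdoor), k=2)
balanced_backdoor = backdoor + synthetic
```

In this toy setting the filter discards the redundant second feature, and the oversampled minority class ends up the same size as the majority class. In practice one would more likely rely on established library implementations and apply oversampling to the training split only, so that synthetic samples do not leak into the test set.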
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.