Open Access
ARTICLE
Impact of Data Processing Techniques on AI Models for Attack-Based Imbalanced and Encrypted Traffic within IoT Environments
1 Department of Cyber Security, Kookmin University, Seoul, 02707, Republic of Korea
2 Department of Information Security, Cryptography and Mathematics, Kookmin University, Seoul, 02707, Republic of Korea
* Corresponding Author: Hwankuk Kim. Email:
(This article belongs to the Special Issue: Intelligence and Security Enhancement for Internet of Things)
Computers, Materials & Continua 2026, 86(1), 1-28. https://doi.org/10.32604/cmc.2025.069608
Received 26 June 2025; Accepted 19 August 2025; Issue published 10 November 2025
Abstract
With the increasing emphasis on personal information protection, encryption through security protocols has become a critical requirement for data transmission and reception. However, IoT ecosystems are heterogeneous networks in which outdated systems coexist with the latest devices, ranging from devices that do not encrypt their traffic at all to devices that fully encrypt it. Because payload visibility is limited in this setting, this study investigates AI-based attack detection methods that use encrypted-traffic metadata. This approach eliminates the need for decryption and minimizes system performance degradation, which matters especially given these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, traffic was categorized as encrypted or unencrypted according to its security protocol, and metadata-based AI intrusion detection experiments were conducted for each traffic type. To mitigate class imbalance, eight data sampling techniques were applied. Their effectiveness was then compared from several perspectives using two ensemble models and three Deep Learning (DL) models. The experimental results confirmed that metadata-based attack detection is feasible using encrypted traffic alone. On the UNSW-NB15 dataset, the F1-score for encrypted traffic was approximately 0.98, about 4.3% higher than that for unencrypted traffic (approximately 0.94). In contrast, applying the same method to the encrypted traffic in the CICIoT-2023 dataset yielded a much lower F1-score of roughly 0.43, indicating that dataset quality and the preprocessing approach substantially affect detection performance. When data sampling techniques were applied to encrypted traffic, recall improved by up to 23.0% on UNSW-NB15 (Encrypted) and 20.26% on CICIoT-2023 (Encrypted), a comparable level of improvement. Notably, on CICIoT-2023, the F1-score and the Receiver Operating Characteristic Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can improve detection even in encrypted environments, although the extent of the improvement may vary with data quality, model architecture, and sampling strategy.
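The workflow summarized in the abstract involves separating traffic by security protocol, rebalancing classes with a sampling technique, and scoring detection by recall and F1. The minimal standard-library Python sketch below illustrates these steps. It is not the authors' implementation. The `service` field, the `ENCRYPTED_SERVICES` set, and the helper names `split_by_encryption`, `random_oversample`, and `recall_and_f1` are hypothetical. Random oversampling stands in for only one of the eight sampling techniques, which the abstract does not name.

```python
import random
from collections import Counter

# Hypothetical: service/protocol labels treated as "encrypted" traffic.
# The study splits traffic by security protocol; this exact set is an assumption.
ENCRYPTED_SERVICES = {"ssl", "tls", "ssh", "https"}


def split_by_encryption(flows):
    """Partition flow records (dicts) into encrypted and unencrypted lists
    based on a hypothetical 'service' metadata field."""
    encrypted, unencrypted = [], []
    for flow in flows:
        if flow.get("service", "").lower() in ENCRYPTED_SERVICES:
            encrypted.append(flow)
        else:
            unencrypted.append(flow)
    return encrypted, unencrypted


def random_oversample(samples, labels, seed=0):
    """Random oversampling: duplicate randomly chosen samples of each smaller
    class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def recall_and_f1(y_true, y_pred, positive=1):
    """Binary recall and F1-score, treating `positive` as the attack class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return recall, 0.0
    return recall, 2 * precision * recall / (precision + recall)
```

In practice, sampling would normally be applied only to the training split so that evaluation still reflects the original class distribution. The binary F1-score here is also a simplification, because multi-class attack labels require choosing an averaging scheme (e.g., macro or weighted).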
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

