Open Access

ARTICLE

Impact of Data Processing Techniques on AI Models for Attack-Based Imbalanced and Encrypted Traffic within IoT Environments

Yeasul Kim1, Chaeeun Won1, Hwankuk Kim2,*

1 Department of Cyber Security, Kookmin University, Seoul, 02707, Republic of Korea
2 Department of Information Security, Cryptography and Mathematics, Kookmin University, Seoul, 02707, Republic of Korea

* Corresponding Author: Hwankuk Kim. Email: email

(This article belongs to the Special Issue: Intelligence and Security Enhancement for Internet of Things)

Computers, Materials & Continua 2026, 86(1), 1-28. https://doi.org/10.32604/cmc.2025.069608

Abstract

With the increasing emphasis on personal information protection, encryption through security protocols has become a critical requirement for data transmission and reception. Nevertheless, IoT ecosystems are heterogeneous networks in which outdated systems coexist with the latest devices, ranging from unencrypted devices to fully encrypted ones. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate class imbalance, eight data sampling techniques were applied. The effectiveness of these sampling techniques was then compared from various perspectives using two ensemble models and three Deep Learning (DL) models. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score for encrypted traffic was approximately 0.98, about 4.3% higher than that for unencrypted traffic (approximately 0.94). By contrast, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method yielded a significantly lower F1-score of roughly 0.43, indicating that dataset quality and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, recall improved by up to 23.0% in the UNSW-NB15 (Encrypted) dataset and by 20.26% in the CICIoT-2023 (Encrypted) dataset, a similar level of improvement. Notably, in CICIoT-2023, the F1-score and the Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments. However, the extent of the improvement may vary depending on data quality, model architecture, and sampling strategy.
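
The idea of rebalancing classes before training and then scoring with recall and F1 can be illustrated with a minimal sketch. The abstract does not name the eight sampling techniques, so random oversampling is used here purely as an assumed example, and the toy rows, the feature meanings, and the function names `random_oversample` and `precision_recall_f1` are hypothetical. In practice, resampling would typically be applied to the training split only, so that evaluation data keeps its original class distribution.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (illustrative only)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical flow metadata: [packet count, mean packet size].
# Label 1 marks attack traffic, label 0 marks benign traffic.
rows = [[10, 500], [12, 480], [11, 510], [9, 495], [300, 60], [11, 505]]
labels = [0, 0, 0, 0, 1, 0]

balanced_rows, balanced_labels = random_oversample(rows, labels)
counts = Counter(balanced_labels)

# Example scoring call on made-up predictions.
precision, recall, f1 = precision_recall_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Recall is the metric most directly affected by class imbalance, because a model that rarely predicts the minority (attack) class misses true attacks; this is consistent with the abstract reporting recall gains after sampling.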

Keywords

Encrypted traffic; attack detection; data sampling technique; AI-based detection; IoT environment

Cite This Article

APA Style
Kim, Y., Won, C., Kim, H. (2026). Impact of Data Processing Techniques on AI Models for Attack-Based Imbalanced and Encrypted Traffic within IoT Environments. Computers, Materials & Continua, 86(1), 1–28. https://doi.org/10.32604/cmc.2025.069608
Vancouver Style
Kim Y, Won C, Kim H. Impact of Data Processing Techniques on AI Models for Attack-Based Imbalanced and Encrypted Traffic within IoT Environments. Comput Mater Contin. 2026;86(1):1–28. https://doi.org/10.32604/cmc.2025.069608
IEEE Style
Y. Kim, C. Won, and H. Kim, “Impact of Data Processing Techniques on AI Models for Attack-Based Imbalanced and Encrypted Traffic within IoT Environments,” Comput. Mater. Contin., vol. 86, no. 1, pp. 1–28, 2026. https://doi.org/10.32604/cmc.2025.069608



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.