Open Access

ARTICLE

A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices

Chengqi Liu1, Yongtao Li2, Weiping Zou3,*, Deyu Lin4,5,*

1 Network and Information Center, Nanchang University, No. 999 of Xuefu Road, Nanchang, 330031, China
2 School of Mathematics and Computer Sciences, Nanchang University, No. 999 of Xuefu Road, Nanchang, 330031, China
3 School of Artificial Intelligence, Wenzhou Polytechnic, Chashan Higher Education Park, Ouhai District, Wenzhou, 325035, China
4 School of Software, Nanchang University, Nanchang, 330031, China
5 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore

* Corresponding Authors: Weiping Zou. Email: email; Deyu Lin. Email: email

Computers, Materials & Continua 2026, 87(1), 85. https://doi.org/10.32604/cmc.2025.074975

Abstract

With the large-scale deployment of Internet of Things (IoT) devices, their weak security mechanisms make them prime targets for malware attacks. Attackers often use Domain Generation Algorithms (DGAs) to generate random domain names that hide the real IP addresses of Command and Control (C&C) servers used to build botnets. Because DGA-generated domains are random and dynamic, traditional methods struggle to detect them accurately, increasing the difficulty of network defense. This paper proposes a lightweight DGA detection model based on knowledge distillation for resource-constrained IoT environments. Specifically, a teacher model combining CharacterBERT, a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism (ATT) is constructed: it extracts character-level semantic features via CharacterBERT, captures sequence dependencies with the BiLSTM, and weights key features with the ATT, forming a multi-granularity feature fusion. An improved knowledge distillation approach then transfers the teacher model's learned knowledge to a simplified DistilBERT student model. Experimental results show that the teacher model achieves 98.68% detection accuracy. The student model attains slightly higher accuracy while compressing the parameter count to approximately 38.4% of the teacher model's, greatly reducing computational overhead for IoT deployment.
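The abstract's distillation step can be illustrated with a minimal sketch of the standard temperature-based formulation (Hinton-style soft targets). The paper's "improved" variant is not specified here, so the loss below, its `temperature` and `alpha` parameters, and the example logits are illustrative assumptions, not the authors' exact method.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    soft *= temperature ** 2
    # Ordinary cross-entropy against the ground-truth label at T = 1
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard

# Example: a benign/DGA binary classifier; when the student matches the
# teacher exactly, only the hard-label term contributes to the loss.
loss = distillation_loss([2.0, 0.0], [2.0, 0.0], true_label=0)
```

In this sketch the teacher would be the CharacterBERT-BiLSTM-ATT model producing `teacher_logits` per domain name, and the student the DistilBERT model being trained; `alpha` balances imitation of the teacher against fitting the true labels.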

Keywords

IoT security; DGA detection; knowledge distillation; lightweight model; edge computing

Cite This Article

APA Style
Liu, C., Li, Y., Zou, W., Lin, D. (2026). A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices. Computers, Materials & Continua, 87(1), 85. https://doi.org/10.32604/cmc.2025.074975
Vancouver Style
Liu C, Li Y, Zou W, Lin D. A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices. Comput Mater Contin. 2026;87(1):85. https://doi.org/10.32604/cmc.2025.074975
IEEE Style
C. Liu, Y. Li, W. Zou, and D. Lin, “A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices,” Comput. Mater. Contin., vol. 87, no. 1, pp. 85, 2026. https://doi.org/10.32604/cmc.2025.074975



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.