Open Access

ARTICLE

A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices

Chengqi Liu1, Yongtao Li2, Weiping Zou3,*, Deyu Lin4,5,*

1 Network and Information Center, Nanchang University, No. 999 of Xuefu Road, Nanchang, 330031, China
2 School of Mathematics and Computer Sciences, Nanchang University, No. 999 of Xuefu Road, Nanchang, 330031, China
3 School of Artificial Intelligence, Wenzhou Polytechnic, Chashan Higher Education Park, Ouhai District, Wenzhou, 325035, China
4 School of Software, Nanchang University, Nanchang, 330031, China
5 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore

* Corresponding Authors: Weiping Zou. Email: email; Deyu Lin. Email: email

Computers, Materials & Continua 2026, 87(1), 85. https://doi.org/10.32604/cmc.2025.074975

Abstract

With the large-scale deployment of Internet of Things (IoT) devices, their weak security mechanisms make them prime targets for malware attacks. Attackers often use Domain Generation Algorithms (DGAs) to generate random domain names that hide the real IP addresses of Command and Control (C&C) servers used to build botnets. Because DGA-generated domains are random and dynamic, traditional methods struggle to detect them accurately, increasing the difficulty of network defense. This paper proposes a lightweight DGA detection model based on knowledge distillation for resource-constrained IoT environments. Specifically, a teacher model combining CharacterBERT, a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism (ATT) is constructed: it extracts character-level semantic features via CharacterBERT, captures sequence dependencies with the BiLSTM, and weights key features with the ATT, forming a multi-granularity feature fusion. An improved knowledge distillation approach then transfers the teacher model's learned knowledge to a simplified DistilBERT student model. Experimental results show that the teacher model achieves 98.68% detection accuracy. The student model attains slightly higher accuracy while compressing the parameter count to approximately 38.4% of the teacher model's, greatly reducing computational overhead for IoT deployment.
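The abstract's distillation step can be illustrated with a minimal sketch of the standard temperature-based formulation (Hinton-style soft targets). The paper's "improved" variant is not specified here, so the loss below, its `temperature` and `alpha` parameters, and the example logits are illustrative assumptions, not the authors' exact method.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    soft *= temperature ** 2
    # Ordinary cross-entropy against the ground-truth label at T = 1
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard

# Example: a benign/DGA binary classifier; when the student matches the
# teacher exactly, only the hard-label term contributes to the loss.
loss = distillation_loss([2.0, 0.0], [2.0, 0.0], true_label=0)
```

In this sketch the teacher would be the CharacterBERT-BiLSTM-ATT model producing `teacher_logits` per domain name, and the student the DistilBERT model being trained; `alpha` balances imitation of the teacher against fitting the true labels.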

Keywords

IoT security; DGA detection; knowledge distillation; lightweight model; edge computing

Cite This Article

APA Style
Liu, C., Li, Y., Zou, W., Lin, D. (2026). A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices. Computers, Materials & Continua, 87(1), 85. https://doi.org/10.32604/cmc.2025.074975
Vancouver Style
Liu C, Li Y, Zou W, Lin D. A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices. Comput Mater Contin. 2026;87(1):85. https://doi.org/10.32604/cmc.2025.074975
IEEE Style
C. Liu, Y. Li, W. Zou, and D. Lin, “A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices,” Comput. Mater. Contin., vol. 87, no. 1, pp. 85, 2026. https://doi.org/10.32604/cmc.2025.074975



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.