
Open Access

ARTICLE

A Knowledge-Distilled CharacterBERT-BiLSTM-ATT Framework for Lightweight DGA Detection in IoT Devices

Chengqi Liu1, Yongtao Li2, Weiping Zou3,*, Deyu Lin4,5,*
1 Network and Information Center, Nanchang University, No. 999 of Xuefu Road, Nanchang, 330031, China
2 School of Mathematics and Computer Sciences, Nanchang University, No. 999 of Xuefu Road, Nanchang, 330031, China
3 School of Artificial Intelligence, Wenzhou Polytechnic, Chashan Higher Education Park, Ouhai District, Wenzhou, 325035, China
4 School of Software, Nanchang University, Nanchang, 330031, China
5 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
* Corresponding Authors: Weiping Zou and Deyu Lin

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.074975

Received 22 October 2025; Accepted 16 December 2025; Published online 04 January 2026

Abstract

With the large-scale deployment of Internet of Things (IoT) devices, their weak security mechanisms make them prime targets for malware attacks. Attackers often use Domain Generation Algorithms (DGAs) to produce pseudo-random domain names that hide the real IP addresses of Command-and-Control (C&C) servers used to build botnets. Because DGA-generated domains are random and change dynamically, traditional methods struggle to detect them accurately, increasing the difficulty of network defense. This paper proposes a lightweight DGA detection model based on knowledge distillation for resource-constrained IoT environments. Specifically, a teacher model combining CharacterBERT, a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism (ATT) is constructed: it extracts character-level semantic features via CharacterBERT, captures sequence dependencies with the BiLSTM, and applies the ATT to weight key features, yielding multi-granularity feature fusion. An improved knowledge distillation approach then transfers the teacher model's learned knowledge to a compact DistilBERT student model. Experimental results show that the teacher model achieves 98.68% detection accuracy, while the student model attains slightly higher accuracy with its parameters compressed to approximately 38.4% of the teacher model's scale, greatly reducing computational overhead for IoT deployment.
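To make the pipeline concrete, the sketch below illustrates the two ideas the abstract names: a BiLSTM-with-attention classification head over character embeddings, and a standard temperature-scaled distillation loss (KL divergence on softened logits mixed with cross-entropy on hard labels). It is a minimal PyTorch sketch, not the authors' implementation: the tiny embedding layer stands in for CharacterBERT, the smaller second head stands in for the DistilBERT student, and the temperature T and mixing weight alpha are illustrative hyperparameters, not values reported in the paper.

```python
# Minimal sketch of the teacher head and distillation objective (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherHead(nn.Module):
    """Stand-in for CharacterBERT -> BiLSTM -> attention -> classifier."""
    def __init__(self, vocab=128, emb=64, hidden=64, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)            # character-level embeddings
        self.bilstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)            # additive attention scores
        self.fc = nn.Linear(2 * hidden, classes)

    def forward(self, x):
        h, _ = self.bilstm(self.emb(x))                # (B, T, 2H) sequence states
        w = torch.softmax(self.att(h).squeeze(-1), -1) # (B, T) attention weights
        ctx = torch.einsum("bt,bth->bh", w, h)         # attention-weighted pooling
        return self.fc(ctx)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, -1),
                    F.softmax(teacher_logits / T, -1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)     # ground-truth supervision
    return alpha * soft + (1 - alpha) * hard

if __name__ == "__main__":
    x = torch.randint(0, 128, (8, 32))   # batch of 8 domain names, 32 chars each
    y = torch.randint(0, 2, (8,))        # benign / DGA labels
    teacher = TeacherHead()
    student = TeacherHead(hidden=24)     # smaller head standing in for DistilBERT
    with torch.no_grad():
        t_logits = teacher(x)            # teacher is frozen during distillation
    loss = distillation_loss(student(x), t_logits, y)
    loss.backward()                      # gradients flow only into the student
```

In this formulation the student trains against a blend of the teacher's softened output distribution and the true labels, which is what lets a much smaller network recover (or, as the abstract reports, slightly exceed) the teacher's accuracy at a fraction of the parameter count.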

Keywords

IoT security; DGA detection; knowledge distillation; lightweight model; edge computing