
Open Access

ARTICLE

Mitigating Adversarial Obfuscation in Named Entity Recognition with Robust SecureBERT Finetuning

Nouman Ahmad1,*, Changsheng Zhang1, Uroosa Sehar2,3,4
1 School of Software Engineering, Northeastern University, Shenyang, 110819, China
2 College of Information Science and Engineering, Shaoyang University, Shaoyang, 422000, China
3 Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang, 422000, China
4 School of Advanced Integrated Technology, Shenzhen Institute of Advanced Technology Chinese Academy of Sciences, Shenzhen, 518000, China
* Corresponding Author: Nouman Ahmad
(This article belongs to the Special Issue: Utilizing and Securing Large Language Models for Cybersecurity and Beyond)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.073029

Received 09 September 2025; Accepted 18 November 2025; Published online 12 December 2025

Abstract

Although Named Entity Recognition (NER) in cybersecurity has historically concentrated on threat intelligence, vital security data can be found in a variety of sources, such as open-source intelligence and unprocessed tool outputs. The coexistence of structured and unstructured data, combined with dense technical language, poses serious challenges for traditional BERT-based techniques. We introduce a three-phase approach for improved NER in multi-source cybersecurity data that makes use of large language models (LLMs). Our method starts with an identification module that applies dynamic prompting techniques to ensure thorough entity coverage. To reduce hallucinations, the extraction module combines confidence-based self-assessment with cross-checking against regex validation. The tagging module pairs SecureBERT with conditional random fields to detect entity boundaries precisely and links to knowledge bases for contextual validation. Our framework creates efficient natural language segments by utilizing decoder-based LLMs with 10B parameters. Evaluation across four cybersecurity data sources shows notable gains over baseline SecureBERT implementations, with 9.4%–25.21% higher recall and 6.38%–17.3% better F1-scores. Our fine-tuned model matches larger models and achieves a 2.6%–4.9% better F1-score for technical phrase recognition than the state-of-the-art alternatives Claude 3.5 Sonnet, Llama3-8B, and Mixtral-7B. The three-stage identification-extraction-tagging pipeline addresses key challenges in cybersecurity NER. These advances preserve deployability through efficient architectures while setting a new standard for entity extraction in challenging security scenarios. The findings show how targeted enhancements in hybrid recognition, validation procedures, and prompt engineering raise NER performance above monolithic LLM approaches in cybersecurity applications, especially for technical entity extraction from heterogeneous sources where conventional techniques fall short. Because of its modular nature, the framework can be upgraded at the component level as new methods are developed.
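To make the extraction module's hallucination filter concrete, the following is a minimal Python sketch: LLM-proposed entity candidates are kept only if their surface form matches a regex for their claimed type and their self-assessed confidence clears a threshold. The patterns, the 0.5 threshold, and the candidate dictionary format are illustrative assumptions, not the paper's exact rules.

```python
# Hypothetical sketch of the extraction module's regex cross-check and
# confidence-based self-assessment. Patterns and threshold are assumptions.
import re

PATTERNS = {
    "CVE":  re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE),
    "IPV4": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    "MD5":  re.compile(r"^[a-f0-9]{32}$", re.IGNORECASE),
}

def validate(candidates, min_confidence=0.5):
    """Drop candidates whose surface form contradicts their claimed type
    or whose self-assessed confidence is too low (hallucination filter)."""
    kept = []
    for ent in candidates:  # ent: {"text": ..., "type": ..., "confidence": ...}
        pattern = PATTERNS.get(ent["type"])
        if pattern and not pattern.match(ent["text"]):
            continue  # surface form fails the regex cross-check
        if ent["confidence"] < min_confidence:
            continue  # self-assessed confidence below threshold
        kept.append(ent)
    return kept

print(validate([
    {"text": "CVE-2024-3094", "type": "CVE",  "confidence": 0.93},  # kept
    {"text": "CVE-20X4-0001", "type": "CVE",  "confidence": 0.88},  # rejected: bad format
    {"text": "10.0.0.12",     "type": "IPV4", "confidence": 0.31},  # rejected: low confidence
]))
```

Candidates that survive both checks would then be passed on to the tagging module.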
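Similarly, the tagging module's pairing of SecureBERT with conditional random fields can be sketched as a token classifier whose per-token emissions are decoded by a CRF layer. The model ID (ehsanaghaei/SecureBERT on Hugging Face), the tag set, and the pytorch-crf dependency are assumptions for illustration; the paper's actual architecture may differ.

```python
# Hypothetical sketch of the tagging module: a SecureBERT encoder with a
# linear emission layer and a CRF for entity-boundary decoding.
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf
from transformers import AutoModel

TAGS = ["O", "B-MALWARE", "I-MALWARE", "B-CVE", "I-CVE"]  # illustrative tag set

class SecureBertCrfTagger(nn.Module):
    def __init__(self, model_name: str = "ehsanaghaei/SecureBERT"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, len(TAGS))
        self.crf = CRF(len(TAGS), batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)  # per-token tag scores
        mask = attention_mask.bool()
        if labels is not None:  # training: negative log-likelihood under the CRF
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best tag paths
```

At inference time, decode returns the highest-scoring tag sequence per sentence, which is how the CRF enforces consistent B/I boundary transitions rather than tagging each token independently.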

Keywords

Information extraction; large language models; NER; open-source intelligence; security automation