Open Access

ARTICLE

RoBGP: A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer

Xiaohui Cui1,2,#, Chao Song1,2,#, Dongmei Li1,2,*, Xiaolong Qu1,2, Jiao Long1,2, Yu Yang1,2, Hanchao Zhang3

1 School of Information Science and Technology, Beijing Forestry University, Beijing, 100083, China
2 Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing, 100083, China
3 Division of Biostatistics, Department of Population Health, Grossman School of Medicine, New York University, New York, 10016, USA

* Corresponding Author: Dongmei Li.
# These authors contributed equally to this work.

(This article belongs to the Special Issue: Transforming from Data to Knowledge and Applications in Intelligent Systems)

Computers, Materials & Continua 2024, 78(3), 3603-3618. https://doi.org/10.32604/cmc.2024.047321

Abstract

Named Entity Recognition (NER) stands as a fundamental task within the field of biomedical text mining, aiming to extract specific types of entities such as genes, proteins, and diseases from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
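The span-based idea behind the Global Pointer stage can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy query and key vectors, the entity labels "disease" and "body", and the zero threshold are all assumptions chosen for illustration, and position encoding and score scaling are omitted. The sketch assumes that an upstream encoder (RoBERTa followed by a BiLSTM in RoBGP) has already produced, for each entity type, one query vector and one key vector per token. Each candidate span is scored by the dot product of its start query and end key, and every span scoring above the threshold is kept, so an entity nested inside another entity can be returned alongside it.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def span_scores(queries: List[Vector], keys: List[Vector]) -> List[List[float]]:
    """Score each candidate span: scores[start][end] = q_start . k_end."""
    n = len(queries)
    scores = [[float("-inf")] * n for _ in range(n)]
    for start in range(n):
        for end in range(start, n):  # only spans with start <= end are valid
            scores[start][end] = dot(queries[start], keys[end])
    return scores


def decode_entities(
    scores_by_type: Dict[str, List[List[float]]],
    threshold: float = 0.0,  # assumed decision threshold
) -> List[Tuple[str, int, int]]:
    """Return (type, start, end) for every span scoring above the threshold."""
    entities = []
    for entity_type, scores in scores_by_type.items():
        for start, row in enumerate(scores):
            for end in range(start, len(row)):
                if row[end] > threshold:
                    entities.append((entity_type, start, end))
    return sorted(entities, key=lambda e: (e[1], e[2], e[0]))


# Hypothetical 4-token sentence with hand-made vectors (not model output).
# With q_i = [a_i, 1] and k_j = [1, b_j], the span score is a_i + b_j.
start_at_0 = [[1.0, 1.0], [-3.0, 1.0], [-3.0, 1.0], [-3.0, 1.0]]
end_at_3 = [[1.0, -3.0], [1.0, -3.0], [1.0, -3.0], [1.0, 1.0]]
end_at_1 = [[1.0, -3.0], [1.0, 1.0], [1.0, -3.0], [1.0, -3.0]]

scores_by_type = {
    "disease": span_scores(start_at_0, end_at_3),  # outer span, tokens 0..3
    "body": span_scores(start_at_0, end_at_1),     # nested span, tokens 0..1
}
entities = decode_entities(scores_by_type)
print(entities)
```

In this toy input the "body" span covering tokens 0 to 1 lies inside the "disease" span covering tokens 0 to 3, and both are recovered because each span is judged independently per entity type rather than through a single label per token.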

Cite This Article

X. Cui, C. Song, D. Li, X. Qu, J. Long et al., "RoBGP: A Chinese nested biomedical named entity recognition model based on RoBERTa and Global Pointer," Computers, Materials & Continua, vol. 78, no. 3, pp. 3603–3618, 2024. https://doi.org/10.32604/cmc.2024.047321



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.