Open Access
ARTICLE
Corpus of Carbonate Platforms with Lexical Annotations for Named Entity Recognition
Zhichen Hu1, Huali Ren2, Jielin Jiang1, Yan Cui4, Xiumian Hu3, Xiaolong Xu1,*
1 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China
2 Institution of Artificial Intelligence and Blockchain, Guangzhou University, Guangzhou, 515021, China
3 School of Earth Sciences and Engineering, Nanjing University, Nanjing, 210023, China
4 College of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing, 210023, China
* Corresponding Author: Xiaolong Xu. Email:
Computer Modeling in Engineering & Sciences 2023, 135(1), 91-108. https://doi.org/10.32604/cmes.2022.022268
Received 01 March 2022; Accepted 20 May 2022; Issue published 29 September 2022
Abstract
A challenging problem in named entity recognition is the construction of entity data sets.
Although some research has been conducted on entity database construction, most of it targets
Wikipedia, and a minority targets structured entities such as people, locations, and organizations in the news.
This paper focuses on identifying scientific entities in English-language literature, using
carbonate platforms in sedimentology as the example. First, given that literature
in key disciplines is likely to be written by multidisciplinary experts, this paper designs a literature content
extraction method that can handle complex text structures. Second, based on the extracted
content, we formalize the entity extraction task as lexicon and lexical-based entity extraction.
Furthermore, to test the accuracy of entity extraction, three currently popular recognition methods are used
to perform entity detection. Experiments show that the entity data set provided by the lexicon and
lexical-based entity extraction method is of significant assistance for the named entity recognition task. This work
is a pilot study of entity extraction from structurally complex, specialized English literature
on carbonate platforms.
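
To make the idea of lexicon-based entity extraction concrete, the following is a minimal sketch of greedy longest-match lookup that emits BIO tags. It is an illustration under stated assumptions, not the authors' method: the function name `tag_with_lexicon`, the sample lexicon, and the labels `TERM` and `ROCK` are all hypothetical, and the abstract does not specify the tag scheme or matching strategy actually used.

```python
def tag_with_lexicon(tokens, lexicon):
    """Greedy longest-match lexicon lookup; returns one BIO tag per token.

    `lexicon` maps a tuple of lowercased tokens to an entity label.
    """
    max_len = max((len(entry) for entry in lexicon), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(lowered[i:i + span])
            if candidate in lexicon:
                label = lexicon[candidate]
                tags[i] = "B-" + label
                for j in range(i + 1, i + span):
                    tags[j] = "I-" + label
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical lexicon entries and labels, chosen only for illustration.
LEXICON = {
    ("carbonate", "platform"): "TERM",
    ("reef",): "TERM",
    ("ooid", "grainstone"): "ROCK",
}

tokens = "The carbonate platform contains ooid grainstone near the reef".split()
tags = tag_with_lexicon(tokens, LEXICON)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Token and tag pairs of this form are the usual input format for training sequence-labeling recognizers, which is one plausible way a lexicon-annotated corpus could feed the recognition methods mentioned above.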
Keywords
Cite This Article
Hu, Z., Ren, H., Jiang, J., Cui, Y., Hu, X. et al. (2023). Corpus of Carbonate Platforms with Lexical Annotations for Named Entity Recognition.
CMES-Computer Modeling in Engineering & Sciences, 135(1), 91–108.