Open Access
ARTICLE
Research on Agricultural Machinery Fault Nested Entity Extraction for Low-Resource and High-Noise Scenes
School of Light Industry, Harbin University of Commerce, Harbin, China
* Corresponding Author: Yan Gong. Email:
Computers, Materials & Continua 2026, 88(2), 74 https://doi.org/10.32604/cmc.2026.080178
Received 04 February 2026; Accepted 13 May 2026; Issue published 15 June 2026
Abstract
Accurate fault diagnosis for agricultural machinery depends on extensive domain knowledge and maintenance experience. However, most of this knowledge is stored in legacy unstructured documents such as technical manuals and expert logs. These documents lack a standardized digital representation, which makes automated diagnosis systems difficult to build. Extracting structured knowledge from such text faces three main technical challenges: noise introduced by optical character recognition (OCR) during digitization, an extreme scarcity of labeled samples in specialized domains (low-resource constraints), and the complex nested structures common in descriptions of mechanical components. To address this gap, this paper proposes a semantic-enhanced nested entity extraction framework designed specifically for low-resource and high-noise settings. First, to mitigate the severe noise inherent in digitized legacy documents, we introduce a Targeted Noise-Injection Denoising Paradigm. This module uses whole-word masking to simulate and correct OCR character confusion prior to feature extraction. Second, to overcome extreme data sparsity, we propose a Dynamic Domain-Constrained Augmentation Algorithm. Governed by a TF-IDF-weighted substitution formula, this algorithm isolates and preserves high-information domain entities while expanding the syntactic feature space. Finally, we design a Hierarchical Span-Decoding Network. By integrating contextual word embeddings with bidirectional temporal gating and a global pointer matrix, this network moves beyond the “flat” assumption of traditional sequence labeling to accurately identify multi-level nested entities, such as part-assembly relationships.
Experimental results demonstrate that the proposed framework achieves an F1-score of 95.87% with minimal seed data. Ablation studies also show that the data augmentation strategy yields substantial performance gains. Moreover, using this method, we construct a fault knowledge graph comprising 19,710 entities and validate the efficacy of converting unstructured text into computable fault knowledge via a Retrieval-Augmented Generation (RAG) system.
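As an illustration of why span-level decoding can recover nested entities where flat sequence labeling cannot, the following minimal sketch decodes a global-pointer-style score matrix. It is not the paper's network: the token sequence, the labels (ASSEMBLY, PART, FAULT), the hand-set scores, and the zero threshold are all hypothetical, chosen only to show that overlapping spans can be emitted independently for each label.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start index, inclusive end index, label)


def decode_spans(scores: Dict[str, List[List[float]]],
                 threshold: float = 0.0) -> List[Span]:
    """Return every (start, end, label) whose score exceeds the threshold.

    Each label has its own n x n matrix, so overlapping (nested) spans
    are decoded independently instead of competing for one tag per token.
    """
    spans: List[Span] = []
    for label, matrix in scores.items():
        n = len(matrix)
        for start in range(n):
            # Only the upper triangle is valid: a span cannot end before it starts.
            for end in range(start, n):
                if matrix[start][end] > threshold:
                    spans.append((start, end, label))
    return sorted(spans)


def span_text(tokens: List[str], span: Span) -> str:
    """Join the tokens covered by a span (end index is inclusive)."""
    start, end, _ = span
    return " ".join(tokens[start:end + 1])


# Hypothetical example sentence and labels (assumptions, not from the paper).
tokens = ["fuel", "injection", "pump", "plunger", "worn"]
n = len(tokens)


def empty_matrix() -> List[List[float]]:
    """An n x n matrix filled with a negative score (no span predicted)."""
    return [[-1.0] * n for _ in range(n)]


scores = {label: empty_matrix() for label in ("ASSEMBLY", "PART", "FAULT")}
scores["ASSEMBLY"][0][2] = 2.3  # "fuel injection pump"
scores["PART"][0][3] = 1.7      # "fuel injection pump plunger" (contains the assembly)
scores["FAULT"][4][4] = 0.9     # "worn"

result = decode_spans(scores)
for span in result:
    print(span[2], "->", span_text(tokens, span))
```

In this toy input the ASSEMBLY span lies entirely inside the PART span, a configuration that a single-tag-per-token labeling scheme cannot represent directly.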
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

