Open Access

ARTICLE

Research on Agricultural Machinery Fault Nested Entity Extraction for Low-Resource and High-Noise Scenes

Huaixuan Yan, Yan Gong*
School of Light Industry, Harbin University of Commerce, Harbin, China
* Corresponding Author: Yan Gong.

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.080178

Received 04 February 2026; Accepted 13 May 2026; Published online 27 May 2026

Abstract

Accurate fault diagnosis for agricultural machinery depends on extensive domain knowledge and maintenance experience. However, most of this knowledge is stored in legacy unstructured documents such as technical manuals and expert logs. These documents lack a standardized digital representation, which makes automated diagnosis systems very hard to build. Extracting structured knowledge from such text poses three main technical challenges: noise introduced by optical character recognition (OCR) during digitization, the extreme scarcity of labeled samples in specialized domains (low-resource constraints), and the complex nested structures that are common in descriptions of mechanical components. To address this research gap, this paper proposes a semantic-enhanced nested entity extraction framework engineered specifically for low-resource and high-noise constraints. First, to mitigate the severe visual noise inherent in digitized legacy documents, we introduce a Targeted Noise-Injection Denoising Paradigm. This module uses whole-word masking to simulate and correct OCR character confusion prior to feature extraction. Second, to overcome extreme data sparsity, we propose a Dynamic Domain-Constrained Augmentation Algorithm. Governed by a TF-IDF-weighted substitution formula, this algorithm isolates and preserves high-information domain entities while expanding the syntactic feature space. Finally, we design a Hierarchical Span-Decoding Network. By integrating contextual word embeddings with bidirectional temporal gating and a global pointer matrix, this network moves beyond the “flat” assumptions of traditional sequence labeling to accurately identify multi-level nested entities, such as parts-assembly relationships. 
Experimental results demonstrate that the proposed framework achieves an F1-score of 95.87% with minimal seed data. Ablation studies also show that the data augmentation strategy yields substantial performance gains. Moreover, using this method, we construct a fault knowledge graph comprising 19,710 entities and validate, through a Retrieval-Augmented Generation (RAG) system, the efficacy of converting unstructured text into computable fault knowledge.
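
To illustrate why span-level decoding can recover nested entities where flat sequence labeling cannot, the following minimal sketch decodes a span-score matrix of the kind a global-pointer-style head might produce. It is not the authors' network: the labels (`PART`, `ASSEMBLY`, `FAULT`), the example sentence, the hand-set scores, and the function name `decode_spans` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (label, start index, inclusive end index, surface text)
Span = Tuple[str, int, int, str]


def decode_spans(
    tokens: List[str],
    scores: Dict[str, List[List[float]]],
    threshold: float = 0.0,
) -> List[Span]:
    """Keep every (label, start, end) cell whose score exceeds the threshold.

    Each label has its own n x n matrix, and each cell is judged
    independently, so overlapping or nested spans can all be returned.
    """
    n = len(tokens)
    found: List[Span] = []
    for label, matrix in scores.items():
        for start in range(n):
            for end in range(start, n):  # upper triangle only: end >= start
                if matrix[start][end] > threshold:
                    text = " ".join(tokens[start:end + 1])
                    found.append((label, start, end, text))
    # Outer (longer) spans first when two spans share a start position.
    return sorted(found, key=lambda s: (s[1], -s[2], s[0]))


# Hypothetical example: hand-set scores stand in for model output.
tokens = ["hydraulic", "pump", "drive", "shaft", "is", "worn"]
n = len(tokens)
scores = {
    label: [[-1.0] * n for _ in range(n)]
    for label in ("ASSEMBLY", "PART", "FAULT")
}
scores["ASSEMBLY"][0][1] = 2.3  # "hydraulic pump"
scores["PART"][0][3] = 1.7      # "hydraulic pump drive shaft"
scores["FAULT"][5][5] = 3.1     # "worn"

spans = decode_spans(tokens, scores)
for span in spans:
    print(span)
```

In this toy input, the assembly span "hydraulic pump" lies inside the part span "hydraulic pump drive shaft", and both are returned. A tagger that assigns one label per token would have to choose between them.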

Keywords

Agricultural machinery; knowledge graph; fault diagnosis; named entity recognition; data augmentation; nested entity extraction; retrieval-augmented generation