Open Access

ARTICLE

KPA-ViT: Key Part-Level Attention Vision Transformer for Foreign Body Classification on Coal Conveyor Belt

Haoxuanye Ji*, Zhiliang Chen, Pengfei Jiang, Ziyue Wang, Ting Yu, Wei Zhang
CCTEG Coal Mining Research Institute, Beijing, 100013, China
* Corresponding Author: Haoxuanye Ji
(This article belongs to the Special Issue: Advances in Efficient Vision Transformers: Architectures, Optimization, and Applications)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.071880

Received 14 August 2025; Accepted 01 October 2025; Published online 30 October 2025

Abstract

Foreign body classification on coal conveyor belts is a critical component of intelligent coal mining systems. Previous approaches have relied mainly on convolutional neural networks (CNNs) to integrate spatial and semantic information. However, the classification accuracy of CNN-based methods remains limited, primarily because they do not sufficiently exploit local image characteristics. In contrast, the Vision Transformer (ViT) captures discriminative features by modeling relationships between local image patches, but it typically requires a large number of training samples to perform well. For foreign body classification on coal conveyor belts, training samples are scarce, which prevents ViT from reaching its full potential. To address this issue, we propose an efficient approach, termed Key Part-level Attention Vision Transformer (KPA-ViT), which incorporates key local information into the transformer architecture to enrich the information available during training. KPA-ViT comprises three main components: a key-point detection module, a key local mining module, and an attention module. First, a key-point detection strategy identifies the positions of key points. Next, the key local mining module extracts the relevant local features based on the detected points. Finally, an attention module composed of self-attention and cross-attention blocks integrates global and key part-level information, enhancing the model’s ability to learn discriminative features. Compared with recent transformer-based frameworks such as ViT, Swin-Transformer, and EfficientViT, the proposed KPA-ViT achieves performance improvements of 9.3%, 6.6%, and 2.8%, respectively, on the CUMT-BelT dataset, demonstrating its effectiveness.

Keywords

Foreign body classification; global and part-level key information; coal conveyor belt; vision transformer (ViT); self and cross attention
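
As a rough illustration of the attention module described in the abstract, the following sketch shows one way self-attention over global patch tokens could be followed by cross-attention to key part-level tokens. It is not the authors' implementation. The function names, token counts, and feature dimension are hypothetical, learned projections and multiple heads are omitted, and the choice of global tokens as queries with part tokens as keys and values is an assumption.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


def fuse_global_and_parts(global_tokens, part_tokens):
    """Hypothetical fusion: self-attention, then cross-attention to part tokens."""
    # Self-attention: global patch tokens attend to each other.
    self_out, _ = attention(global_tokens, global_tokens, global_tokens)
    g = global_tokens + self_out  # residual connection

    # Cross-attention (assumed direction): global tokens query the part tokens.
    cross_out, cross_weights = attention(g, part_tokens, part_tokens)
    return g + cross_out, cross_weights


rng = np.random.default_rng(0)
global_tokens = rng.normal(size=(16, 32))  # 16 patch tokens, dim 32 (made-up sizes)
part_tokens = rng.normal(size=(4, 32))     # 4 key-part tokens (made-up size)

fused, cross_weights = fuse_global_and_parts(global_tokens, part_tokens)
print(fused.shape, cross_weights.shape)
```

Each row of `cross_weights` shows how strongly one global token draws on each key part, which is the kind of global and part-level interaction the abstract attributes to the attention module.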