Open Access

ARTICLE

KPA-ViT: Key Part-Level Attention Vision Transformer for Foreign Body Classification on Coal Conveyor Belt

Haoxuanye Ji*, Zhiliang Chen, Pengfei Jiang, Ziyue Wang, Ting Yu, Wei Zhang

CCTEG Coal Mining Research Institute, Beijing, 100013, China

* Corresponding Author: Haoxuanye Ji. Email: email

(This article belongs to the Special Issue: Advances in Efficient Vision Transformers: Architectures, Optimization, and Applications)

Computers, Materials & Continua 2026, 86(3), 24 https://doi.org/10.32604/cmc.2025.071880

Abstract

Foreign body classification on coal conveyor belts is a critical component of intelligent coal mining systems. Previous approaches have primarily used convolutional neural networks (CNNs) to integrate spatial and semantic information. However, the classification accuracy of CNN-based methods remains limited, primarily because they do not sufficiently exploit local image characteristics. Unlike CNNs, the Vision Transformer (ViT) captures discriminative features by modeling relationships between local image patches, but it typically requires a large number of training samples to perform well. In foreign body classification on coal conveyor belts, the limited availability of training samples prevents ViT from being fully exploited. To address this issue, we propose an efficient approach, termed Key Part-level Attention Vision Transformer (KPA-ViT), which incorporates key local information into the transformer architecture to enrich the information available during training. It comprises three main components: a key-point detection module, a key local mining module, and an attention module. First, the key-point detection module identifies the positions of key points. Next, the key local mining module extracts local features around these detected points. Finally, the attention module, composed of self-attention and cross-attention blocks, integrates global and key part-level information, thereby enhancing the model’s ability to learn discriminative features. Compared with recent transformer-based frameworks such as ViT, Swin-Transformer, and EfficientViT, the proposed KPA-ViT achieves performance improvements of 9.3%, 6.6%, and 2.8%, respectively, on the CUMT-BelT dataset, demonstrating its effectiveness.
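
The three-stage flow described in the abstract (key points, local crops around them, then self-attention followed by cross-attention) can be illustrated with a toy sketch. This is not the authors' implementation: the key points are supplied by hand instead of by a detector, there are no learned projections, and the choice of global tokens as queries with part tokens as keys and values is an assumption. All function names (`crop_patch`, `grid_tokens`, `fuse`) are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors (no learned weights)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def flatten(patch):
    return [v for row in patch for v in row]


def crop_patch(image, center, size):
    """Crop a size x size window around a key point, clamped to the image."""
    h, w = len(image), len(image[0])
    r, c = center
    top = min(max(r - size // 2, 0), h - size)
    left = min(max(c - size // 2, 0), w - size)
    return [row[left:left + size] for row in image[top:top + size]]


def grid_tokens(image, size):
    """Split the image into non-overlapping patches (the global tokens)."""
    h, w = len(image), len(image[0])
    return [flatten([row[c:c + size] for row in image[r:r + size]])
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]


def fuse(image, key_points, size=4):
    """Toy global/part fusion: self-attention, then cross-attention."""
    global_tokens = grid_tokens(image, size)
    # Stand-in for key local mining: one flattened crop per key point.
    part_tokens = [flatten(crop_patch(image, kp, size)) for kp in key_points]
    # Self-attention among the global tokens.
    g = attention(global_tokens, global_tokens, global_tokens)
    # Cross-attention (assumed direction): global queries, part keys/values.
    c = attention(g, part_tokens, part_tokens)
    # Residual sum, then mean pooling into a single feature vector.
    fused = [[a + b for a, b in zip(gt, ct)] for gt, ct in zip(g, c)]
    n = len(fused)
    return [sum(tok[j] for tok in fused) / n for j in range(len(fused[0]))]


random.seed(0)
image = [[random.random() for _ in range(16)] for _ in range(16)]
key_points = [(2, 3), (8, 8), (15, 15)]  # hand-picked, hypothetical key points
feature = fuse(image, key_points)
print(len(feature))
```

In a trained model, the pooled vector would feed a classification head, and the crops would pass through an embedding layer before attention. Both are omitted here to keep the sketch short.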

Keywords

Foreign body classification; global and part-level key information; coal conveyor belt; vision transformer (ViT); self and cross attention

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.