Fang Yuan, Xueyao Gao*, Chunxiang Zhang
CMC-Computers, Materials & Continua, Vol.85, No.3, pp. 5037-5055, 2025, DOI:10.32604/cmc.2025.067760
- 23 October 2025
Abstract 3D model classification has emerged as a significant research focus in computer vision. However, traditional convolutional neural networks (CNNs) often struggle to capture global dependencies across both height and width dimensions simultaneously, leading to limited feature representation capabilities when handling complex visual tasks. To address this challenge, we propose a novel 3D model classification network named ViT-GE (Vision Transformer with Global and Efficient Attention), which integrates Global Grouped Coordinate Attention (GGCA) and Efficient Channel Attention (ECA) mechanisms. Specifically, the Vision Transformer (ViT) is employed to extract comprehensive global features from multi-view inputs using its self-attention ...