
Advances in Efficient Vision Transformers: Architectures, Optimization, and Applications

Submission Deadline: 31 December 2025

Guest Editors

Assist. Prof. Paolo Russo

Email: paolo.russo@diag.uniroma1.it

Affiliation: Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, 00185, Italy

Research Interests: Deep Learning, Computer Vision, Monocular Depth Estimation, Signal Processing, Biomedical Classification


Assist. Prof. Fabiana Di Ciaccio

Email: fabiana.diciaccio@unifi.it

Affiliation: Department of Civil and Environmental Engineering, University of Florence, Via di Santa Marta, 3, 50139 Florence, Italy

Research Interests: Environmental monitoring with machine/deep learning (sea surface temperature prediction, shoreline extraction, orientation optimization for maritime automated vehicles, etc.); monitoring and preservation of cultural heritage against natural and anthropogenic risks and the effects of climate change; metrology; underwater photogrammetry and 3D reconstruction techniques


Summary

In recent years, Vision Transformers (ViTs) have emerged as a powerful alternative to convolutional neural networks (CNNs) for computer vision tasks. However, their high computational cost and memory demands pose significant challenges for deployment in real-world applications, particularly in resource-constrained environments. Addressing these challenges, researchers have developed efficient ViT architectures, optimized training techniques, and novel performance benchmarking methodologies.
This special issue aims to gather state-of-the-art research on efficient Vision Transformers, covering algorithmic advancements, optimization techniques, and rigorous benchmarking. We invite contributions that explore the theoretical foundations, methodological innovations, and practical implementations of efficient ViTs. Papers demonstrating the versatility and scalability of these models across various vision tasks are particularly encouraged. By providing a platform for these advancements, this special issue seeks to promote collaboration and guide future research in making Vision Transformers more efficient and accessible.

This special issue welcomes original research articles, surveys, and reviews on efficient Vision Transformers, including (but not limited to) the following topics:
· Development of lightweight and efficient Vision Transformer architectures
· Pruning, quantization, and low-rank approximations for ViTs
· Knowledge distillation techniques for compact Vision Transformers
· Hardware-aware and energy-efficient ViT models
· Efficient training strategies, including self-supervised and transfer learning for ViTs
· Performance benchmarking methodologies for efficient Vision Transformers
· Applications of efficient ViTs in real-world scenarios
· Integration of efficient ViTs with edge computing and IoT platforms
· Robustness, interpretability, and fairness in Vision Transformer models
· Challenges and solutions for large-scale training of Vision Transformers


We encourage submissions illustrating the impact of efficient ViTs across various domains, such as:
· Autonomous systems and robotics
· Medical imaging and health informatics
· Smart cities and intelligent transportation
· Augmented and virtual reality applications
· Industrial automation and manufacturing
· Surveillance and monitoring

By assembling cutting-edge research on efficient Vision Transformers, this special issue aims to serve as a valuable resource for researchers and practitioners, providing insights into current trends and future directions in this rapidly evolving field.


Keywords

Deep Learning, Neural Networks, Vision Transformers, Network Optimization

Published Papers


  • Open Access

    ARTICLE

    KPA-ViT: Key Part-Level Attention Vision Transformer for Foreign Body Classification on Coal Conveyor Belt

    Haoxuanye Ji, Zhiliang Chen, Pengfei Jiang, Ziyue Wang, Ting Yu, Wei Zhang
    CMC-Computers, Materials & Continua, DOI:10.32604/cmc.2025.071880
    (This article belongs to the Special Issue: Advances in Efficient Vision Transformers: Architectures, Optimization, and Applications)
    Abstract Foreign body classification on coal conveyor belts is a critical component of intelligent coal mining systems. Previous approaches have primarily utilized convolutional neural networks (CNNs) to effectively integrate spatial and semantic information. However, the performance of CNN-based methods remains limited in classification accuracy, primarily due to insufficient exploration of local image characteristics. Unlike CNNs, Vision Transformer (ViT) captures discriminative features by modeling relationships between local image patches. However, such methods typically require a large number of training samples to perform effectively. In the context of foreign body classification on coal conveyor belts, the limited availability…

  • Open Access

    ARTICLE

    HERL-ViT: A Hybrid Enhanced Vision Transformer Based on Regional-Local Attention for Malware Detection

    Boyan Cui, Huijuan Wang, Yongjun Qi, Hongce Chen, Quanbo Yuan, Dongran Liu, Xuehua Zhou
    CMC-Computers, Materials & Continua, Vol.85, No.3, pp. 5531-5553, 2025, DOI:10.32604/cmc.2025.070101
    (This article belongs to the Special Issue: Advances in Efficient Vision Transformers: Architectures, Optimization, and Applications)
    Abstract The proliferation of malware and the emergence of adversarial samples pose severe threats to global cybersecurity, demanding robust detection mechanisms. Traditional malware detection methods suffer from limited feature extraction capabilities, while existing Vision Transformer (ViT)-based approaches face high computational complexity due to global self-attention, hindering their efficiency in handling large-scale image data. To address these issues, this paper proposes a novel hybrid enhanced Vision Transformer architecture, HERL-ViT, tailored for malware detection. The detection framework involves five phases: malware image visualization, image segmentation with patch embedding, regional-local attention-based feature extraction, enhanced feature transformation, and classification. Methodologically,…