
Open Access

ARTICLE

NeuroTriad-ViT: A Scalable and Interpretable Framework for Multi-Class Brain Tumor Classification via MRI and Knowledge Distillation

Sultan Kahla1, Zuping Zhang1,*, Majed Alsafyani2, Ahmed Emara3,*, Mohammod Abdullah Bin Hossain4, Abdulwahab Osman Sheikhdon1
1 School of Computer Science and Engineering, Central South University, Changsha, China
2 Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
3 Electrical Engineering Department, University of Business and Technology, Jeddah, Saudi Arabia
4 Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
* Corresponding Author: Zuping Zhang. Email: email; Ahmed Emara. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.076402

Received 20 November 2025; Accepted 28 January 2026; Published online 17 March 2026

Abstract

Accurate classification of brain tumors such as glioma, meningioma, and pituitary tumors is essential for effective diagnosis and treatment planning. Recent advances in deep learning have contributed significantly to medical image analysis; Vision Transformers (ViTs) in particular achieve strong results but are computationally expensive. This paper presents NeuroTriad-ViT, a large-scale Vision Transformer with 235 million parameters that serves as a high-performance teacher model for brain tumor classification. Knowledge distillation is applied to transfer the teacher's learned representations to lightweight student models, among which MobileNetV2 outperformed EfficientNet-Lite0. The models were trained on 24,455 MRI scans combined from three publicly available datasets. Among the candidate teachers, the CNN-pretrained ViT-based hybrid architecture achieved the highest reported accuracy at 96%, while CNN-based ensemble models (using EfficientNet, VGG16, and DenseNet) reached at most 92%–95%. By comparison, NeuroTriad-ViT achieved 98% accuracy and the distilled MobileNetV2 student reached 99.32%, delivering better performance at lower computational cost. Model interpretability was assessed with Grad-CAM and LIME, and insertion and deletion metrics support the faithfulness of the explanations. Overall, the proposed framework enables efficient, interpretable, and scalable brain tumor diagnosis suitable for real-time clinical and mobile health deployment. The source code is publicly available at https://doi.org/10.5281/zenodo.17494928.
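The paper's exact distillation setup is not given in the abstract; as a point of reference, a minimal NumPy sketch of the standard Hinton-style distillation objective (temperature-softened teacher targets blended with the hard-label cross-entropy) might look as follows. The temperature `T`, weight `alpha`, and four-way logit vectors are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.7):
    # Soft term: cross-entropy between the softened teacher and student
    # distributions, scaled by T^2 so its gradient magnitude matches the
    # hard term. Hard term: standard cross-entropy on the ground-truth label.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -np.sum(p_teacher * np.log(p_student + 1e-12)) * (T ** 2)
    hard = -np.log(softmax(student_logits)[true_label] + 1e-12)
    return alpha * soft + (1 - alpha) * hard

# Illustrative logits for a 3-class tumor problem (glioma, meningioma, pituitary).
teacher = [5.0, 1.0, 0.5]
aligned   = distillation_loss([5.0, 1.0, 0.5], teacher, true_label=0)
misaligned = distillation_loss([0.2, 5.0, 1.0], teacher, true_label=0)
# A student matching the teacher incurs a lower loss than one that disagrees.
```

A student trained against this combined objective inherits the teacher's inter-class similarity structure (the "dark knowledge" in the softened probabilities), which is how a compact network such as MobileNetV2 can approach or exceed the teacher's accuracy at a fraction of its cost.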

Keywords

Vision transformer; CNN feature fusion; knowledge distillation; medical image classification; ensemble learning; explainable AI (XAI)