
Open Access

ARTICLE

NeuroTriad-ViT: A Scalable and Interpretable Framework for Multi-Class Brain Tumor Classification via MRI and Knowledge Distillation

Sultan Kahla1, Zuping Zhang1,*, Majed Alsafyani2, Ahmed Emara3,*, Mohammod Abdullah Bin Hossain4, Abdulwahab Osman Sheikhdon1
1 School of Computer Science and Engineering, Central South University, Changsha, China
2 Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
3 Electrical Engineering Department, University of Business and Technology, Jeddah, Saudi Arabia
4 Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
* Corresponding Author: Zuping Zhang. Email: email; Ahmed Emara. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.076402

Received 20 November 2025; Accepted 28 January 2026; Published online 17 March 2026

Abstract

Accurate classification of brain tumors such as glioma, meningioma, and pituitary tumors is essential for effective diagnosis and treatment planning. Recent advances in deep learning have contributed significantly to medical image analysis; Vision Transformers (ViTs) in particular achieve strong results but are computationally expensive. This paper presents NeuroTriad-ViT, a large-scale Vision Transformer with 235 million parameters that serves as a high-performance teacher model for brain tumor classification. Knowledge distillation is applied to transfer the teacher's learned representations to lightweight student models, among which MobileNetV2 outperformed EfficientNet-Lite0. The models were trained on 24,455 MRI scans combined from three publicly available datasets. Among the candidate teachers, the CNN-pretrained ViT-based hybrid architecture achieved the highest reported accuracy at 96%, while CNN-based ensemble models (using EfficientNet, VGG16, and DenseNet) reached at most 92%–95%. By comparison, NeuroTriad-ViT achieved 98% accuracy and the distilled MobileNetV2 student reached 99.32%, delivering better performance at lower computational cost. Model interpretability was assessed with Grad-CAM and LIME, and insertion and deletion metrics support the faithfulness of the explanations. Overall, the proposed framework enables efficient, interpretable, and scalable brain tumor diagnosis suitable for real-time clinical and mobile health deployment. The source code is publicly available at https://doi.org/10.5281/zenodo.17494928.
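The paper's exact distillation setup is not given in the abstract; as a point of reference, a minimal NumPy sketch of the standard Hinton-style distillation objective (temperature-softened teacher targets blended with the hard-label cross-entropy) might look as follows. The temperature `T`, weight `alpha`, and four-way logit vectors are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.7):
    # Soft term: cross-entropy between the softened teacher and student
    # distributions, scaled by T^2 so its gradient magnitude matches the
    # hard term. Hard term: standard cross-entropy on the ground-truth label.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -np.sum(p_teacher * np.log(p_student + 1e-12)) * (T ** 2)
    hard = -np.log(softmax(student_logits)[true_label] + 1e-12)
    return alpha * soft + (1 - alpha) * hard

# Illustrative logits for a 3-class tumor problem (glioma, meningioma, pituitary).
teacher = [5.0, 1.0, 0.5]
aligned   = distillation_loss([5.0, 1.0, 0.5], teacher, true_label=0)
misaligned = distillation_loss([0.2, 5.0, 1.0], teacher, true_label=0)
# A student matching the teacher incurs a lower loss than one that disagrees.
```

A student trained against this combined objective inherits the teacher's inter-class similarity structure (the "dark knowledge" in the softened probabilities), which is how a compact network such as MobileNetV2 can approach or exceed the teacher's accuracy at a fraction of its cost.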

Keywords

Vision transformer; CNN feature fusion; knowledge distillation; medical image classification; ensemble learning; explainable AI (XAI)