Interpretable Deep Representation Learning for Pan-Cancer Diagnosis via Pathway-Constrained Transcriptomics
Maram Fahaad Almufareh, Samabia Tehsin
CMES-Computer Modeling in Engineering & Sciences, Vol.147, No.3, 2026, DOI:10.32604/cmes.2026.081129
(This article belongs to the Special Issue: Mathematical Aspects of Computational Biology and Bioinformatics-III)
Abstract This article presents a Hierarchical Pathway-Masked Attention Autoencoder (H-PAAE), a biologically inspired representation-learning framework that enables explainable AI-guided cancer diagnosis. The model directly integrates the curated MSigDB Hallmark pathways, introducing pathway-constrained information flow and mechanistic interpretability through multi-level attention mechanisms. Based on TCGA RNA-seq data from 33 tumor types, H-PAAE compresses approximately 20,000 genes into a 128-dimensional latent space while preserving biologically meaningful structure. When used with XGBoost classification, H-PAAE delivers 92.37% test accuracy and 99.38% macro-AUROC with robust cross-validation results (92.5 ± 0.6%). SHAP analysis identifies a small number of key latent features, corresponding
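As a rough illustration of what "pathway-constrained information flow" can mean in practice, the following minimal sketch builds a binary gene-to-pathway mask and applies it to a projection, so that each pathway node receives input only from its member genes. It is not the authors' H-PAAE implementation: the function names `build_pathway_mask` and `masked_encode`, the gene and pathway labels, the weights, and the `tanh` activation are all hypothetical choices made for this example, and the attention and decoder components described in the abstract are omitted.

```python
import math


def build_pathway_mask(gene_names, pathways):
    """Return (mask, pathway_names); mask[i][j] is 1.0 when gene i belongs to pathway j."""
    index = {gene: i for i, gene in enumerate(gene_names)}
    names = sorted(pathways)
    mask = [[0.0] * len(names) for _ in gene_names]
    for j, name in enumerate(names):
        for gene in pathways[name]:
            if gene in index:  # ignore pathway genes absent from the expression matrix
                mask[index[gene]][j] = 1.0
    return mask, names


def masked_encode(sample, weights, mask):
    """Compute one score per pathway from a single expression vector.

    Each weight is multiplied by the mask entry, so genes outside a pathway
    contribute nothing to that pathway's score.
    """
    scores = []
    for j in range(len(mask[0])):
        total = sum(sample[i] * weights[i][j] * mask[i][j] for i in range(len(sample)))
        scores.append(math.tanh(total))  # tanh is an assumed activation
    return scores


# Toy inputs (hypothetical labels, not real MSigDB membership).
gene_names = ["G1", "G2", "G3", "G4"]
pathways = {"PATHWAY_A": ["G1", "G4"], "PATHWAY_B": ["G3"]}
mask, pathway_names = build_pathway_mask(gene_names, pathways)

weights = [[0.5, -0.2], [0.3, 0.8], [-0.4, 0.6], [0.1, 0.9]]
scores = masked_encode([1.0, 2.0, 3.0, 4.0], weights, mask)

# G2 belongs to no pathway, so changing its expression leaves the scores unchanged.
perturbed = masked_encode([1.0, 100.0, 3.0, 4.0], weights, mask)
```

The design point is that the mask fixes which connections exist before any training takes place, so each unit in the masked layer can be read as a score for one named pathway. In a trainable version the same mask would be applied on every forward pass so that masked-out weights never influence the output.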