Open Access
ARTICLE
ELM-APDPs: An Explainable Ensemble Learning Method for Accurate Prediction of Druggable Proteins
1 School of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin, 541004, China
2 Information Technology Centre, Sindh Agriculture University, Tandojam, 70060, Pakistan
3 School of Electrical and Information Engineering, Hunan University, Changsha, 410082, China
4 School of Computing, Gachon University, Seongnam, 13120, Republic of Korea
5 Department of AI and Data Science, Sejong University, Seoul, 05006, Republic of Korea
* Corresponding Authors: Jawad Khan. Email: ; Dildar Hussain. Email:
(This article belongs to the Special Issue: Recent Developments on Computational Biology-II)
Computer Modeling in Engineering & Sciences 2025, 145(1), 779-805. https://doi.org/10.32604/cmes.2025.067412
Received 02 May 2025; Accepted 09 September 2025; Issue published 30 October 2025
Abstract
Identifying druggable proteins, those capable of binding therapeutic compounds, remains a critical and resource-intensive challenge in drug discovery. To address this, we propose CEL-IDP (Comparison of Ensemble Learning Methods for Identification of Druggable Proteins), a computational framework that classifies druggable proteins from sequence data by pairing ensemble learning strategies (Bagging, Boosting, and Stacking) with three feature extraction methods: Dipeptide Deviation from Expected Mean (DDE), Enhanced Amino Acid Composition (EAAC), and Enhanced Grouped Amino Acid Composition (EGAAC). DDE captures how observed dipeptide frequencies deviate from their expected values, EAAC encodes positional amino acid information, and EGAAC groups residues by physicochemical properties; together they yield discriminative feature vectors. These features were evaluated with ensemble models to overcome the limitations of single classifiers. EGAAC outperformed DDE and EAAC, with Random Forest (Bagging) and XGBoost (Boosting) both reaching the highest accuracy of 71.66%, indicating that grouped physicochemical descriptors capture the most informative biochemical patterns of the three encodings. Stacking achieved an intermediate accuracy of 68.33%, while EAAC- and DDE-based models yielded lower accuracies (56.66%–66.87%). CEL-IDP streamlines large-scale druggability prediction, reduces reliance on costly experimental screening, and aligns with global initiatives such as Target 2035 to expand the set of actionable drug targets. This work advances machine-learning-driven drug discovery by systematizing feature engineering and ensemble model optimization, providing a scalable workflow to accelerate target identification and validation.
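To make the three sequence encodings concrete, the following is a minimal sketch, assuming the definitions popularised by the iFeature toolkit: DDE uses codon counts from the standard genetic code (61 sense codons) to derive the expected dipeptide mean and variance, while EAAC and EGAAC compute per-window compositions over a sliding window of 5 residues, with EGAAC using five physicochemical groups (aliphatic, aromatic, positively charged, negatively charged, uncharged). The window size, the grouping, and the helper names `dde`, `eaac`, and `egaac` are illustrative assumptions, not the authors' published settings or code.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Codons encoding each amino acid in the standard genetic code (sum = 61 sense codons).
CODONS = {"A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2,
          "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4,
          "W": 1, "Y": 2}
TOTAL_CODONS = 61

# Assumed EGAAC grouping (iFeature-style); the five groups partition all 20 residues.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}


def dde(seq):
    """400-dim Dipeptide Deviation from Expected Mean vector (hypothetical helper)."""
    n = len(seq) - 1  # number of overlapping dipeptides
    if n < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    for i in range(n):
        dipeptide = seq[i:i + 2]
        counts[dipeptide] = counts.get(dipeptide, 0) + 1
    features = []
    for a, b in product(AMINO_ACIDS, repeat=2):
        dc = counts.get(a + b, 0) / n                                 # observed composition
        tm = (CODONS[a] / TOTAL_CODONS) * (CODONS[b] / TOTAL_CODONS)  # theoretical mean
        tv = tm * (1 - tm) / n                                        # theoretical variance
        features.append((dc - tm) / sqrt(tv))
    return features


def _windows(seq, window):
    """All overlapping windows of the given size."""
    if len(seq) < window:
        raise ValueError("sequence is shorter than the sliding window")
    return [seq[i:i + window] for i in range(len(seq) - window + 1)]


def eaac(seq, window=5):
    """Per-window amino acid composition, concatenated (20 values per window)."""
    return [w.count(aa) / window for w in _windows(seq, window) for aa in AMINO_ACIDS]


def egaac(seq, window=5):
    """Per-window grouped composition, concatenated (5 values per window)."""
    return [sum(w.count(aa) for aa in members) / window
            for w in _windows(seq, window) for members in GROUPS.values()]


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary 22-residue example
    print(len(dde(seq)))    # 400
    print(len(eaac(seq)))   # 20 * (22 - 5 + 1) = 360
    print(len(egaac(seq)))  # 5 * (22 - 5 + 1) = 90
```

DDE always yields 400 values, whereas EAAC and EGAAC grow with sequence length (one block per window). As in iFeature, the windowed encodings therefore assume equal-length inputs; applying them to proteins of varying length would require a fixed-length fragment, truncation, or padding, which this sketch does not implement.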
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.