Open Access

ARTICLE

ELM-APDPs: An Explainable Ensemble Learning Method for Accurate Prediction of Druggable Proteins

Mujeebu Rehman1, Qinghua Liu1, Ali Ghulam2, Tariq Ahmad3, Jawad Khan4,*, Dildar Hussain5,*, Yeong Hyeon Gu5

1 School of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin, 541004, China
2 Information Technology Centre, Sindh Agriculture University, Tandojam, 70060, Pakistan
3 School of Electrical and Information Engineering, Hunan University, Changsha, 410082, China
4 School of Computing, Gachon University, Seongnam, 13120, Republic of Korea
5 Department of AI and Data Science, Sejong University, Seoul, 05006, Republic of Korea

* Corresponding Authors: Jawad Khan. Email: email; Dildar Hussain. Email: email

(This article belongs to the Special Issue: Recent Developments on Computational Biology-II)

Computer Modeling in Engineering & Sciences 2025, 145(1), 779-805. https://doi.org/10.32604/cmes.2025.067412

Abstract

Identifying druggable proteins, which are capable of binding therapeutic compounds, remains a critical and resource-intensive challenge in drug discovery. To address this, we propose CEL-IDP (Comparison of Ensemble Learning Methods for Identification of Druggable Proteins), a computational framework that combines three feature extraction methods, Dipeptide Deviation from Expected Mean (DDE), Enhanced Amino Acid Composition (EAAC), and Enhanced Grouped Amino Acid Composition (EGAAC), with three ensemble learning strategies (Bagging, Boosting, and Stacking) to classify druggable proteins from sequence data. DDE captures dipeptide frequency deviations, EAAC encodes positional amino acid information, and EGAAC groups residues by physicochemical properties to generate discriminative feature vectors. These features were evaluated with ensemble models to overcome the limitations of single classifiers. EGAAC outperformed DDE and EAAC: with EGAAC features, Random Forest (Bagging) and XGBoost (Boosting) achieved the highest accuracy of 71.66%, suggesting that grouped physicochemical composition captures the most discriminative biochemical patterns. Stacking showed intermediate results (68.33%), while EAAC- and DDE-based models yielded lower accuracies (56.66%–66.87%). CEL-IDP streamlines large-scale druggability prediction, reduces reliance on costly experimental screening, and aligns with global initiatives such as Target 2035 to expand the set of actionable drug targets. This work advances machine learning-driven drug discovery by systematizing feature engineering and ensemble model optimization, providing a scalable workflow to accelerate target identification and validation.
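
As a rough illustration of the EGAAC idea described in the abstract (residues grouped by physicochemical property, composition computed over positions along the sequence), the following is a minimal Python sketch. The five-group partition, the window size of 5, and the function name `egaac` are assumptions made for illustration; the abstract does not state the grouping or window settings the authors used, so this should not be read as their implementation.

```python
from typing import Dict, List

# Assumed five-group partition of the 20 standard amino acids by
# physicochemical property. The abstract does not specify the grouping.
GROUPS: Dict[str, str] = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}


def egaac(sequence: str, window: int = 5) -> List[float]:
    """Sliding-window grouped amino acid composition (illustrative sketch).

    For every window of `window` residues, emit the fraction of residues
    that fall into each group, in the order the groups are defined above.
    Residues outside the 20 standard amino acids count toward no group.
    """
    seq = sequence.upper()
    if window < 1 or len(seq) < window:
        raise ValueError("sequence must be at least as long as the window")

    features: List[float] = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        for members in GROUPS.values():
            count = sum(1 for residue in chunk if residue in members)
            features.append(count / window)
    return features


if __name__ == "__main__":
    # Hypothetical 10-residue sequence, used only to show the output shape.
    vector = egaac("MKVLAAGIDE", window=5)
    print(len(vector), vector[:5])
```

Because the feature length depends on the number of windows, an encoding of this kind yields a fixed-length vector only when all input sequences have the same length (for example, after trimming or padding), which a downstream Bagging, Boosting, or Stacking classifier would require.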

Keywords

Druggable proteins; ensemble learning; computational drug discovery; pharmacological target identification; machine learning; feature extraction

Cite This Article

APA Style
Rehman, M., Liu, Q., Ghulam, A., Ahmad, T., Khan, J. et al. (2025). ELM-APDPs: An Explainable Ensemble Learning Method for Accurate Prediction of Druggable Proteins. Computer Modeling in Engineering & Sciences, 145(1), 779–805. https://doi.org/10.32604/cmes.2025.067412
Vancouver Style
Rehman M, Liu Q, Ghulam A, Ahmad T, Khan J, Hussain D, et al. ELM-APDPs: An Explainable Ensemble Learning Method for Accurate Prediction of Druggable Proteins. Comput Model Eng Sci. 2025;145(1):779–805. https://doi.org/10.32604/cmes.2025.067412
IEEE Style
M. Rehman et al., “ELM-APDPs: An Explainable Ensemble Learning Method for Accurate Prediction of Druggable Proteins,” Comput. Model. Eng. Sci., vol. 145, no. 1, pp. 779–805, 2025. https://doi.org/10.32604/cmes.2025.067412



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.