Vol.73, No.2, 2022, pp.3951-3967, doi:10.32604/cmc.2022.030934
OPEN ACCESS
ARTICLE
Ensemble Machine Learning to Enhance Q8 Protein Secondary Structure Prediction
Moheb R. Girgis, Rofida M. Gamal, Enas Elgeldawi*
Computer Science Department, Faculty of Science, Minia University, 61519, Minia, Egypt
* Corresponding Author: Enas Elgeldawi. Email:
Received 06 April 2022; Accepted 11 May 2022; Issue published 16 June 2022
Abstract
Protein structure prediction is one of the central objectives of theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology, and other fields. Protein secondary structure prediction (PSSP) plays a significant role in the prediction of protein tertiary structure, as it bridges the gap between protein primary sequences and tertiary structure prediction. Protein secondary structures are classified under two schemes: a 3-state scheme and an 8-state scheme. Predicting the 3 states and the 8 states of secondary structure from protein sequences are called the Q3 prediction and Q8 prediction problems, respectively. The 8 classes of secondary structure reveal more precise structural information for a variety of applications than the 3 classes; however, Q8 prediction has been found to be very challenging, which is why previous work in PSSP has focused mainly on Q3 prediction. In this paper, we develop an ensemble Machine Learning (ML) approach for Q8 PSSP in order to compare the performance of ensemble learning algorithms with that of individual ML algorithms on this problem. The ensemble members considered for constructing the ensemble models are well-known classifiers, namely SVM (Support Vector Machine), KNN (K-Nearest Neighbors), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), combined with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in the introduction of ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
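To make the Q3/Q8 distinction and the idea of combining classifiers concrete, the following is a minimal sketch and not the authors' implementation. It assumes the eight DSSP state letters (H, G, I, E, B, T, S, C) and one commonly used 8-to-3 reduction, neither of which is specified in the abstract, and it uses simple per-residue majority voting as a stand-in for the ensemble methods (such as bagging and boosting) studied in the paper. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Assumption: one common DSSP 8-to-3 reduction; other conventions exist.
Q8_TO_Q3 = {"H": "H", "G": "H", "I": "H",   # helices
            "E": "E", "B": "E",             # strands
            "T": "C", "S": "C", "C": "C"}   # coil and other


def accuracy(predicted: str, actual: str) -> float:
    """Fraction of residues whose predicted state equals the true state."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def to_q3(states: str) -> str:
    """Collapse an 8-state label string to 3 states."""
    return "".join(Q8_TO_Q3[s] for s in states)


def majority_vote(predictions: Sequence[str]) -> str:
    """Hard voting: for each residue, keep the label most members agree on."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*predictions))


# Toy data (hypothetical): true states and three imperfect member predictions.
actual = "HHHHTTEEEC"
members = ["HHHGTTEEEC", "HHHHTSEEEC", "GHHHTTEEBC"]

ensemble = majority_vote(members)
print([accuracy(m, actual) for m in members])          # per-member Q8 accuracy
print(accuracy(ensemble, actual))                      # ensemble Q8 accuracy
print(accuracy(to_q3(members[0]), to_q3(actual)))      # same member scored as Q3
```

In this toy case each member makes different mistakes, so the vote recovers the true sequence, and the first member's G/H confusion counts as an error under Q8 but disappears under Q3. This only illustrates why finer-grained labels are harder to predict and why combining models can help; it says nothing about the magnitudes reported in the paper.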
Keywords
Protein secondary structure prediction (PSSP); Q3 prediction; Q8 prediction; ensemble machine learning; boosting; bagging
Cite This Article
M. R. Girgis, R. M. Gamal and E. Elgeldawi, "Ensemble machine learning to enhance Q8 protein secondary structure prediction," Computers, Materials & Continua, vol. 73, no. 2, pp. 3951–3967, 2022.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.