Prediction of Intrinsically Disordered Proteins with a Low Computational Complexity Method

Jia Yang; Haiyuan Liu; Hao He

doi:10.32604/cmes.2020.010347

Jia Yang¹, Haiyuan Liu¹,*, Hao He²

1 Nankai University, School of Electronic Information and Optical Engineering, Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, Tianjin, 300350, China
2 Hebei University of Technology, Tianjin, 300400, China

* Corresponding Author: Haiyuan Liu. Email: email

Computer Modeling in Engineering & Sciences 2020, 125(1), 111-123. https://doi.org/10.32604/cmes.2020.010347

Received 27 February 2020; Accepted 21 July 2020; Issue published 18 September 2020

Abstract

The prediction of intrinsically disordered proteins (IDPs) is an active research area in bioinformatics. Because experimental methods for identifying disordered regions of protein sequences are costly, predicting those regions computationally is becoming increasingly important. In this paper, we develop a novel scheme that uses sequence complexity to calculate six features for each residue of a protein sequence: the Shannon entropy, the topological entropy, the sample entropy, and three amino acid preferences (Remark 465, Deleage/Roux, and Bfactor(2STD)). In particular, we introduce the sample entropy, a measure of time series complexity, by mapping the amino acid sequence to a time series with values 0–9. To our knowledge, the sample entropy has not previously been used for predicting IDPs. In addition, the scheme applies a properly sized sliding window to every protein sequence, which greatly improves prediction performance. Finally, we evaluated seven machine learning algorithms with 10-fold cross-validation on the dataset R80 collected by Yang et al. and on the dataset DIS1556 from the Database of Protein Disorder (DisProt) (https://www.disprot.org), which contains experimentally determined IDPs. The results showed that k-Nearest Neighbor was more appropriate than the other algorithms, achieving an overall prediction accuracy of 92%. Furthermore, because our method uses only six features, it has lower computational complexity.
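As a rough illustration of two of the features named in the abstract, the sketch below computes a per-residue Shannon entropy over a sliding window and a sample entropy over a numeric series. It is not the authors' implementation. The window size, the parameters `m` and `r`, the function names, and especially `HYPOTHETICAL_MAP` (an arbitrary alphabetical grouping of the 20 amino acids into the values 0–9) are assumptions made only for this example; the abstract does not state the mapping or parameter values actually used.

```python
import math
from collections import Counter

# ASSUMPTION: placeholder mapping of amino acids to 0-9 (alphabetical pairs).
# The mapping used in the paper is not given in the abstract.
HYPOTHETICAL_MAP = {aa: i // 2 for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def shannon_entropy(window):
    """Shannon entropy (in bits) of the symbol distribution in `window`."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sample_entropy(series, m=2, r=0.0):
    """Sample entropy SampEn(m, r) = -ln(A / B) of a numeric series.

    B counts pairs of length-m templates whose Chebyshev distance is <= r;
    A counts the same for length m + 1. Self-matches are excluded.
    Returns math.inf when A or B is zero (the value is then undefined).
    """
    n = len(series)

    def count_pairs(length):
        # Use the same n - m starting positions for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_pairs(m)
    a = count_pairs(m + 1)
    if a == 0 or b == 0:
        return math.inf
    return -math.log(a / b)


def windowed_feature(seq, size, feature):
    """Apply `feature` to a window centred on each residue (truncated at the ends)."""
    half = size // 2
    values = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)
        values.append(feature(seq[lo:hi]))
    return values


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence for illustration only
    entropies = windowed_feature(protein, 7, shannon_entropy)
    series = [HYPOTHETICAL_MAP[aa] for aa in protein]
    print(entropies[:5])
    print(sample_entropy(series, m=2, r=1.0))
```

Each residue receives one value per feature, so a classifier such as k-Nearest Neighbor would see a short feature vector per residue. Truncating the window at the sequence ends is one possible boundary choice among several.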

Keywords

Bioinformatics; intrinsically disordered proteins; machine learning algorithms; sequences; computational complexity

Copyright © 2020 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
