A Comparative Study of Data Representation Techniques for Deep Learning-Based Classification of Promoter and Histone-Associated DNA Regions

Sarab Almuhaideb; Najwa Altwaijry; Isra Al-Turaiki; Ahmad Khan; Hamza Rizvi

doi:10.32604/cmc.2025.067390

Open Access

ARTICLE

Sarab Almuhaideb¹,*, Najwa Altwaijry¹, Isra Al-Turaiki¹, Ahmad Raza Khan², Hamza Ali Rizvi³

1 Computer Science Department, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh, 11543, Saudi Arabia
2 Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
3 Computer Science and Engineering Department, Punjab Engineering College, Sector 12, Chandigarh, 160012, India

* Corresponding Author: Sarab Almuhaideb.

(This article belongs to the Special Issue: Emerging Machine Learning Methods and Applications)

Computers, Materials & Continua 2025, 85(2), 3095-3128. https://doi.org/10.32604/cmc.2025.067390

Received 01 May 2025; Accepted 09 July 2025; Issue published 23 September 2025

Abstract

Many bioinformatics applications require determining the class of a newly sequenced deoxyribonucleic acid (DNA) sequence, which makes DNA sequence classification an integral step in bioinformatics analysis, where large biomedical datasets are transformed into valuable knowledge. Existing methods rely on a feature extraction step and suffer from high computational time requirements. In contrast, newer approaches leveraging deep learning have shown significant promise in enhancing accuracy and efficiency. In this paper, we investigate the performance of several deep learning architectures for DNA sequence classification: Convolutional Neural Network (CNN), CNN-Long Short-Term Memory (CNN-LSTM), CNN-Bidirectional Long Short-Term Memory (CNN-BiLSTM), Residual Network (ResNet), and InceptionV3. The input datasets are represented with several numerical and visual techniques: label encoding, k-mer sentence encoding, k-mer one-hot vectors, Frequency Chaos Game Representation (FCGR), and 5-Color Map (ColorSquare). Three datasets are used to train the models: H3, H4, and the DNA Sequence Dataset (Yeast, Human, Arabidopsis thaliana). Experiments are performed to determine which combination of DNA representation and deep learning architecture yields the best performance on the classification task. Our results indicate that a hybrid CNN-LSTM neural network trained on DNA sequences represented as one-hot encoded k-mer sequences performs best, achieving an accuracy of 92.1%.
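To make the numerical representations named in the abstract concrete, the following is a minimal Python sketch of label encoding, k-mer sentence encoding, and k-mer one-hot vectors. It is an illustration under stated assumptions, not the authors' implementation: the function names, the integer mapping A=0, C=1, G=2, T=3, the default k = 3, and the stride of 1 are all hypothetical choices that the abstract does not specify.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; ambiguous bases such as N raise KeyError


def label_encode(seq):
    """Map each nucleotide to an integer (assumed mapping A=0, C=1, G=2, T=3)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    return [index[base] for base in seq.upper()]


def kmer_sentence(seq, k=3, stride=1):
    """Split a sequence into k-mers and join them into a space-separated 'sentence'."""
    seq = seq.upper()
    return " ".join(seq[i:i + k] for i in range(0, len(seq) - k + 1, stride))


def kmer_one_hot(seq, k=3, stride=1):
    """Encode each k-mer as a one-hot row over all 4**k possible k-mers."""
    # Vocabulary in lexicographic order: AAA -> 0, AAC -> 1, ..., TTT -> 4**k - 1.
    vocab = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    rows = []
    for kmer in kmer_sentence(seq, k, stride).split():
        row = [0] * len(vocab)
        row[vocab[kmer]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    example = "ACGTA"
    print(label_encode(example))
    print(kmer_sentence(example, k=3))
    print(len(kmer_one_hot(example, k=3)), "rows of length", 4 ** 3)
```

With k = 3, each k-mer becomes a 64-dimensional one-hot row, so a sequence of length n yields an (n - 2) x 64 matrix that a CNN or CNN-LSTM could consume. The visual representations (FCGR and ColorSquare) instead produce images and are not covered by this sketch.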

Keywords

DNA sequence classification; deep learning; data visualization

Cite This Article

APA Style

Almuhaideb, S., Altwaijry, N., Al-Turaiki, I., Khan, A. R., & Rizvi, H. A. (2025). A Comparative Study of Data Representation Techniques for Deep Learning-Based Classification of Promoter and Histone-Associated DNA Regions. Computers, Materials & Continua, 85(2), 3095–3128. https://doi.org/10.32604/cmc.2025.067390

Vancouver Style

Almuhaideb S, Altwaijry N, Al-Turaiki I, Khan AR, Rizvi HA. A Comparative Study of Data Representation Techniques for Deep Learning-Based Classification of Promoter and Histone-Associated DNA Regions. Comput Mater Contin. 2025;85(2):3095–3128. https://doi.org/10.32604/cmc.2025.067390

IEEE Style

S. Almuhaideb, N. Altwaijry, I. Al-Turaiki, A. R. Khan, and H. A. Rizvi, “A Comparative Study of Data Representation Techniques for Deep Learning-Based Classification of Promoter and Histone-Associated DNA Regions,” Comput. Mater. Contin., vol. 85, no. 2, pp. 3095–3128, 2025. https://doi.org/10.32604/cmc.2025.067390

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
