Open Access iconOpen Access



An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

Mohammed Hadwan1,2,*, Hamzah A. Alsayadi3,4, Salah AL-Hagree5

1 Department of Information Technology, College of Computer, Qassim University, Buraydah, 51452, Saudi Arabia
2 Department of Computer Science, College of Applied Sciences, Taiz University, Taiz, 6803, Yemen
3 Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 11566, Egypt
4 Computer Science Department, Faculty of Sciences, Ibb University, Yemen
5 Department of Computer Sciences & Information, Ibb University, Yemen

* Corresponding Author: Mohammed Hadwan. Email: email

Computers, Materials & Continua 2023, 74(2), 3471-3487.


The attention-based encoder-decoder technique, known as the trans-former, is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying ASR end-to-end transformer-based models for the Arabic language, as the researchers’ community pays little attention to it. The Muslims Holy Qur’an book is written using Arabic diacritized text. In this paper, an end-to-end transformer model to building a robust Qur’an vs. recognition is proposed. The acoustic model was built using the transformer-based model as deep learning by the PyTorch framework. A multi-head attention mechanism is utilized to represent the encoder and decoder in the acoustic model. A Mel filter bank is used for feature extraction. To build a language model (LM), the Recurrent Neural Network (RNN) and Long short-term memory (LSTM) were used to train an n-gram word-based LM. As a part of this research, a new dataset of Qur’an verses and their associated transcripts were collected and processed for training and evaluating the proposed model, consisting of 10 h of .wav recitations performed by 60 reciters. The experimental results showed that the proposed end-to-end transformer-based model achieved a significant low character error rate (CER) of 1.98% and a word error rate (WER) of 6.16%. We have achieved state-of-the-art end-to-end transformer-based recognition for Qur’an reciters.


Cite This Article

M. Hadwan, H. A. Alsayadi and S. AL-Hagree, "An end-to-end transformer-based automatic speech recognition for qur’an reciters," Computers, Materials & Continua, vol. 74, no.2, pp. 3471–3487, 2023.

cc This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 865


  • 394


  • 0


Share Link