Search Results (25)
  • Open Access


    Visual Lip-Reading for Quranic Arabic Alphabets and Words Using Deep Learning

    Nada Faisal Aljohani*, Emad Sami Jaha

    Computer Systems Science and Engineering, Vol.46, No.3, pp. 3037-3058, 2023, DOI:10.32604/csse.2023.037113

    Abstract The continuing advances in deep learning have paved the way for several challenging ideas. One such idea is visual lip-reading, which has recently drawn much research interest. Lip-reading, often referred to as visual speech recognition, is the ability to understand and predict spoken speech based solely on lip movements, without using sound. Due to the lack of research on visual speech recognition for the Arabic language in general, and its absence in Quranic research, this work aims to fill this gap. This paper introduces a new publicly available Arabic lip-reading dataset containing 10490 videos captured from multiple viewpoints…

  • Open Access


    Improving Speech Enhancement Framework via Deep Learning

    Sung-Jung Hsiao1, Wen-Tsai Sung2,*

    CMC-Computers, Materials & Continua, Vol.75, No.2, pp. 3817-3832, 2023, DOI:10.32604/cmc.2023.037380

    Abstract Speech plays an extremely important role in social activities. Many individuals suffer from a “speech barrier,” which limits their communication with others. In this study, an improved speech recognition method is proposed that addresses the needs of speech-impaired and deaf individuals. A basic improved connectionist temporal classification convolutional neural network (CTC-CNN) architecture acoustic model was constructed by combining a speech database with a deep neural network. Acoustic sensors were used to convert the collected voice signals into text or corresponding voice signals to improve communication. The method can be extended to modern artificial intelligence techniques, with multiple applications such as…

  • Open Access


    An Optimal Method for Speech Recognition Based on Neural Network

    Mohamad Khairi Ishak1, Dag Øivind Madsen2,*, Fahad Ahmed Al-Zahrani3

    Intelligent Automation & Soft Computing, Vol.36, No.2, pp. 1951-1961, 2023, DOI:10.32604/iasc.2023.033971

    Abstract Natural language processing technologies have become more widely available in recent years, making them more useful in everyday situations. Machine learning systems that employ accessible datasets and corporate work to serve the whole spectrum of problems addressed in computational linguistics have lately yielded a number of promising breakthroughs. These methods were particularly advantageous for regional languages, as they were provided with cutting-edge language processing tools as soon as the requisite corporate information was generated. The bulk of modern people are unconcerned about the importance of reading. Reading aloud, on the other hand, is an effective technique for nourishing feelings as…

  • Open Access


    An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

    Mohammed Hadwan1,2,*, Hamzah A. Alsayadi3,4, Salah AL-Hagree5

    CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 3471-3487, 2023, DOI:10.32604/cmc.2023.033457

    Abstract The attention-based encoder-decoder technique, known as the transformer, is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying end-to-end transformer-based ASR models to the Arabic language, which has received little attention from the research community. The Muslim Holy Qur’an is written using diacritized Arabic text. In this paper, an end-to-end transformer model for building robust Qur’an speech recognition is proposed. The acoustic model was built as a transformer-based deep learning model using the PyTorch framework. A multi-head attention mechanism is utilized to represent the encoder and decoder in the acoustic…

  • Open Access


    Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems

    Sneha Basak1, Himanshi Agrawal1, Shreya Jena1, Shilpa Gite2,*, Mrinal Bachute2, Biswajeet Pradhan3,4,5,*, Mazen Assiri4

    CMES-Computer Modeling in Engineering & Sciences, Vol.135, No.2, pp. 1053-1089, 2023, DOI:10.32604/cmes.2022.021755

    Abstract Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hand-free computation experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development journey of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies that have been highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems has been presented, along with a brief discussion of various modern-day developments and applications in…

  • Open Access


    Research on Tibetan Speech Recognition Based on the Am-do Dialect

    Kuntharrgyal Khysru1,*, Jianguo Wei1,2, Jianwu Dang3

    CMC-Computers, Materials & Continua, Vol.73, No.3, pp. 4897-4907, 2022, DOI:10.32604/cmc.2022.027591

    Abstract In China, Tibetan is usually divided into three major dialects: the Am-do, Khams and Lhasa dialects. The Am-do dialect evolved from ancient Tibetan and is a local variant of modern Tibetan. Although this dialect has its own specific historical and social conditions and development, and has had varying degrees of contact with other ethnic groups, all the abovementioned dialects developed from the same language: Tibetan. This paper uses the particularity of Tibetan suffixes in pronunciation and proposes a lexicon for the Am-do language, which addresses problems existing in previous research. Audio data of the Am-do dialect are expanded…

  • Open Access


    Automated Speech Recognition System to Detect Babies’ Feelings through Feature Analysis

    Sana Yasin1, Umar Draz2,3,*, Tariq Ali4, Kashaf Shahid1, Amna Abid1, Rukhsana Bibi1, Muhammad Irfan5, Mohammed A. Huneif6, Sultan A. Almedhesh6, Seham M. Alqahtani6, Alqahtani Abdulwahab6, Mohammed Jamaan Alzahrani6, Dhafer Batti Alshehri6, Alshehri Ali Abdullah7, Saifur Rahman5

    CMC-Computers, Materials & Continua, Vol.73, No.2, pp. 4349-4367, 2022, DOI:10.32604/cmc.2022.028251

    Abstract Diagnosing a baby’s feelings poses a challenge for both doctors and parents because babies cannot explain their feelings through expression or speech. Understanding the emotions of babies and their associated expressions during different sensations such as hunger, pain, etc., is a complicated task. In infancy, all communication and feelings are propagated through cry-speech, which is a natural phenomenon. Several clinical methods can be used to diagnose a baby’s diseases, but nonclinical methods of diagnosing a baby’s feelings are lacking. As such, in this study, we aimed to identify babies’ feelings and emotions through their cry using a nonclinical method. Changes…

  • Open Access


    Cross-Language Transfer Learning-based Lhasa-Tibetan Speech Recognition

    Zhijie Wang1, Yue Zhao1,*, Licheng Wu1, Xiaojun Bi1, Zhuoma Dawa2, Qiang Ji3

    CMC-Computers, Materials & Continua, Vol.73, No.1, pp. 629-639, 2022, DOI:10.32604/cmc.2022.027092

    Abstract As one of China’s minority languages, Tibetan was, until recently, not researched as extensively in speech recognition technology as Chinese and English. This, along with the relatively small Tibetan corpus, has resulted in unsatisfying performance of Tibetan speech recognition based on an end-to-end model. This paper aims to achieve accurate Tibetan speech recognition using a small amount of Tibetan training data. We demonstrate effective methods of Tibetan end-to-end speech recognition via cross-language transfer learning from three aspects: modeling unit selection, transfer learning method, and source language selection. Experimental results show that the Chinese-Tibetan multi-language learning method using…

  • Open Access


    Speak-Correct: A Computerized Interface for the Analysis of Mispronounced Errors

    Kamal Jambi1,*, Hassanin Al-Barhamtoshy1, Wajdi Al-Jedaibi1, Mohsen Rashwan2, Sherif Abdou3

    Computer Systems Science and Engineering, Vol.43, No.3, pp. 1155-1173, 2022, DOI:10.32604/csse.2022.024967

    Abstract Any natural language may have dozens of accents. Even when a word has the same phonemic formation, if it is pronounced in different accents, the resulting audio signals are distinct from one another. Among the most common issues in speech processing are discrepancies in pronunciation, accent, and enunciation. This research study examines the issues of detecting, fixing, and summarising accent defects in the English speech of average Arabic speakers. The article then discusses the key approaches and structure that will be utilized to address both accent flaws and pronunciation issues. The proposed SpeakCorrect computerized interface employs a…

  • Open Access


    An Innovative Approach Utilizing Binary-View Transformer for Speech Recognition Task

    Muhammad Babar Kamal1, Arfat Ahmad Khan2, Faizan Ahmed Khan3, Malik Muhammad Ali Shahid4, Chitapong Wechtaisong2,*, Muhammad Daud Kamal5, Muhammad Junaid Ali6, Peerapong Uthansakul2

    CMC-Computers, Materials & Continua, Vol.72, No.3, pp. 5547-5562, 2022, DOI:10.32604/cmc.2022.024590

    Abstract Advances in deep learning have greatly improved the performance of speech recognition systems, and most recent systems are based on the Recurrent Neural Network (RNN). Overall, the RNN works well with small sequence data but suffers from the vanishing gradient problem on long sequences. Transformer networks have neutralized this issue and have shown state-of-the-art results on sequential and speech-related data. Generally, in speech recognition, the input audio is converted into an image using a Mel-spectrogram to illustrate frequencies and intensities. The image is classified by the machine learning mechanism to generate a classification transcript. However, the audio…

Displaying 1-10 on page 1 of 25.