Home / Advanced Search

  • Title/Keywords

  • Author/Affliations

  • Journal

  • Article Type

  • Start Year

  • End Year

Update SearchingClear
  • Articles
  • Online
Search Results (6)
  • Open Access

    ARTICLE

    Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications

    Shuting Ge1,2, Jin Ren2,3,*, Yihua Shi4, Yujun Zhang1, Shunzhi Yang2, Jinfeng Yang2

    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 3215-3245, 2024, DOI:10.32604/cmc.2023.046746

    Abstract In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances due to the longer speech sequences than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal… More >

  • Open Access

    ARTICLE

    Joint On-Demand Pruning and Online Distillation in Automatic Speech Recognition Language Model Optimization

    Soonshin Seo1,2, Ji-Hwan Kim2,*

    CMC-Computers, Materials & Continua, Vol.77, No.3, pp. 2833-2856, 2023, DOI:10.32604/cmc.2023.042816

    Abstract Automatic speech recognition (ASR) systems have emerged as indispensable tools across a wide spectrum of applications, ranging from transcription services to voice-activated assistants. To enhance the performance of these systems, it is important to deploy efficient models capable of adapting to diverse deployment conditions. In recent years, on-demand pruning methods have obtained significant attention within the ASR domain due to their adaptability in various deployment scenarios. However, these methods often confront substantial trade-offs, particularly in terms of unstable accuracy when reducing the model size. To address challenges, this study introduces two crucial empirical findings. Firstly, it proposes the incorporation of… More >

  • Open Access

    ARTICLE

    A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control

    Peiyuan Jiang1, Weijun Pan1,*, Jian Zhang1, Teng Wang1, Junxiang Huang2

    CMC-Computers, Materials & Continua, Vol.77, No.1, pp. 911-940, 2023, DOI:10.32604/cmc.2023.041772

    Abstract

    This study aims to address the deviation in downstream tasks caused by inaccurate recognition results when applying Automatic Speech Recognition (ASR) technology in the Air Traffic Control (ATC) field. This paper presents a novel cascaded model architecture, namely Conformer-CTC/Attention-T5 (CCAT), to build a highly accurate and robust ATC speech recognition model. To tackle the challenges posed by noise and fast speech rate in ATC, the Conformer model is employed to extract robust and discriminative speech representations from raw waveforms. On the decoding side, the Attention mechanism is integrated to facilitate precise alignment between input features and output characters. The Text-To-Text… More >

  • Open Access

    ARTICLE

    Speech Recognition via CTC-CNN Model

    Wen-Tsai Sung1, Hao-Wei Kang1, Sung-Jung Hsiao2,*

    CMC-Computers, Materials & Continua, Vol.76, No.3, pp. 3833-3858, 2023, DOI:10.32604/cmc.2023.040024

    Abstract In the speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail and studies the Connectionist temporal classification (CTC) algorithm, which plays an important role in the end-to-end framework, established a convolutional neural network (CNN) combined with an acoustic model of Connectionist temporal classification to improve the accuracy of speech recognition. This study uses a sound sensor, ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text or corresponding speech signals to… More >

  • Open Access

    ARTICLE

    An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

    Mohammed Hadwan1,2,*, Hamzah A. Alsayadi3,4, Salah AL-Hagree5

    CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 3471-3487, 2023, DOI:10.32604/cmc.2023.033457

    Abstract The attention-based encoder-decoder technique, known as the trans-former, is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying ASR end-to-end transformer-based models for the Arabic language, as the researchers’ community pays little attention to it. The Muslims Holy Qur’an book is written using Arabic diacritized text. In this paper, an end-to-end transformer model to building a robust Qur’an vs. recognition is proposed. The acoustic model was built using the transformer-based model as deep learning by the PyTorch framework. A multi-head attention mechanism is utilized to represent the encoder and decoder in the acoustic… More >

  • Open Access

    REVIEW

    Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems

    Sneha Basak1, Himanshi Agrawal1, Shreya Jena1, Shilpa Gite2,*, Mrinal Bachute2, Biswajeet Pradhan3,4,5,*, Mazen Assiri4

    CMES-Computer Modeling in Engineering & Sciences, Vol.135, No.2, pp. 1053-1089, 2023, DOI:10.32604/cmes.2022.021755

    Abstract Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hand-free computation experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development journey of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies that have been highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems has been presented, along with a brief discussion of various modern-day developments and applications in… More >

Displaying 1-10 on page 1 of 6. Per Page