Tech Science Press - Publisher of Open Access Journals

Open Access

ARTICLE

Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications

Shuting Ge^1,2, Jin Ren^2,3,*, Yihua Shi^4, Yujun Zhang^1, Shunzhi Yang^2, Jinfeng Yang^2

CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 3215-3245, 2024, DOI:10.32604/cmc.2023.046746 - 26 March 2024

Abstract In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, because speech sequences are longer than text sequences, it is challenging to model long-range acoustic context dependencies, especially for the extended ATCC data. To address these issues, …

Open Access

ARTICLE

Joint On-Demand Pruning and Online Distillation in Automatic Speech Recognition Language Model Optimization

Soonshin Seo^1,2, Ji-Hwan Kim^2,*

CMC-Computers, Materials & Continua, Vol.77, No.3, pp. 2833-2856, 2023, DOI:10.32604/cmc.2023.042816 - 26 December 2023

Abstract Automatic speech recognition (ASR) systems have emerged as indispensable tools across a wide spectrum of applications, ranging from transcription services to voice-activated assistants. To enhance the performance of these systems, it is important to deploy efficient models capable of adapting to diverse deployment conditions. In recent years, on-demand pruning methods have attracted significant attention within the ASR domain due to their adaptability in various deployment scenarios. However, these methods often confront substantial trade-offs, particularly unstable accuracy when the model size is reduced. To address these challenges, this study introduces two crucial empirical findings. Firstly, …

Open Access

ARTICLE

A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control

Peiyuan Jiang^1, Weijun Pan^1,*, Jian Zhang^1, Teng Wang^1, Junxiang Huang^2

CMC-Computers, Materials & Continua, Vol.77, No.1, pp. 911-940, 2023, DOI:10.32604/cmc.2023.041772 - 31 October 2023

Abstract This study aims to address the deviation in downstream tasks caused by inaccurate recognition results when applying Automatic Speech Recognition (ASR) technology in the Air Traffic Control (ATC) field. This paper presents a novel cascaded model architecture, namely Conformer-CTC/Attention-T5 (CCAT), to build a highly accurate and robust ATC speech recognition model. To tackle the challenges posed by noise and fast speech rate in ATC, the Conformer model is employed to extract robust and discriminative speech representations from raw waveforms. On the decoding side, the Attention mechanism is integrated to facilitate precise alignment between input features and …

Open Access

ARTICLE

Speech Recognition via CTC-CNN Model

Wen-Tsai Sung^1, Hao-Wei Kang^1, Sung-Jung Hsiao^2,*

CMC-Computers, Materials & Continua, Vol.76, No.3, pp. 3833-3858, 2023, DOI:10.32604/cmc.2023.040024 - 08 October 2023

Abstract In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail and studies the connectionist temporal classification (CTC) algorithm, which plays an important role in the end-to-end framework. It establishes an acoustic model that combines a convolutional neural network (CNN) with CTC to improve the accuracy of speech recognition. This study uses a sound sensor, ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text …

Open Access

ARTICLE

Visual Lip-Reading for Quranic Arabic Alphabets and Words Using Deep Learning

Nada Faisal Aljohani^*, Emad Sami Jaha

Computer Systems Science and Engineering, Vol.46, No.3, pp. 3037-3058, 2023, DOI:10.32604/csse.2023.037113 - 03 April 2023

Abstract The continuing advances in deep learning have paved the way for several challenging ideas. One such idea is visual lip-reading, which has recently drawn considerable research interest. Lip-reading, often referred to as visual speech recognition, is the ability to understand and predict spoken speech based solely on lip movements without using sounds. Given the lack of research on visual speech recognition for the Arabic language in general, and its absence from Quranic research, this research aims to fill this gap. This paper introduces a new publicly available Arabic lip-reading dataset containing 10490 …

Open Access

ARTICLE

Improving Speech Enhancement Framework via Deep Learning

Sung-Jung Hsiao^1, Wen-Tsai Sung^2,*

CMC-Computers, Materials & Continua, Vol.75, No.2, pp. 3817-3832, 2023, DOI:10.32604/cmc.2023.037380 - 31 March 2023

Abstract Speech plays an extremely important role in social activities. Many individuals suffer from a “speech barrier,” which limits their communication with others. In this study, an improved speech recognition method is proposed that addresses the needs of speech-impaired and deaf individuals. An improved basic acoustic model built on a connectionist temporal classification convolutional neural network (CTC-CNN) architecture was constructed by combining a speech database with a deep neural network. Acoustic sensors were used to convert the collected voice signals into text or corresponding voice signals to improve communication. The method can be extended to modern artificial intelligence techniques, …

Open Access

ARTICLE

An Optimal Method for Speech Recognition Based on Neural Network

Mohamad Khairi Ishak^1, Dag Øivind Madsen^2,*, Fahad Ahmed Al-Zahrani^3

Intelligent Automation & Soft Computing, Vol.36, No.2, pp. 1951-1961, 2023, DOI:10.32604/iasc.2023.033971 - 05 January 2023

Abstract Natural language processing technologies have become more widely available in recent years, making them more useful in everyday situations. Machine learning systems that employ accessible datasets and corporate work to serve the whole spectrum of problems addressed in computational linguistics have lately yielded a number of promising breakthroughs. These methods were particularly advantageous for regional languages, as they were provided with cutting-edge language processing tools as soon as the requisite corporate information was generated. Most people today are unconcerned about the importance of reading. Reading aloud, on the other hand, is an effective …

Open Access

ARTICLE

An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

Mohammed Hadwan^1,2,*, Hamzah A. Alsayadi^3,4, Salah AL-Hagree^5

CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 3471-3487, 2023, DOI:10.32604/cmc.2023.033457 - 31 October 2022

Abstract The attention-based encoder-decoder technique, known as the transformer, is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying end-to-end transformer-based ASR models to the Arabic language, as the research community has paid little attention to it. The Muslims' Holy Qur’an is written in diacritized Arabic text. In this paper, an end-to-end transformer model for building robust Qur’an verse recognition is proposed. The acoustic model was built using a transformer-based deep learning model in the PyTorch framework. A multi-head attention mechanism is utilized to represent the encoder and …

Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems

Sneha Basak^1, Himanshi Agrawal^1, Shreya Jena^1, Shilpa Gite^2,*, Mrinal Bachute^2, Biswajeet Pradhan^3,4,5,*, Mazen Assiri^4

CMES-Computer Modeling in Engineering & Sciences, Vol.135, No.2, pp. 1053-1089, 2023, DOI:10.32604/cmes.2022.021755 - 27 October 2022

Abstract Speech recognition systems have become a distinct family of human-computer interaction (HCI) systems. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hands-free computation experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various …

Open Access

ARTICLE

Research on Tibetan Speech Recognition Based on the Am-do Dialect

Kuntharrgyal Khysru^1,*, Jianguo Wei^1,2, Jianwu Dang^3

CMC-Computers, Materials & Continua, Vol.73, No.3, pp. 4897-4907, 2022, DOI:10.32604/cmc.2022.027591 - 28 July 2022

Abstract In China, Tibetan is usually divided into three major dialects: the Am-do, Khams and Lhasa dialects. The Am-do dialect evolved from ancient Tibetan and is a local variant of modern Tibetan. Although this dialect has its own specific historical and social conditions and development, and has had varying degrees of contact with other ethnic groups, all the abovementioned dialects developed from the same language: Tibetan. This paper uses the particularity of Tibetan suffixes in pronunciation and proposes a lexicon for the Am-do language, which addresses problems in previous research. Audio data of …
