Tech Science Press - Publisher of Open Access Journals

Open Access

ARTICLE

RSG-Conformer: ReLU-Based Sparse and Grouped Conformer for Audio-Visual Speech Recognition

Yewei Xiao, Xin Du^*, Wei Zeng

CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.072145 - 12 January 2026

Abstract Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive because the space and time complexity of their softmax-based attention mechanisms grows quadratically with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse…

Open Access

ARTICLE

An Improved Forest Fire Detection Model Using Audio Classification and Machine Learning

Kemahyanto Exaudi^1,2, Deris Stiawan^3,*, Bhakti Yudho Suprapto^1, Hanif Fakhrurroja^4, Mohd. Yazid Idris^5, Tami A. Alghamdi^6, Rahmat Budiarto^6

CMC-Computers, Materials & Continua, Vol.86, No.1, pp. 1-24, 2026, DOI:10.32604/cmc.2025.069377 - 10 November 2025

Abstract Sudden wildfires cause significant global ecological damage. While satellite imagery has advanced early fire detection and mitigation, image-based systems face limitations including high false alarm rates, visual obstructions, and substantial computational demands, especially in complex forest terrains. To address these challenges, this study proposes a novel forest fire detection model utilizing audio classification and machine learning. We developed an audio-based pipeline using real-world environmental sound recordings. Sounds were converted into Mel-spectrograms and classified via a Convolutional Neural Network (CNN), enabling the capture of distinctive fire acoustic signatures (e.g., crackling, roaring) that are minimally impacted by…

Open Access

ARTICLE

Robust Audio-Visual Fusion for Emotion Recognition Based on Cross-Modal Learning under Noisy Conditions

A-Seong Moon^1, Seungyeon Jeong^1, Donghee Kim^1, Mohd Asyraf Zulkifley^2, Bong-Soo Sohn^3,*, Jaesung Lee^1,*

CMC-Computers, Materials & Continua, Vol.85, No.2, pp. 2851-2872, 2025, DOI:10.32604/cmc.2025.067103 - 23 September 2025

Abstract Emotion recognition in uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems. The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference, such as background noise, overlapping speech, and visual obstructions. The proposed framework employs a structured fusion approach, combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms. Audio data are transformed into mel-spectrogram representations, and visual data are represented as raw frame sequences. Spatial and temporal features are extracted through convolutional and transformer-based encoders, allowing the framework to capture…

Open Access

ARTICLE

A Secure Audio Encryption Method Using Tent-Controlled Permutation and Logistic Map-Based Key Generation

Ibtisam A. Taqi^*, Sarab M. Hameed

CMC-Computers, Materials & Continua, Vol.85, No.1, pp. 1653-1674, 2025, DOI:10.32604/cmc.2025.067524 - 29 August 2025

Abstract The exponential growth of audio data shared over the internet and communication channels has raised significant concerns about the security and privacy of transmitted information. Due to high processing requirements, traditional encryption algorithms demand considerable computational effort for real-time audio encryption. To address these challenges, this paper presents a permutation-based method for secure audio encryption using a combination of Tent and 1D logistic maps. The audio data is first shuffled using the Tent map for the random permutation. A highly random secret key with a length equal to the size of the audio data is then generated…

Open Access

ARTICLE

Cardiovascular Sound Classification Using Neural Architectures and Deep Learning for Advancing Cardiac Wellness

Deepak Mahto^1, Sudhakar Kumar^1, Sunil K. Singh^1, Amit Chhabra^1, Irfan Ahmad Khan^2, Varsha Arya^3,4, Wadee Alhalabi^5, Brij B. Gupta^6,7,8,9,*, Bassma Saleh Alsulami^10

CMES-Computer Modeling in Engineering & Sciences, Vol.143, No.3, pp. 3743-3767, 2025, DOI:10.32604/cmes.2025.063427 - 30 June 2025

Abstract Cardiovascular diseases (CVDs) remain one of the foremost causes of death globally, creating a need for advanced automated diagnostic solutions for early detection and intervention. Traditional auscultation of cardiovascular sounds is heavily reliant on clinical expertise and subject to high variability. To counter this limitation, this study proposes an AI-driven classification system for cardiovascular sounds in which deep learning techniques automate the detection of abnormal heartbeats. We employ FastAI vision-learner-based convolutional neural networks (CNNs) that include ResNet, DenseNet, VGG, ConvNeXt, SqueezeNet, and AlexNet to classify heart sound recordings. Instead of…

Open Access

ARTICLE

Interpolation-Based Reversible Data Hiding in Encrypted Audio with Scalable Embedding Capacity

Yuan-Yu Tsai^1,*, Alfrindo Lin^1, Wen-Ting Jao^1, Yi-Hui Chen^2,*

CMC-Computers, Materials & Continua, Vol.84, No.1, pp. 681-697, 2025, DOI:10.32604/cmc.2025.064370 - 09 June 2025

Abstract With the rapid expansion of multimedia data, protecting digital information has become increasingly critical. Reversible data hiding offers an effective solution by allowing sensitive information to be embedded in multimedia files while enabling full recovery of the original data after extraction. Audio, as a vital medium in communication, entertainment, and information sharing, demands the same level of security as images. However, embedding data in encrypted audio poses unique challenges due to the trade-offs between security, data integrity, and embedding capacity. This paper presents a novel interpolation-based reversible data hiding algorithm for encrypted audio that achieves…

Open Access

ARTICLE

End-to-End Audio Pattern Recognition Network for Overcoming Feature Limitations in Human-Machine Interaction

Zijian Sun^1,2, Yaqian Li^3,4,*, Haoran Liu^1,2, Haibin Li^3,4, Wenming Zhang^3,4

CMC-Computers, Materials & Continua, Vol.83, No.2, pp. 3187-3210, 2025, DOI:10.32604/cmc.2025.061920 - 16 April 2025

Abstract In recent years, audio pattern recognition has emerged as a key area of research, driven by its applications in human-computer interaction, robotics, and healthcare. Traditional methods, which rely heavily on handcrafted features such as Mel filters, often suffer from information loss and limited feature representation capabilities. To address these limitations, this study proposes an innovative end-to-end audio pattern recognition framework that directly processes raw audio signals, preserving original information and extracting effective classification features. The proposed framework utilizes a dual-branch architecture: a global refinement module that retains channel and temporal details and a multi-scale embedding…

Open Access

ARTICLE

Lip-Audio Modality Fusion for Deep Forgery Video Detection

Yong Liu^1,4, Zhiyu Wang^2,*, Shouling Ji^3, Daofu Gong^1,5, Lanxin Cheng^1, Ruosi Cheng^1

CMC-Computers, Materials & Continua, Vol.82, No.2, pp. 3499-3515, 2025, DOI:10.32604/cmc.2024.057859 - 17 February 2025

Abstract Because traditional methods ignore tampering in the audio modality, this study explores an effective deep forgery video detection technique that improves detection precision and reliability by fusing lip images and audio signals. The main method used is lip-audio matching detection technology based on the Siamese neural network, combined with MFCC (Mel Frequency Cepstrum Coefficient) feature extraction of band-pass filters, an improved dual-branch Siamese network structure, and a two-stream network structure design. Firstly, the video stream is preprocessed to extract lip images, and the audio stream is preprocessed to extract MFCC…

Open Access

ARTICLE

MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech

Farah Mohammad^1,2,*, Khulood Mohammed Al Mansoor^3

CMC-Computers, Materials & Continua, Vol.81, No.3, pp. 4125-4147, 2024, DOI:10.32604/cmc.2024.056666 - 19 December 2024

Abstract Depression is a prevalent mental health issue affecting individuals of all age groups globally. Similar to other mental health disorders, diagnosing depression presents significant challenges for medical practitioners and clinical experts, primarily due to societal stigma and a lack of awareness and acceptance. Although medical interventions such as therapies, medications, and brain stimulation therapy provide hope for treatment, there is still a gap in the efficient detection of depression. Traditional methods, like in-person therapies, are both time-consuming and labor-intensive, emphasizing the necessity for technological assistance, especially through Artificial Intelligence. Alternative to this, in most cases…

Open Access

ARTICLE

A Recurrent Neural Network for Multimodal Anomaly Detection by Using Spatio-Temporal Audio-Visual Data

Sameema Tariq^1, Ata-Ur-Rehman^2,3, Maria Abubakar^2, Waseem Iqbal^4, Hatoon S. Alsagri^5, Yousef A. Alduraywish^5, Haya Abdullah A. Alhakbani^5,*

CMC-Computers, Materials & Continua, Vol.81, No.2, pp. 2493-2515, 2024, DOI:10.32604/cmc.2024.055787 - 18 November 2024

Abstract In video surveillance, anomaly detection requires training machine learning models on spatio-temporal video sequences. However, video-only data is sometimes insufficient to accurately detect all abnormal activities. Therefore, we propose a novel audio-visual spatio-temporal autoencoder specifically designed to detect anomalies for video surveillance by utilizing audio data along with video data. This paper presents a competitive approach to a multi-modal recurrent neural network for anomaly detection that combines separate spatial and temporal autoencoders to leverage both spatial and temporal features in audio-visual data. The proposed model is trained to produce low reconstruction error…
