Search Results (94)
  • Open Access

    ARTICLE

    An Adaptive Hate Speech Detection Approach Using Neutrosophic Neural Networks for Social Media Forensics

    Yasmine M. Ibrahim1,2, Reem Essameldin3, Saad M. Darwish1,*

    CMC-Computers, Materials & Continua, Vol.79, No.1, pp. 243-262, 2024, DOI:10.32604/cmc.2024.047840

    Abstract Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due to the complex nature of language used in such platforms. Currently, several methods exist for classifying hate speech, but they still suffer from ambiguity when differentiating between hateful and offensive content, and they also lack accuracy. The work suggested in this paper uses a combination of the Whale Optimization Algorithm (WOA) and Particle Swarm Optimization (PSO) to adjust the weights of two Multi-Layer Perceptrons (MLPs) for neutrosophic set classification. During the training process of the MLP, the WOA is employed to explore and determine…
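    The paper's hybrid WOA/PSO weight-tuning scheme is not reproduced here; as a rough illustration of the swarm-based half of that idea, the sketch below runs plain PSO over a flat parameter vector, with a toy quadratic loss standing in for an MLP training loss (all hyperparameters are illustrative, not the authors' settings):

    ```python
    import numpy as np

    def pso_minimize(loss, dim, n_particles=20, iters=50, seed=0):
        """Minimal Particle Swarm Optimization over a flat weight vector.

        `loss` stands in for an MLP training loss; the paper's hybrid
        WOA/PSO exploration scheme is not reproduced here.
        """
        rng = np.random.default_rng(seed)
        x = rng.uniform(-1, 1, (n_particles, dim))   # particle positions (weights)
        v = np.zeros_like(x)                          # velocities
        pbest = x.copy()                              # per-particle best positions
        pbest_f = np.array([loss(p) for p in x])
        gbest = pbest[pbest_f.argmin()].copy()        # global best position
        w, c1, c2 = 0.7, 1.5, 1.5                     # inertia / cognitive / social
        for _ in range(iters):
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = x + v
            f = np.array([loss(p) for p in x])
            improved = f < pbest_f
            pbest[improved], pbest_f[improved] = x[improved], f[improved]
            gbest = pbest[pbest_f.argmin()].copy()
        return gbest, float(pbest_f.min())

    # Stand-in quadratic loss; a real setup would evaluate MLP error on data.
    best_weights, best_loss = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=10)
    ```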

  • Open Access

    ARTICLE

    Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications

    Shuting Ge1,2, Jin Ren2,3,*, Yihua Shi4, Yujun Zhang1, Shunzhi Yang2, Jinfeng Yang2

    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 3215-3245, 2024, DOI:10.32604/cmc.2023.046746

    Abstract In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are much longer than their text counterparts, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal…

  • Open Access

    ARTICLE

    Multi-Objective Equilibrium Optimizer for Feature Selection in High-Dimensional English Speech Emotion Recognition

    Liya Yue1, Pei Hu2, Shu-Chuan Chu3, Jeng-Shyang Pan3,4,*

    CMC-Computers, Materials & Continua, Vol.78, No.2, pp. 1957-1975, 2024, DOI:10.32604/cmc.2024.046962

    Abstract Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use the information gain and Fisher Score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and…
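    One of the filter criteria the abstract names, the Fisher Score, can be sketched directly: it ranks each feature by between-class variance over within-class variance. This is a generic illustration on toy data, not the paper's combined information-gain/Fisher pipeline:

    ```python
    import numpy as np

    def fisher_score(X, y):
        """Fisher Score per feature: between-class scatter divided by
        within-class scatter. Higher means more class-discriminative."""
        classes = np.unique(y)
        mu = X.mean(axis=0)
        num = np.zeros(X.shape[1])
        den = np.zeros(X.shape[1])
        for c in classes:
            Xc = X[y == c]
            num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
            den += len(Xc) * Xc.var(axis=0)
        return num / (den + 1e-12)   # small epsilon avoids division by zero

    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = np.array([[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -2.0]])
    y = np.array([0, 0, 1, 1])
    scores = fisher_score(X, y)
    ranking = np.argsort(scores)[::-1]   # most discriminative feature first
    ```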

  • Open Access

    ARTICLE

    Exploring Sequential Feature Selection in Deep Bi-LSTM Models for Speech Emotion Recognition

    Fatma Harby1, Mansor Alohali2, Adel Thaljaoui2,3,*, Amira Samy Talaat4

    CMC-Computers, Materials & Continua, Vol.78, No.2, pp. 2689-2719, 2024, DOI:10.32604/cmc.2024.046623

    Abstract Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio…
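    The sequential feature selection the title refers to is, in its forward variant, a simple greedy loop: repeatedly add the feature that most improves a score. The sketch below uses an arbitrary toy scorer; in the paper's setting the scorer would wrap a Bi-LSTM validation metric, which is not reproduced here:

    ```python
    def sequential_forward_selection(score_fn, n_features, k):
        """Greedy sequential forward selection: at each step, add the
        single feature whose inclusion yields the highest score."""
        selected = []
        remaining = list(range(n_features))
        while len(selected) < k and remaining:
            best_f, best_s = None, float("-inf")
            for f in remaining:
                s = score_fn(selected + [f])
                if s > best_s:
                    best_f, best_s = f, s
            selected.append(best_f)
            remaining.remove(best_f)
        return selected

    # Toy scorer: each feature has a fixed, additive value (illustrative only).
    feature_value = {0: 3.0, 1: 0.5, 2: 5.0, 3: 1.0}
    score = lambda subset: sum(feature_value[f] for f in subset)
    chosen = sequential_forward_selection(score, n_features=4, k=2)
    ```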

  • Open Access

    ARTICLE

    Improved Speech Emotion Recognition Focusing on High-Level Data Representations and Swift Feature Extraction Calculation

    Akmalbek Abdusalomov1, Alpamis Kutlimuratov2, Rashid Nasimov3, Taeg Keun Whangbo1,*

    CMC-Computers, Materials & Continua, Vol.77, No.3, pp. 2915-2933, 2023, DOI:10.32604/cmc.2023.044466

    Abstract The performance of a speech emotion recognition (SER) system is heavily influenced by the efficacy of its feature extraction techniques. The study was designed to advance the field of SER by optimizing feature extraction techniques, specifically through the incorporation of high-resolution Mel-spectrograms and the expedited calculation of Mel Frequency Cepstral Coefficients (MFCC). This initiative aimed to refine the system’s accuracy by identifying and mitigating the shortcomings commonly found in current approaches. Ultimately, the primary objective was to elevate both the intricacy and effectiveness of our SER model, with a focus on augmenting its proficiency in the accurate identification of emotions…
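    The final step of a standard MFCC pipeline, which the abstract's expedited calculation would also have to perform, is an orthonormal DCT-II over the log-Mel filterbank energies of each frame. A minimal NumPy-only sketch on hypothetical data (this is the textbook step, not the paper's optimized implementation):

    ```python
    import numpy as np

    def dct2(x):
        """Orthonormal DCT-II along the last axis (the transform that turns
        log-Mel energies into cepstral coefficients)."""
        n = x.shape[-1]
        k = np.arange(n)
        basis = np.cos(np.pi / n * (k[:, None] + 0.5) * k[None, :])  # (n_in, n_out)
        out = 2.0 * x @ basis
        # orthonormal scaling (matches the common norm='ortho' convention)
        out[..., 0] *= np.sqrt(1.0 / (4 * n))
        out[..., 1:] *= np.sqrt(1.0 / (2 * n))
        return out

    def mfcc_from_log_mel(log_mel, n_mfcc=13):
        """Keep the first n_mfcc cepstral coefficients per frame."""
        return dct2(log_mel)[..., :n_mfcc]

    # Hypothetical log-Mel energies: 4 frames x 26 Mel bands.
    rng = np.random.default_rng(0)
    log_mel = rng.standard_normal((4, 26))
    coeffs = mfcc_from_log_mel(log_mel)
    ```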

  • Open Access

    ARTICLE

    Joint On-Demand Pruning and Online Distillation in Automatic Speech Recognition Language Model Optimization

    Soonshin Seo1,2, Ji-Hwan Kim2,*

    CMC-Computers, Materials & Continua, Vol.77, No.3, pp. 2833-2856, 2023, DOI:10.32604/cmc.2023.042816

    Abstract Automatic speech recognition (ASR) systems have emerged as indispensable tools across a wide spectrum of applications, ranging from transcription services to voice-activated assistants. To enhance the performance of these systems, it is important to deploy efficient models capable of adapting to diverse deployment conditions. In recent years, on-demand pruning methods have gained significant attention within the ASR domain due to their adaptability to various deployment scenarios. However, these methods often confront substantial trade-offs, particularly unstable accuracy when reducing the model size. To address these challenges, this study introduces two crucial empirical findings. Firstly, it proposes the incorporation of…
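    The core mechanic behind most on-demand pruning schemes is magnitude pruning: zero out the smallest-magnitude weights until a target sparsity is reached. A minimal sketch follows; the sparsity level and the weight matrix are illustrative, and the paper's distillation stage (which recovers accuracy after pruning) is not shown:

    ```python
    import numpy as np

    def magnitude_prune(weights, sparsity):
        """Zero out (at least) the `sparsity` fraction of weights with the
        smallest absolute values."""
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)
        if k == 0:
            return weights.copy()
        threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
        mask = np.abs(weights) > threshold
        return weights * mask

    W = np.array([[0.9, -0.05, 0.4],
                  [0.01, -0.7, 0.2]])
    pruned = magnitude_prune(W, sparsity=0.5)   # drops the 3 smallest weights
    ```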

  • Open Access

    ARTICLE

    Recognition of Human Actions through Speech or Voice Using Machine Learning Techniques

    Oscar Peña-Cáceres1,2,*, Henry Silva-Marchan3, Manuela Albert4, Miriam Gil1

    CMC-Computers, Materials & Continua, Vol.77, No.2, pp. 1873-1891, 2023, DOI:10.32604/cmc.2023.043176

    Abstract The development of artificial intelligence (AI) and smart home technologies has driven the need for speech recognition-based solutions. This demand stems from the quest for more intuitive and natural interaction between users and smart devices in their homes. Speech recognition allows users to control devices and perform everyday actions through spoken commands, eliminating the need for physical interfaces or touch screens and enabling specific tasks such as turning lights on or off, adjusting the heating, or lowering the blinds. The purpose of this study is to develop a speech-based classification model for recognizing human actions in the smart home. It seeks…

  • Open Access

    ARTICLE

    A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control

    Peiyuan Jiang1, Weijun Pan1,*, Jian Zhang1, Teng Wang1, Junxiang Huang2

    CMC-Computers, Materials & Continua, Vol.77, No.1, pp. 911-940, 2023, DOI:10.32604/cmc.2023.041772

    Abstract This study aims to address the deviation in downstream tasks caused by inaccurate recognition results when applying Automatic Speech Recognition (ASR) technology in the Air Traffic Control (ATC) field. This paper presents a novel cascaded model architecture, namely Conformer-CTC/Attention-T5 (CCAT), to build a highly accurate and robust ATC speech recognition model. To tackle the challenges posed by noise and fast speech rate in ATC, the Conformer model is employed to extract robust and discriminative speech representations from raw waveforms. On the decoding side, the Attention mechanism is integrated to facilitate precise alignment between input features and output characters. The Text-To-Text…

  • Open Access

    ARTICLE

    Using Speaker-Specific Emotion Representations in Wav2vec 2.0-Based Modules for Speech Emotion Recognition

    Somin Park1, Mpabulungi Mark1, Bogyung Park2, Hyunki Hong1,*

    CMC-Computers, Materials & Continua, Vol.77, No.1, pp. 1009-1030, 2023, DOI:10.32604/cmc.2023.041332

    Abstract Speech emotion recognition is essential for frictionless human-machine interaction, where machines respond to human instructions with context-aware actions. The properties of individuals’ voices vary with culture, language, gender, and personality. These variations in speaker-specific properties may hamper the performance of standard representations in downstream tasks such as speech emotion recognition (SER). This study demonstrates the significance of speaker-specific speech characteristics and how considering them can be leveraged to improve the performance of SER models. In the proposed approach, two wav2vec-based modules (a speaker-identification network and an emotion classification network) are trained with the Arcface loss. The speaker-identification network has a…
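    The ArcFace loss mentioned in the abstract modifies ordinary softmax logits by adding an angular margin to the true class before scaling, which pushes embeddings of the same class closer on the hypersphere. A minimal NumPy sketch of the logit computation (the scale and margin values are the commonly cited defaults, not necessarily the paper's):

    ```python
    import numpy as np

    def arcface_logits(embeddings, class_centres, labels, scale=30.0, margin=0.5):
        """Additive angular margin (ArcFace): add `margin` radians to the
        angle between each embedding and its true class centre.
        Shapes: embeddings (N, D), class_centres (C, D), labels (N,)."""
        e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        w = class_centres / np.linalg.norm(class_centres, axis=1, keepdims=True)
        cos = np.clip(e @ w.T, -1.0, 1.0)          # cosine similarities
        theta = np.arccos(cos)
        rows = np.arange(len(labels))
        cos_m = cos.copy()
        cos_m[rows, labels] = np.cos(theta[rows, labels] + margin)
        return scale * cos_m                        # feed into softmax + CE loss

    # Hypothetical 2-D embedding and two class centres.
    emb = np.array([[1.0, 0.1]])
    centres = np.array([[1.0, 0.0], [0.0, 1.0]])
    logits = arcface_logits(emb, centres, labels=np.array([0]))
    ```

    The margin only penalizes the true-class logit; non-target logits are the plain scaled cosines, so the classifier must separate classes by more than the margin to score well.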

  • Open Access

    ARTICLE

    Speech Recognition via CTC-CNN Model

    Wen-Tsai Sung1, Hao-Wei Kang1, Sung-Jung Hsiao2,*

    CMC-Computers, Materials & Continua, Vol.76, No.3, pp. 3833-3858, 2023, DOI:10.32604/cmc.2023.040024

    Abstract In the speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail, studies the Connectionist Temporal Classification (CTC) algorithm, which plays an important role in the end-to-end framework, and establishes a convolutional neural network (CNN) acoustic model combined with CTC to improve the accuracy of speech recognition. This study uses a sound sensor, ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text or corresponding speech signals to…
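    The decoding step that typically follows a CTC-trained acoustic model is easy to sketch: take the per-frame argmax labels, collapse consecutive repeats, then drop the blank symbol. This is the generic greedy CTC decode, not necessarily the paper's decoding pipeline:

    ```python
    def ctc_greedy_decode(frame_labels, blank=0):
        """Greedy CTC decoding: collapse consecutive repeats, then remove
        blanks. `frame_labels` is the per-frame argmax of the acoustic
        model's output distribution (label indices are illustrative)."""
        out = []
        prev = None
        for label in frame_labels:
            if label != prev and label != blank:
                out.append(label)
            prev = label
        return out

    # Frames "- c c a - a t t" with 0 as the blank symbol: the blank between
    # the two a's preserves the genuine repeat, giving "c a a t".
    decoded = ctc_greedy_decode([0, 3, 3, 1, 0, 1, 20, 20])
    ```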

Displaying 1-10 of 94 results (page 1).