Search Results (109)
  • Open Access

    ARTICLE

    A Synthetic Speech Detection Model Combining Local-Global Dependency

    Jiahui Song, Yuepeng Zhang, Wenhao Yuan*

    CMC-Computers, Materials & Continua, Vol.86, No.1, pp. 1-15, 2026, DOI:10.32604/cmc.2025.069918 - 10 November 2025

    Abstract Synthetic speech detection is an essential task in the field of voice security, aimed at identifying deceptive voice attacks generated by text-to-speech (TTS) systems or voice conversion (VC) systems. In this paper, we propose a synthetic speech detection model called TFTransformer, which integrates both local and global features to enhance detection capabilities by effectively modeling local and global dependencies. Structurally, the model is divided into two main components: a front-end and a back-end. The front-end of the model uses a combination of SincLayer and two-dimensional (2D) convolution to extract high-level feature maps (HFM) containing local…
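
    A minimal sketch of the front-end the abstract describes: a learnable sinc band-pass filterbank (SincLayer) followed by 2D convolutions that turn a raw waveform into high-level feature maps. Assumptions: PyTorch and 16 kHz audio; filter counts, kernel sizes, and the back-end classifier are not given in the excerpt, so the values below are placeholders.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SincLayer(nn.Module):
        """Learnable band-pass filterbank: each 1D kernel is a windowed sinc band-pass
        parameterized only by its low cutoff and bandwidth (both in Hz)."""
        def __init__(self, n_filters=64, kernel_size=251, sample_rate=16000):
            super().__init__()
            self.kernel_size, self.sample_rate = kernel_size, sample_rate
            self.low_hz = nn.Parameter(torch.linspace(30, sample_rate / 2 - 300, n_filters))
            self.band_hz = nn.Parameter(torch.full((n_filters,), 100.0))
            t = (torch.arange(kernel_size) - (kernel_size - 1) / 2) / sample_rate
            self.register_buffer("t", t)                                   # kernel time axis in seconds
            self.register_buffer("window", torch.hamming_window(kernel_size))

        def forward(self, x):                                              # x: (batch, 1, samples)
            low = torch.abs(self.low_hz)
            high = torch.clamp(low + torch.abs(self.band_hz), max=self.sample_rate / 2)
            lowpass = lambda fc: 2 * fc.unsqueeze(1) * torch.sinc(2 * fc.unsqueeze(1) * self.t)
            kernels = (lowpass(high) - lowpass(low)) * self.window          # (n_filters, kernel_size)
            kernels = kernels / (kernels.abs().amax(dim=1, keepdim=True) + 1e-8)
            return F.conv1d(x, kernels.unsqueeze(1), padding=self.kernel_size // 2)

    class FrontEnd(nn.Module):
        """Treat the filterbank output as a 2D map and refine it with 2D convolutions
        to obtain high-level feature maps (HFM)."""
        def __init__(self):
            super().__init__()
            self.sinc = SincLayer()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
            )

        def forward(self, wav):                                            # wav: (batch, samples)
            fbank = self.sinc(wav.unsqueeze(1))                            # (batch, n_filters, samples)
            return self.conv(fbank.unsqueeze(1))                           # (batch, 64, n_filters/4, samples/4)

    hfm = FrontEnd()(torch.randn(2, 4000))                                 # two 0.25 s toy waveforms
    print(hfm.shape)                                                       # torch.Size([2, 64, 16, 1000])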

  • Open Access

    ARTICLE

    Mordukhovich Subdifferential Optimization Framework for Multi-Criteria Voice Cloning of Pathological Speech

    Rytis Maskeliūnas1, Robertas Damaševičius1,*, Audrius Kulikajevas1, Kipras Pribuišis2, Nora Ulozaitė-Stanienė2, Virgilijus Uloza2

    CMES-Computer Modeling in Engineering & Sciences, Vol.145, No.3, pp. 4203-4223, 2025, DOI:10.32604/cmes.2025.072790 - 23 December 2025

    Abstract This study introduces a novel voice cloning framework driven by Mordukhovich Subdifferential Optimization (MSO) to address the complex multi-objective challenges of pathological speech synthesis in the under-resourced Lithuanian language, with unique phonemes not present in most pre-trained models. Unlike existing voice synthesis models that often optimize for a single objective or are restricted to major languages, our approach explicitly balances four competing criteria: speech naturalness, speaker similarity, computational efficiency, and adaptability to pathological voice patterns. We evaluate four model configurations combining Lithuanian and English encoders, synthesizers, and vocoders. The hybrid model (English encoder, Lithuanian synthesizer, English…
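
    The excerpt frames voice cloning as a nonsmooth multi-criteria problem. The toy sketch below only illustrates a generic subgradient step on the weighted maximum of several competing criteria; the paper's actual Mordukhovich-subdifferential construction, its four voice-quality criteria, and its model configurations are not reproduced, and the quadratic surrogates and weights are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    # four toy criteria standing in for naturalness, similarity, efficiency, adaptability
    A = [rng.standard_normal((3, 3)) for _ in range(4)]
    Q = [a @ a.T + np.eye(3) for a in A]             # convex quadratic surrogates
    w = np.array([0.4, 0.3, 0.2, 0.1])               # placeholder trade-off weights

    def criteria(theta):
        return np.array([theta @ q @ theta for q in Q])

    theta = rng.standard_normal(3)
    for step in range(200):
        vals = w * criteria(theta)
        i = int(np.argmax(vals))                     # active criterion at theta
        subgrad = w[i] * 2 * Q[i] @ theta            # a subgradient of max_i w_i f_i(theta)
        theta -= 0.05 / (1 + step) ** 0.5 * subgrad  # diminishing-step subgradient update
    print("balanced objective:", np.max(w * criteria(theta)))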

  • Open Access

    ARTICLE

    Using Hate Speech Detection Techniques to Prevent Violence and Foster Community Safety

    Ayaz Hussain1, Asad Hayat2, Muhammad Hasnain1,*

    Journal on Artificial Intelligence, Vol.7, pp. 485-498, 2025, DOI:10.32604/jai.2025.071933 - 17 November 2025

    Abstract Violent hate speech and the scapegoating of people against one another have emerged as a rising worldwide issue, and identifying and combating such content is crucial to creating safer and more inclusive societies. The current study used Machine Learning models to classify hate speech and to overcome the limitations of existing detection techniques. Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbour (KNN), and Decision Tree classifiers were applied to a publicly available hate speech dataset. The data was preprocessed by cleaning the text, tokenizing it, and applying normalization techniques to efficiently train the…
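
    A minimal sketch of the classical setup the abstract lists: cleaned and normalized text feeding LR, RF, KNN, and Decision Tree classifiers. Assumptions: scikit-learn and pandas; TF-IDF is used here as a generic text-feature step since the paper's features are not given in the excerpt, and hate_speech.csv with text/label columns is a hypothetical stand-in for the publicly available dataset.

    import re
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score

    def clean(text: str) -> str:
        # normalization: lowercase, strip URLs/mentions and non-letter characters
        text = text.lower()
        text = re.sub(r"http\S+|@\w+", " ", text)
        return re.sub(r"[^a-z\s]", " ", text)

    df = pd.read_csv("hate_speech.csv")                    # hypothetical file name
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"].map(clean), df["label"], test_size=0.2, random_state=42)

    vec = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
    Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

    models = {
        "LR": LogisticRegression(max_iter=1000),
        "RF": RandomForestClassifier(n_estimators=200),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "DT": DecisionTreeClassifier(),
    }
    for name, model in models.items():
        model.fit(Xtr, y_train)
        print(name, f1_score(y_test, model.predict(Xte), average="macro"))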

  • Open Access

    ARTICLE

    A Black-Box Speech Adversarial Attack Method Based on Enhanced Neural Predictors in Industrial IoT

    Yun Zhang, Zhenhua Yu*, Xufei Hu, Xuya Cong, Ou Ye

    CMC-Computers, Materials & Continua, Vol.84, No.3, pp. 5403-5426, 2025, DOI:10.32604/cmc.2025.067120 - 30 July 2025

    Abstract Devices in the Industrial Internet of Things are vulnerable to voice adversarial attacks. Studying adversarial speech samples is crucial for enhancing the security of automatic speech recognition systems in Industrial Internet of Things devices. Current black-box attack methods often face challenges such as complex search processes and excessive perturbation generation. To address these issues, this paper proposes a black-box voice adversarial attack method based on enhanced neural predictors. This method searches for minimal perturbations in the perturbation space, employing an optimization process guided by a self-attention neural predictor to identify the optimal perturbation direction. This direction…
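
    The excerpt describes searching the perturbation space with guidance from a neural predictor. The toy sketch below shows only that loop shape: random candidate perturbations scored by a small predictor, one black-box query per step, and a predictor update from the observed score. The query_target_confidence function, the MLP predictor (the paper uses self-attention), and all sizes are hypothetical placeholders.

    import torch
    import torch.nn as nn

    def query_target_confidence(audio):
        """Placeholder black-box: returns the attacked model's confidence in the true label."""
        return torch.sigmoid(audio.mean())                        # stand-in only

    predictor = nn.Sequential(nn.Linear(1600, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

    audio = torch.randn(1600)                                      # 0.1 s at 16 kHz, toy example
    best = audio.clone()
    for it in range(50):
        candidates = best + 0.01 * torch.randn(16, 1600)           # small random perturbations
        with torch.no_grad():
            scores = predictor(candidates).squeeze(1)              # predicted confidences
        pick = candidates[scores.argmin()]                         # direction predicted to hurt most
        true_conf = query_target_confidence(pick)                  # one black-box query
        loss = (predictor(pick.unsqueeze(0)).squeeze() - true_conf.detach()) ** 2
        opt.zero_grad(); loss.backward(); opt.step()               # refine the predictor
        if true_conf < query_target_confidence(best):
            best = pick                                            # keep the better perturbation
    print("final target confidence:", float(query_target_confidence(best)))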

  • Open Access

    ARTICLE

    Enhancing Phoneme Labeling in Dysarthric Speech with Digital Twin-Driven Multi-Modal Architecture

    Saeed Alzahrani1, Nazar Hussain2, Farah Mohammad3,*

    CMC-Computers, Materials & Continua, Vol.84, No.3, pp. 4825-4849, 2025, DOI:10.32604/cmc.2025.066322 - 30 July 2025

    Abstract Digital twin technology is revolutionizing personalized healthcare by creating dynamic virtual replicas of individual patients. This paper presents a novel multi-modal architecture leveraging digital twins to enhance precision in predictive diagnostics and treatment planning for phoneme labeling. By integrating real-time images, electronic health records, and genomic information, the system enables personalized simulations for disease progression modeling, treatment response prediction, and preventive care strategies. Dysarthric speech is characterized by articulation imprecision, temporal misalignments, and phoneme distortions, and existing models struggle to capture these irregularities. Traditional approaches, often relying solely on audio features, fail to address…
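
    A minimal fusion sketch suggested by the modalities the abstract names (audio, images, and health-record/genomic features) feeding a phoneme-labeling head. Assumptions: PyTorch; the encoders, input sizes, and the 40-phoneme label set are placeholders, since the actual digital-twin architecture is not shown in the excerpt.

    import torch
    import torch.nn as nn

    class MultiModalPhonemeLabeler(nn.Module):
        def __init__(self, n_phonemes=40):
            super().__init__()
            self.audio_enc = nn.GRU(input_size=80, hidden_size=128, batch_first=True)    # log-mel frames
            self.image_enc = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
                                           nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                           nn.Linear(16, 64))                            # imaging input
            self.record_enc = nn.Linear(32, 64)                                          # EHR/genomic features
            self.head = nn.Linear(128 + 64 + 64, n_phonemes)

        def forward(self, mel, image, record):
            _, h = self.audio_enc(mel)                          # h: (1, batch, 128)
            fused = torch.cat([h[-1], self.image_enc(image), self.record_enc(record)], dim=1)
            return self.head(fused)                             # per-utterance phoneme logits

    model = MultiModalPhonemeLabeler()
    logits = model(torch.randn(2, 120, 80), torch.randn(2, 1, 64, 64), torch.randn(2, 32))
    print(logits.shape)                                          # torch.Size([2, 40])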

  • Open Access

    ARTICLE

    Deep Learning-Based Lip-Reading for Vocal Impaired Patient Rehabilitation

    Chiara Innocente1,*, Matteo Boemio2, Gianmarco Lorenzetti2, Ilaria Pulito2, Diego Romagnoli2, Valeria Saponaro2, Giorgia Marullo1, Luca Ulrich1, Enrico Vezzetti1

    CMES-Computer Modeling in Engineering & Sciences, Vol.143, No.2, pp. 1355-1379, 2025, DOI:10.32604/cmes.2025.063186 - 30 May 2025

    Abstract Lip-reading technology, based on visual speech decoding and automatic speech recognition, offers a promising solution to overcoming communication barriers, particularly for individuals with temporary or permanent speech impairments. However, most Visual Speech Recognition (VSR) research has primarily focused on the English language and general-purpose applications, limiting its practical applicability in medical and rehabilitative settings. This study introduces the first Deep Learning (DL) based lip-reading system for the Italian language designed to assist individuals with vocal cord pathologies in daily interactions, facilitating communication for patients recovering from vocal cord surgeries, whether temporarily or permanently impaired. To…

  • Open Access

    ARTICLE

    Integrating Speech-to-Text for Image Generation Using Generative Adversarial Networks

    Smita Mahajan1, Shilpa Gite1,2, Biswajeet Pradhan3,*, Abdullah Alamri4, Shaunak Inamdar5, Deva Shriyansh5, Akshat Ashish Shah5, Shruti Agarwal5

    CMES-Computer Modeling in Engineering & Sciences, Vol.143, No.2, pp. 2001-2026, 2025, DOI:10.32604/cmes.2025.058456 - 30 May 2025

    Abstract The development of generative architectures has resulted in numerous novel deep-learning models that generate images using text inputs. However, humans naturally use speech for visualization prompts. Therefore, this paper proposes an architecture that integrates speech prompts as input to an image-generation Generative Adversarial Network (GAN) model, leveraging Speech-to-Text translation along with the CLIP + VQGAN model. The proposed method involves translating speech prompts into text, which is then used by the Contrastive Language-Image Pretraining (CLIP) + Vector Quantized Generative Adversarial Network (VQGAN) model to generate images. This paper outlines the steps required to implement such a…
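
    A structural sketch of the pipeline the abstract outlines: transcribe the spoken prompt, embed it with CLIP, and optimize an image latent toward the text embedding. Assumptions: openai-whisper stands in for the unspecified Speech-to-Text step, and decode_image is a hypothetical placeholder for the VQGAN decoder, so only the pipeline shape is shown, not the paper's actual models.

    import torch
    import clip                        # OpenAI CLIP (github.com/openai/CLIP)
    import whisper                     # openai-whisper, a stand-in Speech-to-Text step

    prompt_text = whisper.load_model("base").transcribe("spoken_prompt.wav")["text"]   # hypothetical audio file

    clip_model, _ = clip.load("ViT-B/32", device="cpu")            # CPU keeps the model in float32
    with torch.no_grad():
        text_feat = clip_model.encode_text(clip.tokenize([prompt_text]))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    def decode_image(latent):
        """Hypothetical stand-in for the VQGAN decoder: maps a latent vector to a 224x224 RGB image."""
        return torch.sigmoid(latent).view(1, 3, 224, 224)

    latent = torch.randn(1, 3 * 224 * 224, requires_grad=True)
    optimizer = torch.optim.Adam([latent], lr=0.05)
    for _ in range(100):                                            # CLIP-guided latent optimization
        image_feat = clip_model.encode_image(decode_image(latent))
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        loss = 1 - (image_feat * text_feat).sum()                   # cosine distance to the prompt
        optimizer.zero_grad(); loss.backward(); optimizer.step()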

  • Open Access

    ARTICLE

    ALCTS—An Assistive Learning and Communicative Tool for Speech and Hearing Impaired Students

    Shabana Ziyad Puthu Vedu1,*, Wafaa A. Ghonaim2, Naglaa M. Mostafa3, Pradeep Kumar Singh4

    CMC-Computers, Materials & Continua, Vol.83, No.2, pp. 2599-2617, 2025, DOI:10.32604/cmc.2025.062695 - 16 April 2025

    Abstract Hearing and speech impairment can be congenital or acquired. Hearing and speech-impaired students often hesitate to pursue higher education in reputable institutions due to their challenges. However, the development of automated assistive learning tools within the educational field has empowered disabled students to pursue higher education in any field of study. Assistive learning devices enable students to access institutional resources and facilities fully. The proposed assistive learning and communication tool allows hearing and speech-impaired students to interact productively with their teachers and classmates. This tool converts the audio signals into sign language videos for the…
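
    The abstract says the tool converts audio into sign-language videos for the students; below is a toy sketch of only the text-to-clip mapping step. The clip library, file names, and fingerspelling fallback are hypothetical placeholders, and the upstream speech recognizer is assumed to exist.

    SIGN_CLIPS = {"hello": "signs/hello.mp4", "teacher": "signs/teacher.mp4"}   # hypothetical clip library

    def text_to_sign_playlist(transcript: str) -> list[str]:
        """Map recognized words to pre-recorded sign-language video clips; words without
        a clip would fall back to fingerspelling (omitted in this sketch)."""
        words = [w.strip(".,!?").lower() for w in transcript.split()]
        return [SIGN_CLIPS[w] for w in words if w in SIGN_CLIPS]

    print(text_to_sign_playlist("Hello, teacher!"))      # ['signs/hello.mp4', 'signs/teacher.mp4']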

  • Open Access

    ARTICLE

    Cross-Modal Simplex Center Learning for Speech-Face Association

    Qiming Ma, Fanliang Bu*, Rong Wang, Lingbin Bu, Yifan Wang, Zhiyuan Li

    CMC-Computers, Materials & Continua, Vol.82, No.3, pp. 5169-5184, 2025, DOI:10.32604/cmc.2025.061187 - 06 March 2025

    Abstract Speech-face association aims to achieve identity matching between facial images and voice segments by aligning cross-modal features. Existing research primarily focuses on learning shared-space representations and computing one-to-one similarities between cross-modal sample pairs to establish their correlation. However, these approaches do not fully account for intra-class variations between the modalities or the many-to-many relationships among cross-modal samples, which are crucial for robust association modeling. To address these challenges, we propose a novel framework that leverages global information to align voice and face embeddings while effectively correlating identity information embedded in both modalities. First, we jointly…
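
    A minimal sketch of simplex-structured class centers shared by both modalities, which is what the title's "simplex center learning" suggests. Assumptions: PyTorch; the encoders, the paper's exact loss terms, and its handling of many-to-many relationships are not shown in the excerpt, so the loss below is a generic center-pulling placeholder.

    import torch
    import torch.nn.functional as F

    def simplex_centers(num_classes: int, dim: int) -> torch.Tensor:
        """Unit-norm class centers forming a regular simplex (pairwise cosine -1/(K-1))."""
        u, _ = torch.linalg.qr(torch.randn(dim, num_classes))        # orthonormal columns, dim >= num_classes
        centered = torch.eye(num_classes) - torch.full((num_classes, num_classes), 1.0 / num_classes)
        return F.normalize((u @ centered).T, dim=1)                  # (num_classes, dim)

    def center_alignment_loss(face_emb, voice_emb, labels, centers):
        """Pull face and voice embeddings of the same identity toward one shared center."""
        c = centers[labels]                                          # (batch, dim)
        face = F.normalize(face_emb, dim=1)
        voice = F.normalize(voice_emb, dim=1)
        return 2.0 - (face * c).sum(1).mean() - (voice * c).sum(1).mean()

    centers = simplex_centers(num_classes=500, dim=512)              # one fixed center per identity
    face_emb, voice_emb = torch.randn(8, 512), torch.randn(8, 512)   # placeholder encoder outputs
    labels = torch.randint(0, 500, (8,))
    print(center_alignment_loss(face_emb, voice_emb, labels, centers))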

  • Open Access

    ARTICLE

    E-SWAN: Efficient Sliding Window Analysis Network for Real-Time Speech Steganography Detection

    Kening Wang1,#, Feipeng Gao2,#, Jie Yang1,2,*, Hao Zhang1

    CMC-Computers, Materials & Continua, Vol.82, No.3, pp. 4797-4820, 2025, DOI:10.32604/cmc.2025.060042 - 06 March 2025

    Abstract With the rapid advancement of Voice over Internet Protocol (VoIP) technology, speech steganography techniques such as Quantization Index Modulation (QIM) and Pitch Modulation Steganography (PMS) have emerged as significant challenges to information security. These techniques embed hidden information into speech streams, making detection increasingly difficult, particularly under conditions of low embedding rates and short speech durations. Existing steganalysis methods often struggle to balance detection accuracy and computational efficiency due to their limited ability to effectively capture both temporal and spatial features of speech signals. To address these challenges, this paper proposes an Efficient Sliding Window…
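
    A minimal sketch of sliding-window framing over a per-frame feature stream with a lightweight per-window detector, which is the analysis pattern the title and excerpt describe. Assumptions: PyTorch; the codec feature dimension, window and hop lengths, and the detector layers are placeholders, since E-SWAN's actual architecture is not given in the excerpt.

    import torch
    import torch.nn as nn

    frames = torch.randn(1, 3, 1000)           # (batch, codec features per frame, frame index), placeholder stream
    window, hop = 100, 50                      # analyze 100-frame windows every 50 frames
    windows = frames.unfold(dimension=2, size=window, step=hop)      # (1, 3, n_windows, window)
    windows = windows.permute(0, 2, 1, 3).reshape(-1, 3, window)     # one sample per window

    detector = nn.Sequential(                  # lightweight per-window classifier
        nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(16, 1), nn.Sigmoid(),
    )
    stego_prob = detector(windows).view(1, -1)   # probability of hidden data in each window
    print(stego_prob.shape)                      # torch.Size([1, 19])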

Displaying results 1-10 of 109 (page 1).