Open Access
ARTICLE
A Prosody-Guided Multi-Stream Framework for Universal Detection of AI-Synthesized Speech across Codec and Vocoder Domains
1 Department of Computer Engineering, Gachon University, Seongnam-si, Republic of Korea
2 Department of Industrial Management and Digital Technologies, Nordic International University, Tashkent, Uzbekistan
3 Department of Artificial Intelligence, Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan
4 Department of Computer Systems, Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan
5 Department of Applied Informatics, Kimyo International University in Tashkent, Tashkent, Uzbekistan
6 Department of Information Processing and Control Systems, Tashkent State Technical University, Tashkent, Uzbekistan
7 Department of Automation and Control, Navoi State University of Mining and Technologies, Navoi, Uzbekistan
8 Department of Computer Engineering, Faculty of Engineering, Balikesir University, Balikesir, Turkey
9 Department of Artificial Intelligence, Tashkent State University of Economics, Tashkent, Uzbekistan
* Corresponding Author: Young-Im Cho. Email:
Computers, Materials & Continua 2026, 88(1), 98 https://doi.org/10.32604/cmc.2026.080444
Received 09 February 2026; Accepted 15 April 2026; Issue published 08 May 2026
Abstract
Recent advancements in AI-synthesized speech have resulted in highly realistic deepfake audio, posing severe threats to authentication systems and digital media trust. Existing detection models struggle to generalize across diverse synthesis methods, especially those involving neural codec-based Audio Language Models (ALMs). In this work, we propose UniTector++, a novel prosody-aware, multi-stream detection architecture that generalizes across vocoder- and codec-based synthesis. UniTector++ incorporates three complementary streams, Whisper-based semantic embeddings, high-level prosodic features, and codec artifact representations, fused through a Multi-Domain Adaptive Graph Attention Fusion (MAGAF) module. Furthermore, an Emotion-Consistency Verification Module (ECVM) reinforces alignment between speech style and prosodic content, and a Universal Adversarial Robustness (UAR) head improves resistance against adversarial attacks. Evaluated on three benchmark datasets, ASVspoof2021, PolyFake, and Codecfake, UniTector++ achieves state-of-the-art performance with an average Equal Error Rate (EER) of 0.57% under unseen synthesis scenarios, outperforming competitive baselines by a relative margin of 28%. Our results demonstrate the model's superior generalization, interpretability, and robustness, offering a significant advancement in universal deepfake speech detection.
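To illustrate the kind of attention-weighted multi-stream fusion the abstract describes, the following is a minimal, stdlib-only Python sketch. It is not the authors' MAGAF implementation; the dot-product scoring, the toy three-dimensional embeddings, and the averaging step are all illustrative assumptions standing in for learned graph-attention weights over the semantic, prosodic, and codec-artifact streams.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_streams(streams):
    """Toy attention fusion: each stream embedding attends to every
    stream (itself included) via dot-product scores, and the per-stream
    results are averaged into a single fused representation."""
    dim = len(streams[0])
    fused = []
    for q in streams:
        weights = softmax([dot(q, k) for k in streams])
        # attention-weighted sum of all stream embeddings
        fused.append([sum(w * k[d] for w, k in zip(weights, streams))
                      for d in range(dim)])
    return [sum(f[d] for f in fused) / len(fused) for d in range(dim)]

# toy embeddings for the semantic, prosodic, and codec-artifact streams
semantic = [1.0, 0.0, 0.0]
prosody  = [0.0, 1.0, 0.0]
codec    = [0.0, 0.0, 1.0]
fused = fuse_streams([semantic, prosody, codec])
```

In the actual model, each stream would be a high-dimensional embedding and the attention weights would be learned parameters of the graph-attention fusion module rather than raw dot products.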
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

