Akmalbek Abdusalomov1, Mukhriddin Mukhiddinov2,3, Fakhriddin Abdirazakov4, Alpamis Kutlimuratov5, Nodira Alimova6, Ilyos Kalandarov7, Ayhan Istanbullu8, Rashid Nasimov9, Young-Im Cho1,*
CMC-Computers, Materials & Continua, Vol.88, No.1, 2026, DOI:10.32604/cmc.2026.080444
- 08 May 2026
Abstract: Recent advancements in AI-synthesized speech have resulted in highly realistic deepfake audio, posing severe threats to authentication systems and digital media trust. Existing detection models struggle to generalize across diverse synthesis methods, especially those involving neural codec-based Audio Language Models (ALMs). In this work, we propose UniTector++, a novel prosody-aware, multi-stream detection architecture that generalizes across vocoder- and codec-based synthesis. UniTector++ incorporates three complementary streams (Whisper-based semantic embeddings, high-level prosodic features, and codec artifact representations) fused through a Multi-Domain Adaptive Graph Attention Fusion (MAGAF) module. Furthermore, an Emotion-Consistency Verification Module (ECVM) reinforces alignment between speech style and …
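To make the idea of fusing several feature streams through attention more concrete, the following is a minimal illustrative sketch, not the authors' MAGAF module, whose details are not given in this abstract. It assumes each stream has already been reduced to a fixed-length embedding of the same dimension, treats the three streams as nodes of a fully connected graph, and applies one round of scaled dot-product attention between them. The function name `attention_fuse`, the stream keys, and the toy embedding values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(streams):
    """Fuse equal-length stream embeddings with one attention round.

    Assumption (not from the paper): every stream is a node in a fully
    connected graph with self-loops, and edge weights come from scaled
    dot-product similarity between node embeddings.
    """
    names = list(streams)
    dim = len(streams[names[0]])
    scale = math.sqrt(dim)

    weights = {}
    updated = []
    for i in names:
        # Attention of node i over all nodes (including itself).
        w = softmax([dot(streams[i], streams[j]) / scale for j in names])
        weights[i] = dict(zip(names, w))
        # New node embedding: attention-weighted sum of all node embeddings.
        updated.append(
            [sum(w_j * streams[j][k] for w_j, j in zip(w, names)) for k in range(dim)]
        )

    # Graph-level readout: mean over the updated node embeddings.
    fused = [sum(vec[k] for vec in updated) / len(updated) for k in range(dim)]
    return fused, weights


# Toy embeddings standing in for the three streams named in the abstract.
streams = {
    "semantic": [0.2, 0.1, 0.4, 0.3],
    "prosodic": [0.5, 0.0, 0.1, 0.2],
    "codec": [0.1, 0.3, 0.3, 0.1],
}
fused, weights = attention_fuse(streams)
print(fused)
print(weights["prosodic"])
```

In this sketch the fused vector is a convex combination of the input embeddings, so it would normally be passed to a downstream classifier. A learned variant would add trainable projections before the similarity computation, which is omitted here for brevity.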