A Synthetic Speech Detection Model Combining Local-Global Dependency

Jiahui Song; Yuepeng Zhang; Wenhao Yuan

doi:10.32604/cmc.2025.069918

Open Access

ARTICLE

Jiahui Song, Yuepeng Zhang, Wenhao Yuan*

School of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China

* Corresponding Author: Wenhao Yuan.

Computers, Materials & Continua 2026, 86(1), 1-15. https://doi.org/10.32604/cmc.2025.069918

Received 03 July 2025; Accepted 01 September 2025; Issue published 10 November 2025

Abstract

Synthetic speech detection is an essential task in voice security that aims to identify deceptive voice attacks generated by text-to-speech (TTS) or voice conversion (VC) systems. In this paper, we propose TFTransformer, a synthetic speech detection model that improves detection by modeling both local and global dependencies. The model consists of a front-end and a back-end. The front-end combines a SincLayer with two-dimensional (2D) convolution to extract high-level feature maps (HFM) that capture local dependencies in the input speech signal. The back-end applies a time-frequency Transformer module to these feature maps to capture global dependencies. We further propose TFTransformer-SE, which adds a channel attention mechanism to the 2D convolutional blocks to capture local dependencies more effectively and thereby improve performance. On the ASVspoof 2021 LA dataset, the model achieved an equal error rate (EER) of 3.37% without data augmentation. On the ASVspoof 2019 LA dataset, it achieved an EER of 0.84%, also without data augmentation. These results indicate that combining local and global dependencies in the time-frequency domain can significantly improve detection accuracy.
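The equal error rate (EER) quoted in the abstract is the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of that definition, not the scoring code used in the paper: the function name `compute_eer`, the toy scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate by sweeping observed scores.

    Assumption: a higher score means the detector considers the
    utterance more likely to be bona fide (hypothetical convention).
    Returns (eer, threshold).
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap, eer_estimate, threshold)
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection rate: bona fide utterances scored below t.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Where the two rates are closest, report their mean.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

In this toy case one of four bona fide utterances falls below the threshold and one of four spoofed utterances reaches it, so both error rates are 25%. Evaluation toolkits usually work from the full detection error trade-off curve and may interpolate between thresholds, so their values can differ slightly from this simple sweep.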

Keywords

Synthetic speech detection; transformer; local-global; time-frequency domain

Cite This Article

APA Style

Song, J., Zhang, Y., Yuan, W. (2026). A Synthetic Speech Detection Model Combining Local-Global Dependency. Computers, Materials & Continua, 86(1), 1–15. https://doi.org/10.32604/cmc.2025.069918

Vancouver Style

Song J, Zhang Y, Yuan W. A Synthetic Speech Detection Model Combining Local-Global Dependency. Comput Mater Contin. 2026;86(1):1–15. https://doi.org/10.32604/cmc.2025.069918

IEEE Style

J. Song, Y. Zhang, and W. Yuan, “A Synthetic Speech Detection Model Combining Local-Global Dependency,” Comput. Mater. Contin., vol. 86, no. 1, pp. 1–15, 2026. https://doi.org/10.32604/cmc.2025.069918

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
