Open Access

ARTICLE

E-SWAN: Efficient Sliding Window Analysis Network for Real-Time Speech Steganography Detection

Kening Wang1,#, Feipeng Gao2,#, Jie Yang1,2,*, Hao Zhang1

1 School of Engineering and Technology, Jiyang College of Zhejiang A&F University, Zhuji, 311800, China
2 College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou, 311300, China

* Corresponding Author: Jie Yang.
# Kening Wang and Feipeng Gao contributed equally to this work

Computers, Materials & Continua 2025, 82(3), 4797-4820. https://doi.org/10.32604/cmc.2025.060042

Abstract

With the rapid advancement of Voice over Internet Protocol (VoIP) technology, speech steganography techniques such as Quantization Index Modulation (QIM) and Pitch Modulation Steganography (PMS) have become a significant challenge to information security. These techniques embed hidden information into speech streams, and detection is especially difficult at low embedding rates and short speech durations. Existing steganalysis methods often struggle to balance detection accuracy and computational efficiency because of their limited ability to capture both the temporal and the spatial features of speech signals. To address these challenges, this paper proposes the Efficient Sliding Window Analysis Network (E-SWAN), a deep learning model designed for real-time speech steganalysis. E-SWAN integrates two core modules: the LSTM Temporal Feature Miner (LTFM) and the Convolutional Key Feature Miner (CKFM). LTFM captures long-range temporal dependencies using Long Short-Term Memory (LSTM) networks, while CKFM uses convolutional operations to identify local spatial variations caused by steganographic embedding. Both modules operate within a sliding window framework, enabling efficient extraction of temporal and spatial features. Experiments on the Chinese CNV and PMS datasets demonstrate the superior performance of E-SWAN. With a ten-second sample duration and an embedding rate of 10%, E-SWAN achieves a detection accuracy of 62.09% on the PMS dataset, surpassing existing methods by 4.57%, and 82.28% on the CNV dataset, outperforming state-of-the-art methods by 7.29%. These results support the robustness and efficiency of E-SWAN at low embedding rates and short durations, offering a promising solution for real-time VoIP steganalysis and contributing to information security in digital communications.
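The abstract describes E-SWAN only at the level of its two modules, so the following is not the authors' implementation. It is a minimal plain-Python sketch, under stated assumptions, of the general idea: cut a frame-level sequence into sliding windows, then extract from each window a temporal summary (a one-unit LSTM, standing in for the role of LTFM) and a local-change summary (a 1-D convolution with a difference kernel, standing in for the role of CKFM). The helper names (`sliding_windows`, `conv1d_valid`, `lstm_last_hidden`, `window_features`), the window size, stride, kernel, LSTM weights, and the per-frame input values are all hypothetical; a real model would learn its weights and operate on multi-dimensional features such as QIM codewords or pitch parameters.

```python
import math
from typing import Dict, List, Sequence, Tuple

def sliding_windows(seq: Sequence[float], size: int, stride: int) -> List[List[float]]:
    """Cut a frame-level sequence into fixed-size, possibly overlapping windows.

    Frames that cannot fill a final complete window are dropped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [list(seq[i:i + size]) for i in range(0, len(seq) - size + 1, stride)]

def conv1d_valid(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D cross-correlation (no padding), as used by typical conv layers.

    A difference kernel such as (-1, 1) responds to abrupt local changes.
    """
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]

def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for a one-unit LSTM: (input weight, recurrent weight, bias) per gate.
TOY_LSTM: Dict[str, Tuple[float, float, float]] = {
    "i": (0.5, 0.1, 0.0),  # input gate
    "f": (0.4, 0.2, 1.0),  # forget gate (positive bias favours keeping memory)
    "o": (0.3, 0.1, 0.0),  # output gate
    "g": (0.8, 0.3, 0.0),  # candidate cell value
}

def lstm_last_hidden(x: Sequence[float],
                     p: Dict[str, Tuple[float, float, float]] = TOY_LSTM) -> float:
    """Run a scalar LSTM over the window and return the final hidden state."""
    h, c = 0.0, 0.0
    for xt in x:
        pre = {k: w * xt + u * h + b for k, (w, u, b) in p.items()}
        i, f, o = _sigmoid(pre["i"]), _sigmoid(pre["f"]), _sigmoid(pre["o"])
        g = math.tanh(pre["g"])
        c = f * c + i * g          # update cell state
        h = o * math.tanh(c)       # expose gated cell state as hidden state
    return h

def window_features(window: Sequence[float],
                    kernel: Sequence[float] = (-1.0, 1.0)) -> List[float]:
    """Pair a temporal summary (LSTM) with a local-change summary (max |conv| response)."""
    local = conv1d_valid(window, kernel)
    return [lstm_last_hidden(window), max((abs(v) for v in local), default=0.0)]

if __name__ == "__main__":
    frames = [0.1, 0.2, 0.1, 0.9, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2]  # made-up per-frame values
    for w in sliding_windows(frames, size=4, stride=2):
        print(w, [round(v, 3) for v in window_features(w)])
```

In this toy setup each window yields a two-element feature vector that a downstream classifier could consume; overlapping windows (stride smaller than size) let a local anomaly be seen in more than one window at the cost of extra computation.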

Keywords

Steganalysis; speech; convolutional sliding window; deep learning

Cite This Article

APA Style
Wang, K., Gao, F., Yang, J., & Zhang, H. (2025). E-SWAN: Efficient sliding window analysis network for real-time speech steganography detection. Computers, Materials & Continua, 82(3), 4797–4820. https://doi.org/10.32604/cmc.2025.060042
Vancouver Style
Wang K, Gao F, Yang J, Zhang H. E-SWAN: efficient sliding window analysis network for real-time speech steganography detection. Comput Mater Contin. 2025;82(3):4797–4820. https://doi.org/10.32604/cmc.2025.060042
IEEE Style
K. Wang, F. Gao, J. Yang, and H. Zhang, “E-SWAN: Efficient Sliding Window Analysis Network for Real-Time Speech Steganography Detection,” Comput. Mater. Contin., vol. 82, no. 3, pp. 4797–4820, 2025. https://doi.org/10.32604/cmc.2025.060042



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.