Speech Separation Algorithm Using Gated Recurrent Network Based on Microphone Array

Xiaoyan Zhao; Lin Zhou; Yue Xie; Ying Tong; Jingang Shi

doi:10.32604/iasc.2023.030180

Open Access

ARTICLE


Xiaoyan Zhao¹,*, Lin Zhou², Yue Xie¹, Ying Tong¹, Jingang Shi³

1 School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing, 211167, China
2 School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
3 University of Oulu, Oulu, 90014, FI, Finland

* Corresponding Author: Xiaoyan Zhao. Email: email

Intelligent Automation & Soft Computing 2023, 36(3), 3087-3100. https://doi.org/10.32604/iasc.2023.030180

Received 20 March 2022; Accepted 07 January 2023; Issue published 15 March 2023

Abstract

Speech separation is an active research topic that plays an important role in numerous applications, such as speaker recognition, hearing prostheses, and autonomous robots. Many algorithms have been put forward to improve separation performance. However, speech separation in reverberant and noisy environments remains a challenging task. To address this, this paper proposes a novel microphone-array speech separation algorithm using a gated recurrent unit (GRU) network. The main aim of the proposed algorithm is to improve separation performance and reduce computational cost. The algorithm extracts the sub-band steered response power-phase transform (SRP-PHAT), weighted by a gammatone filter, as the speech separation feature because of its discriminative and robust spatial position information. Since the GRU network has the advantage of processing time-series data with faster training speed and fewer parameters, the GRU model is adopted to process the separation features of several sequential frames in the same sub-band to estimate the ideal ratio mask (IRM). The algorithm decomposes the mixture signals into time-frequency (TF) units using a gammatone filter bank in the frequency domain, and the target speech is reconstructed in the frequency domain by masking the mixture signal according to the estimated IRM. Because both the decomposition of the mixture signal and the reconstruction of the target signal are completed in the frequency domain, the total computational cost is reduced. Experimental results demonstrate that the proposed algorithm realizes omnidirectional speech separation in noisy and reverberant environments, provides good performance in terms of speech quality and intelligibility, and has the capacity to generalize to reverberation.
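The masking step described in the abstract can be illustrated with a minimal sketch. It computes an oracle ideal ratio mask from known target and noise energies per TF unit and applies it to a mixture; in the proposed algorithm the mask is instead estimated by the GRU network from the SRP-PHAT features, which this sketch does not attempt. The function names, the toy numbers, and the exponent `beta=0.5` (a commonly used IRM definition) are assumptions for illustration, not details taken from the paper.

```python
import math

def ideal_ratio_mask(target_power, noise_power, beta=0.5):
    """Oracle IRM per TF unit: (S / (S + N)) ** beta.

    target_power and noise_power are lists of sub-bands, each a list of
    per-frame energies. beta=0.5 is an assumed, commonly used exponent.
    """
    mask = []
    for s_row, n_row in zip(target_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            # A silent TF unit carries no target energy, so mask it out.
            row.append((s / total) ** beta if total > 0.0 else 0.0)
        mask.append(row)
    return mask

def apply_mask(mixture, mask):
    """Scale each complex mixture TF unit by its real-valued mask entry."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]

# Toy data (made up): 2 sub-bands x 3 frames.
target_power = [[4.0, 1.0, 0.0], [9.0, 0.0, 1.0]]
noise_power = [[0.0, 1.0, 2.0], [7.0, 0.0, 3.0]]
mixture = [[2 + 0j, 1 + 1j, 0 + 1j], [4 + 0j, 0j, 2 + 0j]]

mask = ideal_ratio_mask(target_power, noise_power)
separated = apply_mask(mixture, mask)
```

Each mask entry lies in [0, 1]: units dominated by the target pass almost unchanged, while noise-dominated units are attenuated. Time-domain speech would then be resynthesized from `separated`, a step omitted here.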

Keywords

Microphone array; speech separation; gated recurrent unit network; gammatone sub-band steered response power-phase transform spatial spectrum

Cite This Article

APA Style

Zhao, X., Zhou, L., Xie, Y., Tong, Y., Shi, J. (2023). Speech separation algorithm using gated recurrent network based on microphone array. Intelligent Automation & Soft Computing, 36(3), 3087-3100. https://doi.org/10.32604/iasc.2023.030180

Vancouver Style

Zhao X, Zhou L, Xie Y, Tong Y, Shi J. Speech separation algorithm using gated recurrent network based on microphone array. Intell Automat Soft Comput. 2023;36(3):3087-3100. https://doi.org/10.32604/iasc.2023.030180

IEEE Style

X. Zhao, L. Zhou, Y. Xie, Y. Tong, and J. Shi, "Speech Separation Algorithm Using Gated Recurrent Network Based on Microphone Array," Intell. Automat. Soft Comput., vol. 36, no. 3, pp. 3087-3100, 2023. https://doi.org/10.32604/iasc.2023.030180


This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
