神经网络辅助估计先验语音存在概率的多通道降噪方法

Translated title of the contribution: NN-Supported a Priori Speech Presence Probability Estimation for Multichannel Noise Reduction

Jing Lei, Jinfu Wang, Feiran Yang, Jun Yang*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Estimating the noise power spectral density (PSD) matrix is crucial in beamforming-based multichannel noise reduction. The multichannel speech presence probability (MCSPP) can be used to continuously control the adaptation of the noise PSD matrix, so the accuracy of the noise PSD matrix estimate depends directly on the accuracy of the speech presence probability estimate. Traditional techniques for estimating speech presence probability assume stationary noise. With non-stationary noise they frequently suffer from a parameter trailing problem, which reduces noise suppression in practical applications. In this study, we first explain theoretically why the trailing problem arises in traditional speech presence probability estimation. In traditional methods, the speech presence probability is linearly related to the long-term signal-to-noise ratio (SNR). Furthermore, we find that when speech is present, the long-term SNR of the current frame is only a slightly attenuated version of the long-term SNR of the previous frame. As a result, when the noise changes rapidly, the long-term SNR changes slowly, and the estimated speech presence probability trails behind. To address this problem, we propose using a temporal convolutional network (TCN) to estimate the a priori speech presence probability. By integrating the estimated a priori speech presence probability into the MCSPP framework, we obtain a more accurate estimate of the posterior speech presence probability. The TCN estimates the speech presence probability directly, without relying on the stationary-noise assumption, so the trailing problem is effectively avoided. The a priori speech presence probability estimated by the TCN therefore improves the accuracy of noise PSD matrix estimation under non-stationary noise.
The methods were evaluated on the CHiME-3 dataset. Simulation results show that the proposed method outperforms the other methods in noise reduction and speech quality in non-stationary noise environments. Specifically, on the test set at an SNR of 5 dB, the proposed method improves PESQ by 0.09, fwSegSNR by 0.78, and COVL by 0.08 over the traditional method.
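
As an illustration of how a speech presence probability can control the adaptation of the noise power spectral density matrix, the following is a minimal sketch of one commonly used recursive update for a single frequency bin. It is not taken from the article: the function name `update_noise_psd`, the smoothing constant `alpha`, and the exact update rule are assumptions chosen for illustration, and the article's own formulation may differ.

```python
import numpy as np


def update_noise_psd(phi_prev, y, spp, alpha=0.9):
    """One SPP-controlled recursive update of a noise PSD matrix (single bin).

    phi_prev : (M, M) complex array, previous noise PSD matrix estimate
    y        : (M,) complex array, current multichannel observation
    spp      : float in [0, 1], speech presence probability for this frame
    alpha    : float in [0, 1), base smoothing constant (assumed value)
    """
    y = np.asarray(y, dtype=complex).reshape(-1, 1)
    # Effective smoothing factor: approaches 1 (update frozen) as spp -> 1,
    # and falls back to alpha (ordinary recursive averaging) as spp -> 0.
    alpha_eff = alpha + (1.0 - alpha) * spp
    return alpha_eff * phi_prev + (1.0 - alpha_eff) * (y @ y.conj().T)


# Hypothetical two-microphone example.
phi0 = np.eye(2, dtype=complex)
y = np.array([1.0 + 1.0j, 2.0 - 1.0j])
phi_speech = update_noise_psd(phi0, y, spp=1.0)  # speech certain: estimate kept
phi_noise = update_noise_psd(phi0, y, spp=0.0)   # noise only: estimate updated
```

In this sketch, an overestimated speech presence probability freezes the update and lets the noise estimate lag behind a changing noise field, which is why the accuracy of the probability estimate matters for non-stationary noise.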

Original language: Chinese (Simplified)
Pages (from-to): 1197-1207
Number of pages: 11
Journal: Journal of Signal Processing
Volume: 40
Issue number: 7
Publication status: Published - Jul 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 Editorial Board of Journal of Signal Processing. All rights reserved.

ASJC Scopus Subject Areas

  • Signal Processing

Keywords

  • multichannel noise reduction
  • neural network
  • speech presence probability
