Abstract
Estimating the noise power spectral density matrix is crucial in beamforming-based multichannel noise reduction methods. The multichannel speech presence probability (MCSPP) can be used to continuously control the adaptation of the noise power spectral density matrix. Accordingly, the accuracy of the noise power spectral density matrix estimate depends directly on the accuracy of the speech presence probability estimate. Traditional techniques for estimating speech presence probability assume stationary noise. With non-stationary noise, however, they frequently suffer from a parameter trailing problem, which reduces noise suppression in practical applications. In this study, we first explain theoretically why the trailing problem arises in traditional speech presence probability estimation. In traditional methods, speech presence probability is linearly related to the long-term signal-to-noise ratio (SNR). Furthermore, we find that when speech is present, the long-term SNR of the current frame is only slightly attenuated relative to that of the previous frame. When the noise changes rapidly, the long-term SNR therefore changes slowly, and the estimated speech presence probability trails behind. To address this problem, we propose using a temporal convolutional network (TCN) to estimate the a priori speech presence probability. By integrating the estimated a priori speech presence probability into the MCSPP framework, we obtain a more accurate estimate of the posterior speech presence probability. The TCN estimates speech presence probability directly, without relying on the noise stationarity assumption, so the trailing problem is effectively avoided. The a priori speech presence probability estimated by the TCN can therefore improve the accuracy of the noise power spectral density matrix estimate under non-stationary noise.
The performance of the different methods was assessed on the CHiME-3 dataset. Simulation results demonstrate that the proposed method outperforms the other methods in terms of noise reduction and speech quality in non-stationary noise environments. Specifically, on the test dataset with an SNR of 5 dB, the proposed method achieved a PESQ improvement of 0.09, a fwSegSNR improvement of 0.78, and a COVL improvement of 0.08 over the traditional method.
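As an illustration of how a speech presence probability can control the adaptation of the noise power spectral density matrix, the following minimal sketch uses a commonly used recursive smoothing form. The function name `update_noise_psd`, the smoothing constant `alpha`, and the update rule itself are assumptions made for illustration; the abstract does not state the exact rule used in the paper.

```python
import numpy as np


def update_noise_psd(phi_vv, y, spp, alpha=0.92):
    """One SPP-controlled recursive update for a single frequency bin.

    phi_vv : (M, M) complex array, previous noise PSD matrix estimate
    y      : (M,) complex array, current multichannel STFT observation
    spp    : float in [0, 1], speech presence probability for this bin
    alpha  : float, base smoothing constant (hypothetical value)
    """
    # Assumed rule: the effective smoothing factor approaches 1 as the
    # speech presence probability approaches 1, freezing the estimate.
    alpha_eff = alpha + (1.0 - alpha) * spp
    # Instantaneous outer product y y^H of the observation.
    inst = np.outer(y, np.conj(y))
    return alpha_eff * phi_vv + (1.0 - alpha_eff) * inst


rng = np.random.default_rng(0)
M = 4  # number of microphones (hypothetical)
phi = np.eye(M, dtype=complex)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)

phi_speech = update_noise_psd(phi, y, spp=1.0)  # speech certain: no update
phi_noise = update_noise_psd(phi, y, spp=0.0)   # noise only: full-rate update
```

Under this assumed rule, an overestimated speech presence probability holds the noise estimate at its old value, which is one way a trailing probability estimate could delay tracking of rapidly changing noise.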
Translated title of the contribution | NN-Supported a Priori Speech Presence Probability Estimation for Multichannel Noise Reduction |
---|---|
Original language | Chinese (Simplified) |
Pages (from-to) | 1197-1207 |
Number of pages | 11 |
Journal | Journal of Signal Processing |
Volume | 40 |
Issue number | 7 |
Publication status | Published - Jul 2024 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2024 Editorial Board of Journal of Signal Processing. All rights reserved.
ASJC Scopus Subject Areas
- Signal Processing
Keywords
- multichannel noise reduction
- neural network
- speech presence probability