神经网络辅助估计先验语音存在概率的多通道降噪方法

Jing Lei, Jinfu Wang, Feiran Yang, Jun Yang*

*Corresponding author for this work

doi:10.16798/j.issn.1003-0530.2024.07.002

Translated title of the contribution: NN-Supported a Priori Speech Presence Probability Estimation for Multichannel Noise Reduction

Research output: Contribution to journal › Article › peer-review

Abstract

Estimating the noise power spectral density (PSD) matrix is crucial in beamforming-based multichannel noise reduction. The multichannel speech presence probability (MCSPP) can continuously control how the noise PSD matrix adapts, so the accuracy of the noise PSD matrix estimate depends directly on the accuracy of the speech presence probability estimate. Traditional speech presence probability estimators assume stationary noise. With non-stationary noise, they frequently suffer from a parameter trailing problem, which weakens noise suppression in practical applications. In this study, we first give a theoretical explanation of why traditional speech presence probability estimators exhibit this trailing problem. In traditional methods, the speech presence probability is linearly related to the long-term signal-to-noise ratio (SNR). Furthermore, we found that when speech is present, the long-term SNR of the current frame is only slightly attenuated relative to that of the previous frame. As a result, when the noise changes rapidly, the long-term SNR changes slowly, and the estimated speech presence probability trails behind. To address this problem, we propose using a temporal convolutional network (TCN) to estimate the a priori speech presence probability. By integrating the estimated a priori speech presence probability into the MCSPP framework, we obtain a more accurate estimate of the posterior speech presence probability. Because the TCN estimates the speech presence probability directly, without relying on the noise stationarity assumption, the trailing problem is effectively avoided. Consequently, the a priori speech presence probability estimated by the TCN improves the accuracy of noise PSD matrix estimation in non-stationary noise.
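The MCSPP framework described above can be illustrated with a minimal NumPy sketch. It assumes the standard Gaussian-model MCSPP formulation (commonly attributed to Souden et al.): the noisy multichannel STFT vector y is zero-mean complex Gaussian with covariance Phi_vv when speech is absent and Phi_vv + Phi_xx when speech is present. The sketch also assumes the usual SPP-controlled recursive noise PSD matrix update. The function names, the smoothing factor `alpha = 0.92`, and the toy inputs are hypothetical. `prior_spp` merely stands in for the TCN output and is not the authors' network or code.

```python
import numpy as np


def mcspp_posterior(y, phi_vv, phi_xx, prior_spp, eps=1e-6):
    """Posterior speech presence probability for one time-frequency bin.

    Gaussian model: y ~ CN(0, phi_vv) when speech is absent (H0) and
    y ~ CN(0, phi_vv + phi_xx) when speech is present (H1).
    `prior_spp` is the a priori speech presence probability P(H1); in the
    proposed method it would come from the TCN, here it is just an input.
    """
    phi_yy = phi_vv + phi_xx
    # log det(phi_vv) - log det(phi_vv + phi_xx)
    _, logdet_v = np.linalg.slogdet(phi_vv)
    _, logdet_y = np.linalg.slogdet(phi_yy)
    # y^H (phi_vv^{-1} - phi_yy^{-1}) y, real for Hermitian matrices
    quad = np.real(np.vdot(y, np.linalg.solve(phi_vv, y))
                   - np.vdot(y, np.linalg.solve(phi_yy, y)))
    log_lr = logdet_v - logdet_y + quad       # log p(y|H1) - log p(y|H0)
    p1 = np.clip(prior_spp, eps, 1.0 - eps)
    z = np.log(p1 / (1.0 - p1)) + log_lr      # posterior log-odds
    # numerically stable logistic sigmoid of the log-odds
    if z >= 0:
        return float(1.0 / (1.0 + np.exp(-z)))
    ez = np.exp(z)
    return float(ez / (1.0 + ez))


def update_noise_psd(phi_vv, y, spp, alpha=0.92):
    """SPP-controlled recursive update of the noise PSD matrix.

    The effective smoothing factor alpha + (1 - alpha) * spp freezes the
    estimate when speech is certainly present (spp = 1) and reduces to plain
    recursive averaging with `alpha` when speech is absent (spp = 0).
    """
    alpha_eff = alpha + (1.0 - alpha) * spp
    return alpha_eff * phi_vv + (1.0 - alpha_eff) * np.outer(y, y.conj())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 4                                               # number of microphones (assumed)
    d = rng.standard_normal(m) + 1j * rng.standard_normal(m)  # hypothetical steering vector
    phi_vv = np.eye(m, dtype=complex)                   # spatially white noise (assumed)
    phi_xx = 2.0 * np.outer(d, d.conj())                # rank-1 speech PSD matrix
    y = d + 0.1 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
    for prior in (0.1, 0.5, 0.9):
        p = mcspp_posterior(y, phi_vv, phi_xx, prior)
        phi_vv_next = update_noise_psd(phi_vv, y, p)
        print(f"prior={prior:.1f}  posterior={p:.4f}  "
              f"trace(noise PSD)={np.real(np.trace(phi_vv_next)):.3f}")
```

When Phi_xx has rank 1, the log-likelihood ratio reduces to the familiar closed form -log(1 + xi) + beta / (1 + xi), with xi = tr(Phi_vv^{-1} Phi_xx) and beta = y^H Phi_vv^{-1} Phi_xx Phi_vv^{-1} y. The log-determinant form used here avoids assuming rank 1. A larger a priori speech presence probability raises the posterior, which in turn slows the noise PSD update.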
The performance of the different methods was evaluated on the CHiME-3 dataset. Simulation results show that the proposed method outperforms the compared methods in both noise reduction and speech quality in non-stationary noise environments. Specifically, on the test set with an SNR of 5 dB, the proposed method improved PESQ by 0.09, fwSegSNR by 0.78, and COVL by 0.08 over the traditional method.
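The abstract argues that the estimated speech presence probability trails behind because a recursively smoothed long-term SNR reacts slowly when the noise changes abruptly. A minimal numerical illustration follows. It assumes a first-order recursive average with smoothing factor `beta = 0.95` and a hypothetical SNR track in which the noise level jumps at frame 50. This is a generic smoother chosen for illustration, not the specific estimator analysed in the paper.

```python
import numpy as np


def recursive_smooth(x, beta=0.95):
    """First-order recursive average: s[l] = beta * s[l-1] + (1 - beta) * x[l]."""
    s = np.empty(len(x))
    acc = float(x[0])                  # initialise with the first sample
    for l, value in enumerate(x):
        acc = beta * acc + (1.0 - beta) * value
        s[l] = acc
    return s


def trailing_frames(smoothed, change_frame, threshold):
    """Frames after `change_frame` until `smoothed` first drops below `threshold`."""
    below = np.nonzero(smoothed[change_frame:] < threshold)[0]
    return int(below[0]) if below.size else None


if __name__ == "__main__":
    # Hypothetical instantaneous SNR track (linear scale): the noise level
    # jumps at frame 50, so the SNR falls abruptly from 10 to 1.
    inst_snr = np.concatenate([np.full(50, 10.0), np.full(150, 1.0)])
    long_term = recursive_smooth(inst_snr, beta=0.95)
    # Frames until the long-term SNR comes within 10 % of its new value.
    print(trailing_frames(long_term, 50, 1.1))   # 87 with beta = 0.95
```

After the jump, the smoothed value is 1 + 9 * beta^(k+1) at k frames past the change, so it stays well above the true SNR for dozens of frames. Any speech presence probability tied linearly to this quantity inherits the same lag, which is the motivation for estimating the a priori probability directly with a network.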

Translated title of the contribution	NN-Supported a Priori Speech Presence Probability Estimation for Multichannel Noise Reduction
Original language	Chinese (Simplified)
Pages (from-to)	1197-1207
Number of pages	11
Journal	Journal of Signal Processing
Volume	40
Issue number	7
DOIs	https://doi.org/10.16798/j.issn.1003-0530.2024.07.002
Publication status	Published - Jul 2024
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2024 Editorial Board of Journal of Signal Processing. All rights reserved.

ASJC Scopus Subject Areas

Signal Processing

Keywords

multichannel noise reduction
neural network
speech presence probability

Cite this

@article{b66ddc42bfb24be995627ed80ae129a8,

title = "神经网络辅助估计先验语音存在概率的多通道降噪方法",

keywords = "multichannel noise reduction, neural network, speech presence probability",

author = "Jing Lei and Jinfu Wang and Feiran Yang and Jun Yang",

year = "2024",

month = jul,

doi = "10.16798/j.issn.1003-0530.2024.07.002",

language = "Chinese (Simplified)",

volume = "40",

pages = "1197--1207",

journal = "Journal of Signal Processing",

issn = "1003-0530",

number = "7",

}

TY - JOUR

T1 - 神经网络辅助估计先验语音存在概率的多通道降噪方法

AU - Lei, Jing

AU - Wang, Jinfu

AU - Yang, Feiran

AU - Yang, Jun

PY - 2024/7

Y1 - 2024/7

KW - multichannel noise reduction

KW - neural network

KW - speech presence probability

UR - http://www.scopus.com/inward/record.url?scp=85204249152&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85204249152&partnerID=8YFLogxK

U2 - 10.16798/j.issn.1003-0530.2024.07.002

DO - 10.16798/j.issn.1003-0530.2024.07.002

M3 - Article

AN - SCOPUS:85204249152

SN - 1003-0530

VL - 40

SP - 1197

EP - 1207

JO - Journal of Signal Processing

JF - Journal of Signal Processing

IS - 7

ER -
