Abstract
This paper proposes TriU-Net, a hybrid neural beamformer for multi-channel speech enhancement that comprises three stages: beamforming, post-filtering, and distortion compensation. TriU-Net first estimates a set of masks to be used within a minimum variance distortionless response (MVDR) beamformer. A deep neural network (DNN)-based post-filter then suppresses the residual noise. Finally, a DNN-based distortion compensator further improves speech quality. To characterize long-range temporal dependencies more efficiently, a network topology, the gated convolutional attention network, is proposed and used in TriU-Net. The advantage of the proposed model is that speech distortion compensation is explicitly considered, yielding higher speech quality and intelligibility. The proposed model achieved an average wb-PESQ score of 2.854 and an ESTOI of 92.57% on the CHiME-3 dataset. In addition, extensive experiments on synthetic data and real recordings confirm the effectiveness of the proposed method in noisy reverberant environments.
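To make the first stage concrete, the following is a minimal sketch of how time-frequency masks can drive a minimum variance distortionless response (MVDR) beamformer. It assumes one common reference-channel formulation, w = (Φn⁻¹ Φs) u / tr(Φn⁻¹ Φs), and is not taken from the paper, which may use a different variant. The function names, array layouts, the diagonal loading factor, and the choice of reference microphone are all illustrative assumptions; in TriU-Net the masks would come from a network, whereas here they are simply inputs.

```python
import numpy as np


def masked_covariance(Y, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (channels, freqs, frames)  [assumed layout]
    mask: real values in [0, 1], shape (freqs, frames)
    Returns an array of shape (freqs, channels, channels).
    """
    weighted = Y * mask[None, :, :]
    cov = np.einsum("cft,dft->fcd", weighted, Y.conj())
    return cov / (mask.sum(axis=-1)[:, None, None] + eps)


def mvdr_weights(phi_s, phi_n, ref_mic=0, diag_load=1e-6, eps=1e-8):
    """Reference-channel MVDR weights, shape (freqs, channels).

    phi_s, phi_n: speech and noise covariances, shape (freqs, C, C).
    diag_load is a hypothetical regularization constant, not a paper value.
    """
    C = phi_n.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    scale = np.trace(phi_n, axis1=-2, axis2=-1).real[:, None, None] / C
    phi_n = phi_n + diag_load * scale * np.eye(C)
    # Solve phi_n @ X = phi_s for each frequency instead of inverting.
    numerator = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    return numerator[:, :, ref_mic] / (trace + eps)


def apply_beamformer(w, Y):
    """Apply w^H y per time-frequency bin; returns shape (freqs, frames)."""
    return np.einsum("fc,cft->ft", w.conj(), Y)
```

In this sketch a speech mask and a noise mask would each be passed to `masked_covariance`, and the resulting covariances to `mvdr_weights`. The beamformed output would then be handed to the post-filtering and distortion compensation stages, which are not modeled here.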
Original language | English |
---|---|
Pages (from-to) | 3378-3389 |
Number of pages | 12 |
Journal | Journal of the Acoustical Society of America |
Volume | 153 |
Issue number | 6 |
Publication status | Published - Jun 1 2023 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2023 Acoustical Society of America.
ASJC Scopus Subject Areas
- Arts and Humanities (miscellaneous)
- Acoustics and Ultrasonics