Three-stage hybrid neural beamformer for multi-channel speech enhancement

Kelan Kuang, Feiran Yang*, Junfeng Li, Jun Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

This paper proposes TriU-Net, a hybrid neural beamformer for multi-channel speech enhancement with three stages: beamforming, post-filtering, and distortion compensation. TriU-Net first estimates a set of masks that are used within a minimum variance distortionless response (MVDR) beamformer. A deep neural network (DNN)-based post-filter then suppresses the residual noise. Finally, a DNN-based distortion compensator further improves speech quality. To characterize long-range temporal dependencies more efficiently, a new network topology, the gated convolutional attention network, is proposed and used in TriU-Net. The advantage of the proposed model is that it explicitly compensates for speech distortion, yielding higher speech quality and intelligibility. The proposed model achieved an average wideband PESQ (wb-PESQ) score of 2.854 and an average ESTOI of 92.57% on the CHiME-3 dataset. In addition, extensive experiments on synthetic data and real recordings confirm the effectiveness of the proposed method in noisy, reverberant environments.
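The abstract does not say exactly how the estimated masks enter the MVDR beamformer. One widely used formulation, assumed here purely for illustration, uses the masks to weight speech and noise spatial covariance matrices and then applies the Souden-style MVDR solution with a reference microphone. The function names (`masked_scm`, `mvdr_souden`, `apply_beamformer`), the frequency × time × channel array layout, and the diagonal-loading constant are hypothetical choices, not taken from the paper. The DNN post-filter and distortion-compensation stages are not shown.

```python
import numpy as np

# Illustrative sketch only: a generic mask-based MVDR beamformer
# (Souden formulation). It is NOT claimed to be TriU-Net's exact first stage.

def masked_scm(Y, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrix per frequency bin.

    Y: complex STFT of shape (F, T, M); mask: real values in [0, 1], shape (F, T).
    Returns an array of shape (F, M, M).
    """
    # Sum of mask-weighted outer products y(f,t) y(f,t)^H over time
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    # Normalize by the total mask weight in each frequency bin
    return num / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_souden(phi_s, phi_n, ref=0, eps=1e-8):
    """MVDR weights w(f) = Phi_n^{-1} Phi_s u / trace(Phi_n^{-1} Phi_s).

    phi_s, phi_n: (F, M, M) speech and noise covariance matrices.
    ref: index of the (assumed) reference microphone. Returns (F, M).
    """
    M = phi_n.shape[-1]
    # Hypothetical diagonal loading for numerical stability
    phi_n = phi_n + eps * np.eye(M)[None]
    num = np.linalg.solve(phi_n, phi_s)         # Phi_n^{-1} Phi_s, shape (F, M, M)
    trace = np.trace(num, axis1=1, axis2=2)     # shape (F,)
    return num[:, :, ref] / (trace[:, None] + eps)

def apply_beamformer(w, Y):
    """Beamformer output x_hat(f, t) = w(f)^H y(f, t), shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

In this formulation the output is distortionless with respect to the reference channel: for a rank-1 speech covariance built from a steering vector `d`, the weights satisfy `w^H d = d[ref]` in every frequency bin, which is the property a subsequent post-filter and distortion compensator would build on.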

Original language: English
Pages (from-to): 3378-3389
Number of pages: 12
Journal: Journal of the Acoustical Society of America
Volume: 153
Issue number: 6
Publication status: Published - Jun 1 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023 Acoustical Society of America.

ASJC Scopus Subject Areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics
