Restoration of Bone-Conducted Speech with U-Net-Like Model and Energy Distance Loss

Changtao Li, Feiran Yang*, Jun Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Bone-conducted speech is less susceptible to ambient noise interference, but it suffers from poor speech quality due to its limited bandwidth. In this letter, we propose a U-Net-like network for the restoration of bone-conducted speech in the time domain. The proposed network consists of residual-connected one-dimensional convolutions and shifted window-based attention modules, which can model the long-term dependencies crucial in speech processing. We find that the prevalent time-domain loss may be insufficient for the generation of the high-frequency information absent in bone-conducted speech. To address this issue, we propose to utilize the generalized energy distance loss based on multi-scale Mel spectrograms as the objective function. Experimental results on the ESMB dataset validate the efficacy of our proposed method in the restoration of bone-conducted speech. The proposed approach significantly outperforms two recent time-domain benchmarks, DPT-EGNet and EBEN, in terms of PESQ and STOI metrics.
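
As a rough illustration of the objective named in the abstract, the sketch below computes a generalized energy distance between a target waveform and two independent model samples, using a multi-scale spectrogram distance. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, frame lengths, hop sizes, and the L1 plus log-L2 distance are all hypothetical choices, and the Mel filterbank projection is omitted for brevity, so plain magnitude spectrograms stand in for Mel spectrograms.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len, hop):
    """Hann-windowed STFT magnitudes, shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def multiscale_distance(a, b, frame_lens=(64, 128, 256), eps=1e-5):
    """Sum over scales of an L1 term on magnitudes and an L2 term on log magnitudes.

    Assumption: a Mel projection would normally be applied to each
    spectrogram before comparison; it is left out here.
    """
    total = 0.0
    for n in frame_lens:
        sa = magnitude_spectrogram(a, n, n // 4)
        sb = magnitude_spectrogram(b, n, n // 4)
        total += np.mean(np.abs(sa - sb))
        total += np.sqrt(np.mean((np.log(sa + eps) - np.log(sb + eps)) ** 2))
    return float(total)


def energy_distance_loss(target, sample_1, sample_2):
    """Two-sample estimate of the generalized energy distance.

    Attraction terms pull both model samples toward the target; the
    repulsion term rewards diversity between the two samples. The term
    that depends only on the target distribution is dropped because it
    does not affect the model's gradients.
    """
    attraction = multiscale_distance(target, sample_1) + multiscale_distance(
        target, sample_2
    )
    repulsion = multiscale_distance(sample_1, sample_2)
    return attraction - repulsion


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(2048) / 16000.0
    target = np.sin(2 * np.pi * 440.0 * t)            # stand-in for clean speech
    sample_1 = target + 0.1 * rng.standard_normal(t.size)
    sample_2 = target + 0.1 * rng.standard_normal(t.size)
    print(energy_distance_loss(target, sample_1, sample_2))
```

In a training setup the two samples would come from the restoration network run twice on the same bone-conducted input with different noise, and the distance would be written in an automatic-differentiation framework rather than NumPy.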

Original language: English
Pages (from-to): 166-170
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 31
DOIs
Publication status: Published - 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1994-2012 IEEE.

ASJC Scopus Subject Areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics

Keywords

  • attention
  • bone-conducted speech
  • spectral energy distance
  • speech enhancement
  • speech synthesis
