Abstract
We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model for bandwidth extension. The proposed architecture simplifies the UNet backbone of TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to mitigate the resulting performance degradation. We also use self-supervised pretraining and data augmentation to improve the quality of the bandwidth-extended signals and to reduce sensitivity to the choice of downsampling method. Experimental results on the VCTK dataset show that the proposed method outperforms several recent baselines on both intrusive and non-intrusive metrics. Pretraining and filter augmentation also help stabilize and enhance overall performance.
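The abstract's core mechanism, temporal feature-wise linear modulation, splits a feature sequence into blocks and modulates each block with an affine transform driven by a summary of that block. A minimal pure-Python sketch of this idea only (not the paper's implementation): the function name `tfilm_block_modulate` is hypothetical, and a max-pooled summary stands in for the recurrent network that TFiLM actually uses to produce the modulation parameters.

```python
def tfilm_block_modulate(x, block_size):
    """Toy block-wise feature modulation in the spirit of TFiLM.

    x: list of floats; block_size must divide len(x).
    Each block is summarized by max-pooling, and the summary is used
    as a stand-in modulation (gamma = pooled value, beta = 0); in the
    real model an RNN over block summaries predicts gamma and beta.
    """
    assert len(x) % block_size == 0, "sequence length must be a multiple of block_size"
    out = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        pooled = max(block)            # block summary via max-pooling
        gamma, beta = pooled, 0.0      # hypothetical stand-in for RNN output
        out.extend(gamma * v + beta for v in block)
    return out

# Two blocks [1, 2] and [3, 4]: pooled summaries 2 and 4 scale each block.
print(tfilm_block_modulate([1.0, 2.0, 3.0, 4.0], 2))  # → [2.0, 4.0, 12.0, 16.0]
```

Because the modulation is computed block by block, a block-online variant (as in the paper) only needs past blocks at inference time, which is what makes low-latency streaming feasible.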
Original language | English |
---|---|
Title of host publication | 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 161-165 |
Number of pages | 5 |
ISBN (Electronic) | 9781665405409 |
DOIs | |
Publication status | Published - 2022 |
Externally published | Yes |
Event | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 (Virtual, Online, Singapore); Duration: May 23 2022 → May 27 2022 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2022-May |
ISSN (Print) | 1520-6149 |
Conference
Conference | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 |
---|---|
Country/Territory | Singapore |
City | Virtual, Online |
Period | 5/23/22 → 5/27/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE
ASJC Scopus Subject Areas
- Software
- Signal Processing
- Electrical and Electronic Engineering
Keywords
- Bandwidth extension
- self-supervised pretraining
- speech enhancement
- transformer