DeepPreNet: A Deep Learning Pre-Processing Method for Speech Distortion Correction in Parametric Array Loudspeaker

Wenyao Ma*, Yunxi Zhu*, Fengyuan Hao*, Liwen Qin*, Fengyi Fan*, Jun Yang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The parametric array loudspeaker produces highly directional sound via a nonlinear process in air, which also introduces inherent baseband distortions. However, conventional recursive modulators employed to compensate for nonlinearity demand substantially increased bandwidth and are not optimized for speech applications. In this paper, we propose a deep learning method tailored for speech, called DeepPreNet. It contains two parts: a pre-processing network (PreNet) and a forward inference model (ForwModel). The ForwModel is a pre-trained network using real recorded speeches to model the actual nonlinear process, enhancing its reliability for PreNet training. The PreNet is trained to generate pre-processed signals, which are subsequently fed into the ForwModel to recover the distortion-free speech. By leveraging the harmonic-rich feature of speech, the proposed method incorporates distortions to reconstruct clean speech, thereby alleviating the bandwidth constraints imposed by the transducer. Experiments in both near- and far-field conditions demonstrate that the proposed method achieves remarkable performance compared to refined baseline techniques with the real transducer response.
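The training arrangement described in the abstract (a frozen forward model, with a pre-processor optimized so that the forward model's output matches the clean input) can be illustrated with a deliberately tiny sketch. Everything here is an assumption for illustration and not the authors' implementation: the forward model is a toy memoryless quadratic distortion rather than a network trained on recordings, the pre-processor has a single trainable coefficient `b`, and the names `forw_model`, `pre_net`, and `A` are hypothetical.

```python
import math

A = 0.3  # hypothetical distortion strength of the stand-in forward model


def forw_model(y, a=A):
    """Frozen stand-in for ForwModel: a toy quadratic distortion (assumption)."""
    return y + a * y * y


def pre_net(x, b):
    """Stand-in for PreNet with one trainable coefficient b (assumption)."""
    return x + b * x * x


def loss_and_grad(xs, b, a=A):
    """Return the mean squared error of forw_model(pre_net(x)) against x,
    together with its derivative with respect to b."""
    loss = 0.0
    grad = 0.0
    for x in xs:
        y = pre_net(x, b)
        err = forw_model(y, a) - x
        loss += err * err
        # Chain rule through the frozen forward model:
        # d(out)/dy = 1 + 2*a*y and dy/db = x*x.
        grad += 2.0 * err * (1.0 + 2.0 * a * y) * x * x
    n = len(xs)
    return loss / n, grad / n


def train(xs, steps=2000, lr=0.5):
    """Plain gradient descent on b; the forward model is never updated."""
    b = 0.0
    for _ in range(steps):
        _, g = loss_and_grad(xs, b)
        b -= lr * g
    return b


# A single sine period stands in for a clean speech frame.
xs = [0.5 * math.sin(2.0 * math.pi * k / 64) for k in range(64)]

loss_before, _ = loss_and_grad(xs, 0.0)  # no pre-processing
b = train(xs)
loss_after, _ = loss_and_grad(xs, b)     # with the trained pre-processor

print(f"b = {b:.3f}, loss before = {loss_before:.2e}, after = {loss_after:.2e}")
```

In this toy setting the learned coefficient ends up close to `-A`, i.e. the pre-processor learns to pre-distort the signal so that the forward model's quadratic term is largely cancelled. The paper's networks, loss, and data are not specified in this record, so none of the numbers above should be read as results from the work.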

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
Publication status: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: Apr 6 2025 → Apr 11 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 4/6/25 → 4/11/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

ASJC Scopus Subject Areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Keywords

  • deep learning
  • distortion correction
  • Parametric array loudspeaker
  • pre-processing network
