Abstract
The parametric array loudspeaker produces highly directional sound via a nonlinear process in air, which also introduces inherent baseband distortions. Conventional recursive modulators used to compensate for this nonlinearity demand substantially more bandwidth and are not optimized for speech applications. In this paper, we propose a deep learning method tailored for speech, called DeepPreNet. It consists of two parts: a pre-processing network (PreNet) and a forward inference model (ForwModel). The ForwModel is a network pre-trained on real recorded speech to model the actual nonlinear process, which makes it a more reliable basis for training the PreNet. The PreNet is trained to generate pre-processed signals, which are then fed into the ForwModel to recover distortion-free speech. By leveraging the harmonic-rich nature of speech, the proposed method incorporates the distortions into the reconstruction of clean speech, thereby alleviating the bandwidth constraints imposed by the transducer. Experiments in both near- and far-field conditions, conducted with the real transducer response, show that the proposed method performs markedly better than refined baseline techniques.
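To make the distortion and bandwidth trade-off described above concrete, the following is a minimal sketch that is not taken from the paper. It assumes Berktay's classical far-field approximation, in which the audible sound is proportional to the second time derivative of the squared modulation envelope, and it uses a single 1 kHz tone as a stand-in for speech. It compares plain double-sideband AM with square-root pre-distortion, a classical pre-processing technique that may differ from the recursive modulators and baselines used by the authors. All names and constants (`berktay_demodulate`, `tone_level`, `FS`, `F0`, `M`) are hypothetical choices for illustration only.

```python
import numpy as np

FS = 48_000                  # sample rate in Hz (assumed for illustration)
F0 = 1_000.0                 # test-tone frequency in Hz (assumed)
M = 0.8                      # modulation index (assumed)
PERIOD = int(FS / F0)        # samples per tone period (48)
N = 100 * PERIOD             # analysis window: exactly 100 tone periods


def berktay_demodulate(envelope: np.ndarray, fs: float) -> np.ndarray:
    """Berktay far-field approximation: audible sound ~ d^2/dt^2 (envelope^2)."""
    dt = 1.0 / fs
    return np.gradient(np.gradient(envelope ** 2, dt), dt)


def core(signal: np.ndarray) -> np.ndarray:
    """Drop one period at each end to discard finite-difference edge effects."""
    return signal[PERIOD:-PERIOD]


def tone_level(signal: np.ndarray, harmonic: int) -> float:
    """DFT magnitude at harmonic * F0; the window holds whole periods."""
    spectrum = np.abs(np.fft.rfft(signal))
    return float(spectrum[harmonic * N // PERIOD])


t = np.arange(N + 2 * PERIOD) / FS
tone = np.sin(2 * np.pi * F0 * t)            # single tone standing in for speech

dsb_envelope = 1.0 + M * tone                # plain double-sideband AM
sqrt_envelope = np.sqrt(1.0 + M * tone)      # square-root pre-distortion

dsb_out = core(berktay_demodulate(dsb_envelope, FS))
sqrt_out = core(berktay_demodulate(sqrt_envelope, FS))

# Second-harmonic level relative to the fundamental in the audible output.
dsb_distortion = tone_level(dsb_out, 2) / tone_level(dsb_out, 1)
sqrt_distortion = tone_level(sqrt_out, 2) / tone_level(sqrt_out, 1)

# Third-harmonic content of the envelope itself: a proxy for the extra
# bandwidth that the pre-distorted signal asks of the transducer.
dsb_env_h3 = tone_level(core(dsb_envelope), 3) / tone_level(core(dsb_envelope), 1)
sqrt_env_h3 = tone_level(core(sqrt_envelope), 3) / tone_level(core(sqrt_envelope), 1)

print(f"2nd-harmonic ratio, DSB AM:        {dsb_distortion:.3g}")
print(f"2nd-harmonic ratio, sqrt envelope: {sqrt_distortion:.3g}")
print(f"envelope 3rd harmonic, DSB AM:        {dsb_env_h3:.3g}")
print(f"envelope 3rd harmonic, sqrt envelope: {sqrt_env_h3:.3g}")
```

Under these assumptions, the plain AM envelope yields a strong second harmonic in the demodulated output, while the square-root envelope removes it at the cost of an envelope that itself contains higher harmonics. This is the kind of bandwidth demand that the abstract attributes to conventional compensation and that a learned pre-processor aims to relax.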
Original language | English |
---|---|
Title of host publication | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings |
Editors | Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798350368741 |
Publication status | Published - 2025 |
Externally published | Yes |
Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India; Duration: Apr 6 2025 → Apr 11 2025 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
ISSN (Print) | 1520-6149 |
Conference
Conference | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 |
---|---|
Country/Territory | India |
City | Hyderabad |
Period | 4/6/25 → 4/11/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
ASJC Scopus Subject Areas
- Software
- Signal Processing
- Electrical and Electronic Engineering
Keywords
- deep learning
- distortion correction
- parametric array loudspeaker
- pre-processing network