DeepPreNet: A Deep Learning Pre-Processing Method for Speech Distortion Correction in Parametric Array Loudspeaker

Wenyao Ma; Yunxi Zhu; Fengyuan Hao; Liwen Qin; Fengyi Fan; Jun Yang

doi:10.1109/ICASSP49660.2025.10887903

Wenyao Ma*, Yunxi Zhu*, Fengyuan Hao*, Liwen Qin*, Fengyi Fan*, Jun Yang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The parametric array loudspeaker produces highly directional sound via a nonlinear process in air, which also introduces inherent baseband distortions. However, conventional recursive modulators employed to compensate for nonlinearity demand substantially increased bandwidth and are not optimized for speech applications. In this paper, we propose a deep learning method tailored for speech, called DeepPreNet. It contains two parts: a pre-processing network (PreNet) and a forward inference model (ForwModel). The ForwModel is a pre-trained network using real recorded speeches to model the actual nonlinear process, enhancing its reliability for PreNet training. The PreNet is trained to generate pre-processed signals, which are subsequently fed into the ForwModel to recover the distortion-free speech. By leveraging the harmonic-rich feature of speech, the proposed method incorporates distortions to reconstruct clean speech, thereby alleviating the bandwidth constraints imposed by the transducer. Experiments in both near- and far-field conditions demonstrate that the proposed method achieves remarkable performance compared to refined baseline techniques with the real transducer response.
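The two-stage training described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper's PreNet and ForwModel are neural networks, whereas here both are hypothetical stand-ins chosen so the example runs in plain Python. `forw_model` is a fixed quadratic distortion (an assumption playing the role of the pre-trained ForwModel), `pre_net` is a two-weight polynomial, and the names `A`, `loss`, and `train` are illustrative only. The point shown is that only the pre-processor weights are updated, with the error propagated back through the frozen forward model.

```python
# Toy sketch only: both "networks" are tiny polynomials, not the paper's models.

A = 0.3  # hypothetical strength of the quadratic distortion


def forw_model(y):
    """Frozen stand-in for the pre-trained ForwModel: adds a quadratic distortion."""
    return y + A * y * y


def forw_model_grad(y):
    """Derivative of forw_model with respect to its input."""
    return 1.0 + 2.0 * A * y


def pre_net(x, w):
    """Trainable stand-in for PreNet: g(x) = w[0]*x + w[1]*x^2."""
    return w[0] * x + w[1] * x * x


def loss(xs, w):
    """Mean squared error between the forward-model output and the clean input."""
    return sum((forw_model(pre_net(x, w)) - x) ** 2 for x in xs) / len(xs)


def train(xs, steps=3000, lr=1.0):
    """Gradient descent on the pre-processor weights; the forward model stays fixed."""
    w = [1.0, 0.0]  # start from identity pre-processing
    for _ in range(steps):
        g0 = g1 = 0.0
        for x in xs:
            y = pre_net(x, w)
            # Chain rule: the error is propagated back through the frozen forward model.
            common = 2.0 * (forw_model(y) - x) * forw_model_grad(y)
            g0 += common * x
            g1 += common * x * x
        w[0] -= lr * g0 / len(xs)
        w[1] -= lr * g1 / len(xs)
    return w


xs = [i / 50 - 0.5 for i in range(51)]  # clean "signal" samples in [-0.5, 0.5]
baseline_loss = loss(xs, [1.0, 0.0])    # forward model applied with no pre-processing
w_trained = train(xs)
trained_loss = loss(xs, w_trained)
print(baseline_loss, trained_loss, w_trained)
```

Starting from identity pre-processing, the trained weights should drive the loss well below the no-pre-processing baseline, with a negative quadratic weight that partially cancels the distortion added by the forward stage.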

Original language	English
Title of host publication	2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors	Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9798350368741
DOIs	https://doi.org/10.1109/ICASSP49660.2025.10887903
Publication status	Published - 2025
Externally published	Yes
Event	2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India; Duration: Apr 6 2025 → Apr 11 2025

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)	1520-6149

Conference

Conference	2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory	India
City	Hyderabad
Period	4/6/25 → 4/11/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

ASJC Scopus Subject Areas

Software
Signal Processing
Electrical and Electronic Engineering

Keywords

deep learning
distortion correction
Parametric array loudspeaker
pre-processing network

Access to Document

10.1109/ICASSP49660.2025.10887903

Cite this

Ma, W., Zhu, Y., Hao, F., Qin, L., Fan, F., & Yang, J. (2025). DeepPreNet: A Deep Learning Pre-Processing Method for Speech Distortion Correction in Parametric Array Loudspeaker. In B. D. Rao, I. Trancoso, G. Sharma, & N. B. Mehta (Eds.), 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP49660.2025.10887903

Ma, Wenyao ; Zhu, Yunxi ; Hao, Fengyuan et al. / DeepPreNet: A Deep Learning Pre-Processing Method for Speech Distortion Correction in Parametric Array Loudspeaker. 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings. editor / Bhaskar D Rao ; Isabel Trancoso ; Gaurav Sharma ; Neelesh B. Mehta. Institute of Electrical and Electronics Engineers Inc., 2025. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{8b81f3b201db49d6bffc3fe1000e11d1,
  title = "DeepPreNet: A Deep Learning Pre-Processing Method for Speech Distortion Correction in Parametric Array Loudspeaker",
  abstract = "The parametric array loudspeaker produces highly directional sound via a nonlinear process in air, which also introduces inherent baseband distortions. However, conventional recursive modulators employed to compensate for nonlinearity demand substantially increased bandwidth and are not optimized for speech applications. In this paper, we propose a deep learning method tailored for speech, called DeepPreNet. It contains two parts: a pre-processing network (PreNet) and a forward inference model (ForwModel). The ForwModel is a pre-trained network using real recorded speeches to model the actual nonlinear process, enhancing its reliability for PreNet training. The PreNet is trained to generate pre-processed signals, which are subsequently fed into the ForwModel to recover the distortion-free speech. By leveraging the harmonic-rich feature of speech, the proposed method incorporates distortions to reconstruct clean speech, thereby alleviating the bandwidth constraints imposed by the transducer. Experiments in both near- and far-field conditions demonstrate that the proposed method achieves remarkable performance compared to refined baseline techniques with the real transducer response.",
  keywords = "deep learning, distortion correction, Parametric array loudspeaker, pre-processing network",
  author = "Wenyao Ma and Yunxi Zhu and Fengyuan Hao and Liwen Qin and Fengyi Fan and Jun Yang",
  note = "Publisher Copyright: {\textcopyright} 2025 IEEE.; 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 ; Conference date: 06-04-2025 Through 11-04-2025",
  year = "2025",
  doi = "10.1109/ICASSP49660.2025.10887903",
  language = "English",
  series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  editor = "Rao, {Bhaskar D} and Isabel Trancoso and Gaurav Sharma and Mehta, {Neelesh B.}",
  booktitle = "2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings",
  address = "United States",
}

Ma, W, Zhu, Y, Hao, F, Qin, L, Fan, F & Yang, J 2025, DeepPreNet: A Deep Learning Pre-Processing Method for Speech Distortion Correction in Parametric Array Loudspeaker. in BD Rao, I Trancoso, G Sharma & NB Mehta (eds), 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Institute of Electrical and Electronics Engineers Inc., 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025, Hyderabad, India, 4/6/25. https://doi.org/10.1109/ICASSP49660.2025.10887903

DeepPreNet: A Deep Learning Pre-Processing Method for Speech Distortion Correction in Parametric Array Loudspeaker. / Ma, Wenyao; Zhu, Yunxi; Hao, Fengyuan et al.
2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings. ed. / Bhaskar D Rao; Isabel Trancoso; Gaurav Sharma; Neelesh B. Mehta. Institute of Electrical and Electronics Engineers Inc., 2025. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

TY  - GEN
T1  - DeepPreNet
T2  - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
AU  - Ma, Wenyao
AU  - Zhu, Yunxi
AU  - Hao, Fengyuan
AU  - Qin, Liwen
AU  - Fan, Fengyi
AU  - Yang, Jun
PY  - 2025
Y1  - 2025
N2  - The parametric array loudspeaker produces highly directional sound via a nonlinear process in air, which also introduces inherent baseband distortions. However, conventional recursive modulators employed to compensate for nonlinearity demand substantially increased bandwidth and are not optimized for speech applications. In this paper, we propose a deep learning method tailored for speech, called DeepPreNet. It contains two parts: a pre-processing network (PreNet) and a forward inference model (ForwModel). The ForwModel is a pre-trained network using real recorded speeches to model the actual nonlinear process, enhancing its reliability for PreNet training. The PreNet is trained to generate pre-processed signals, which are subsequently fed into the ForwModel to recover the distortion-free speech. By leveraging the harmonic-rich feature of speech, the proposed method incorporates distortions to reconstruct clean speech, thereby alleviating the bandwidth constraints imposed by the transducer. Experiments in both near- and far-field conditions demonstrate that the proposed method achieves remarkable performance compared to refined baseline techniques with the real transducer response.
AB  - The parametric array loudspeaker produces highly directional sound via a nonlinear process in air, which also introduces inherent baseband distortions. However, conventional recursive modulators employed to compensate for nonlinearity demand substantially increased bandwidth and are not optimized for speech applications. In this paper, we propose a deep learning method tailored for speech, called DeepPreNet. It contains two parts: a pre-processing network (PreNet) and a forward inference model (ForwModel). The ForwModel is a pre-trained network using real recorded speeches to model the actual nonlinear process, enhancing its reliability for PreNet training. The PreNet is trained to generate pre-processed signals, which are subsequently fed into the ForwModel to recover the distortion-free speech. By leveraging the harmonic-rich feature of speech, the proposed method incorporates distortions to reconstruct clean speech, thereby alleviating the bandwidth constraints imposed by the transducer. Experiments in both near- and far-field conditions demonstrate that the proposed method achieves remarkable performance compared to refined baseline techniques with the real transducer response.
KW  - deep learning
KW  - distortion correction
KW  - Parametric array loudspeaker
KW  - pre-processing network
UR  - http://www.scopus.com/inward/record.url?scp=105003879023&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=105003879023&partnerID=8YFLogxK
U2  - 10.1109/ICASSP49660.2025.10887903
DO  - 10.1109/ICASSP49660.2025.10887903
M3  - Conference contribution
AN  - SCOPUS:105003879023
T3  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT  - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
A2  - Rao, Bhaskar D
A2  - Trancoso, Isabel
A2  - Sharma, Gaurav
A2  - Mehta, Neelesh B.
PB  - Institute of Electrical and Electronics Engineers Inc.
Y2  - 6 April 2025 through 11 April 2025
ER  -

Ma W, Zhu Y, Hao F, Qin L, Fan F, Yang J. DeepPreNet: A Deep Learning Pre-Processing Method for Speech Distortion Correction in Parametric Array Loudspeaker. In Rao BD, Trancoso I, Sharma G, Mehta NB, editors, 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2025. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP49660.2025.10887903
