Abstract
Speech enhancement for drone audition applications is challenging due to the low SNR with large spectral feature overlap and limited computing resources. We propose SMoLnet-T, a complex spectral mapping approach with a frame-wise CNN and newly formulated spectral combination transformers. SMoLnet-T incorporates a dilated CNN to extract spectral maps of high frequency resolution for its transformers. This allows the transformers to focus on a higher level of abstraction and determine which combination of spectral maps is crucial for enhancement across a large temporal context. Experimental results with noise recorded from a hovering drone highlight the efficacy of SMoLnet-T over DPTNet, with significantly lower computational requirements and speech distortion, while achieving improved speech intelligibility under SNR < −23 dB.
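To give a sense of scale for the SNR regime named in the abstract, the following is a minimal sketch, not taken from the paper, of how a noisy mixture at a chosen SNR can be constructed and checked. The function names (`power`, `snr_db`, `scale_noise_to_snr`), the 220 Hz sine standing in for speech, the Gaussian noise standing in for drone noise, and the 16 kHz sample rate are all illustrative assumptions.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """SNR in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Return a scaled copy of `noise` so that speech vs. scaled noise hits the target SNR."""
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    return [gain * n for n in noise]


# Hypothetical stand-ins: a sine wave for speech, Gaussian noise for the drone.
rng = random.Random(0)
sample_rate = 16000  # assumed sample rate, not stated in the abstract
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

scaled_noise = scale_noise_to_snr(speech, noise, -23.0)
mixture = [s + n for s, n in zip(speech, scaled_noise)]

# At -23 dB the noise carries roughly 200 times the power of the speech.
power_ratio = power(scaled_noise) / power(speech)
```

Under these assumptions, `power_ratio` equals 10^2.3, which illustrates why enhancement below −23 dB is demanding: the target speech contributes only about half a percent of the mixture's power.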
Original language | English |
---|---|
Title of host publication | APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798350367331 |
DOIs | |
Publication status | Published - 2024 |
Externally published | Yes |
Event | 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China. Duration: Dec 3 2024 → Dec 6 2024 |
Publication series
Name | APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024 |
---|
Conference
Conference | 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 |
---|---|
Country/Territory | China |
City | Macau |
Period | 12/3/24 → 12/6/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
ASJC Scopus Subject Areas
- Artificial Intelligence
- Computer Science Applications
- Hardware and Architecture
- Signal Processing
Keywords
- Convolutional neural network
- deep learning
- drone audition
- speech enhancement
- transformer