Abstract
Code-switching (CS) occurs when the language switches within a speech signal, and it causes language confusion in automatic speech recognition (ASR). We address language confusion to improve CS-ASR from two perspectives: incorporating language information and disentangling it. We incorporate language information into the CS-ASR model by dynamically biasing the model with token-level language posteriors, which are the outputs of an auxiliary sequence-to-sequence language diarization (LD) module. The disentangling approach instead reduces the difference between the two languages through adversarial training, so as to normalize them. We conduct experiments on the SEAME dataset. Compared with the baseline model, both joint optimization with LD and the language posterior bias improve performance. A comparison of the proposed methods indicates that incorporating language information is more effective than disentangling it for reducing language confusion in CS speech.
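The abstract does not say exactly how the token-level language posteriors enter the model. The following is a minimal sketch assuming one plausible design: each token's LD logits are turned into a posterior with a softmax, projected by a learned matrix, and added to that token's decoder embedding. The names `softmax`, `bias_embeddings`, and `proj` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's language logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_embeddings(token_embs: Matrix, ld_logits: Matrix, proj: Matrix) -> Matrix:
    """Add a projected language posterior to each token embedding (hypothetical design).

    token_embs: T x D decoder token embeddings
    ld_logits:  T x L logits from an assumed token-level LD head
    proj:       L x D projection matrix (hypothetical learned parameter)
    """
    biased = []
    for emb, logits in zip(token_embs, ld_logits):
        post = softmax(logits)  # token-level language posterior, sums to 1
        # Posterior-weighted sum of per-language bias vectors (rows of proj).
        bias = [sum(post[l] * proj[l][d] for l in range(len(post)))
                for d in range(len(emb))]
        biased.append([e + b for e, b in zip(emb, bias)])
    return biased


if __name__ == "__main__":
    # Two languages (e.g. Mandarin and English), embedding size 2, identity projection.
    embs = [[0.0, 0.0], [1.0, 1.0]]
    logits = [[0.0, 0.0], [4.0, -4.0]]
    proj = [[1.0, 0.0], [0.0, 1.0]]
    for row in bias_embeddings(embs, logits, proj):
        print([round(x, 3) for x in row])
```

With an identity `proj`, a token whose language the LD module is unsure about receives an equal bias toward both languages. A confidently labeled token is shifted almost entirely along its language's direction, which is one way a posterior can "dynamically" steer the decoder.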
Original language | English |
---|---|
Title of host publication | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781728163277 |
Publication status | Published - 2023 |
Externally published | Yes |
Event | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023, Rhodes Island, Greece (Jun 4 2023 → Jun 10 2023) |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2023-June |
ISSN (Print) | 1520-6149 |
Conference
Conference | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 |
---|---|
Country/Territory | Greece |
City | Rhodes Island |
Period | 6/4/23 → 6/10/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
ASJC Scopus Subject Areas
- Software
- Signal Processing
- Electrical and Electronic Engineering
Keywords
- automatic speech recognition
- code-switching
- language diarization
- language posterior
- token