Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization

Hexin Liu; Haihua Xu; Leibny Paola Garcia; Andy W.H. Khong; Yi He; Sanjeev Khudanpur

doi:10.1109/ICASSP49357.2023.10095878

Hexin Liu^*, Haihua Xu, Leibny Paola Garcia, Andy W.H. Khong, Yi He, Sanjeev Khudanpur

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Citations (Scopus)

Abstract

Code-switching (CS) occurs when languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). We address the problem of language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information within the CS-ASR model by dynamically biasing the model with token-level language posteriors corresponding to outputs of a sequence-to-sequence auxiliary language diarization (LD) module. In contrast, the disentangling process reduces the difference between languages via adversarial training so as to normalize two languages. We conduct experiments on the SEAME dataset. Compared to the baseline model, both the joint optimization with LD and the language posterior bias achieve performance improvement. Comparison of the proposed methods indicates that incorporating language information is more effective than disentangling for reducing language confusion in CS speech.
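The idea of biasing recognition with token-level language posteriors can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say where or how the bias enters the model, so the additive log-domain combination, the function name `bias_with_language_posteriors`, the `weight` parameter, and the `token_lang` vocabulary-to-language map are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    return x - np.log(np.sum(np.exp(x), axis=axis, keepdims=True))


def bias_with_language_posteriors(asr_logits, lang_logits, token_lang, weight=1.0):
    """Hypothetical log-domain bias of ASR token scores by language posteriors.

    asr_logits:  (T, V) scores over the vocabulary at each output token step
    lang_logits: (T, L) language-diarization scores at each output token step
    token_lang:  (V,)   language index of every vocabulary token (assumed known)
    weight:      strength of the bias (illustrative hyperparameter)
    """
    asr_logp = log_softmax(asr_logits)    # (T, V) token log-probabilities
    lang_logp = log_softmax(lang_logits)  # (T, L) language log-posteriors
    bias = lang_logp[:, token_lang]       # (T, V): each token gets its language's score
    return log_softmax(asr_logp + weight * bias)  # renormalise over the vocabulary


# Toy vocabulary: tokens 0-1 belong to language 0, tokens 2-3 to language 1.
token_lang = np.array([0, 0, 1, 1])
asr_logits = np.array([[2.0, 0.0, 2.0, 0.0]])  # tokens 0 and 2 are tied
lang_logits = np.array([[0.0, 3.0]])           # diarization favours language 1
biased = bias_with_language_posteriors(asr_logits, lang_logits, token_lang)
best_token = int(np.argmax(biased[0]))
```

In this toy case the ASR scores alone cannot separate tokens 0 and 2; adding the language log-posterior breaks the tie in favour of the token whose language the diarization output prefers.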

Original language	English
Title of host publication	ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781728163277
DOIs	https://doi.org/10.1109/ICASSP49357.2023.10095878
Publication status	Published - 2023
Externally published	Yes
Event	48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece; Duration: Jun 4 2023 → Jun 10 2023

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2023-June
ISSN (Print)	1520-6149

Conference

Conference	48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory	Greece
City	Rhodes Island
Period	6/4/23 → 6/10/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

ASJC Scopus Subject Areas

Software
Signal Processing
Electrical and Electronic Engineering

Keywords

automatic speech recognition
code-switching
language diarization
language posterior
token

Access to Document

10.1109/ICASSP49357.2023.10095878

Cite this

Liu, H., Xu, H., Garcia, L. P., Khong, A. W. H., He, Y., & Khudanpur, S. (2023). Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2023-June). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP49357.2023.10095878

Liu, Hexin ; Xu, Haihua ; Garcia, Leibny Paola et al. / Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings. Institute of Electrical and Electronics Engineers Inc., 2023. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{df5e9a7ac7fb4d808b0d496ad5cbb8c9,

title = "Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization",

abstract = "Code-switching (CS) occurs when languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). We address the problem of language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information within the CS-ASR model by dynamically biasing the model with token-level language posteriors corresponding to outputs of a sequence-to-sequence auxiliary language diarization (LD) module. In contrast, the disentangling process reduces the difference between languages via adversarial training so as to normalize two languages. We conduct experiments on the SEAME dataset. Compared to the baseline model, both the joint optimization with LD and the language posterior bias achieve performance improvement. Comparison of the proposed methods indicates that incorporating language information is more effective than disentangling for reducing language confusion in CS speech.",

keywords = "automatic speech recognition, code-switching, language diarization, language posterior, token",

author = "Hexin Liu and Haihua Xu and Garcia, {Leibny Paola} and Khong, {Andy W.H.} and Yi He and Sanjeev Khudanpur",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 ; Conference date: 04-06-2023 Through 10-06-2023",

year = "2023",

doi = "10.1109/ICASSP49357.2023.10095878",

language = "English",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings",

address = "United States",

}

Liu, H, Xu, H, Garcia, LP, Khong, AWH, He, Y & Khudanpur, S 2023, Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization. in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2023-June, Institute of Electrical and Electronics Engineers Inc., 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023, Rhodes Island, Greece, 6/4/23. https://doi.org/10.1109/ICASSP49357.2023.10095878

Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization. / Liu, Hexin; Xu, Haihua; Garcia, Leibny Paola et al.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings. Institute of Electrical and Electronics Engineers Inc., 2023. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2023-June).

TY - GEN

T1 - Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization

AU - Liu, Hexin

AU - Xu, Haihua

AU - Garcia, Leibny Paola

AU - Khong, Andy W.H.

AU - He, Yi

AU - Khudanpur, Sanjeev

PY - 2023

Y1 - 2023

N2 - Code-switching (CS) occurs when languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). We address the problem of language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information within the CS-ASR model by dynamically biasing the model with token-level language posteriors corresponding to outputs of a sequence-to-sequence auxiliary language diarization (LD) module. In contrast, the disentangling process reduces the difference between languages via adversarial training so as to normalize two languages. We conduct experiments on the SEAME dataset. Compared to the baseline model, both the joint optimization with LD and the language posterior bias achieve performance improvement. Comparison of the proposed methods indicates that incorporating language information is more effective than disentangling for reducing language confusion in CS speech.

AB - Code-switching (CS) occurs when languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). We address the problem of language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information within the CS-ASR model by dynamically biasing the model with token-level language posteriors corresponding to outputs of a sequence-to-sequence auxiliary language diarization (LD) module. In contrast, the disentangling process reduces the difference between languages via adversarial training so as to normalize two languages. We conduct experiments on the SEAME dataset. Compared to the baseline model, both the joint optimization with LD and the language posterior bias achieve performance improvement. Comparison of the proposed methods indicates that incorporating language information is more effective than disentangling for reducing language confusion in CS speech.

KW - automatic speech recognition

KW - code-switching

KW - language diarization

KW - language posterior

KW - token

UR - http://www.scopus.com/inward/record.url?scp=85174003974&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85174003974&partnerID=8YFLogxK

U2 - 10.1109/ICASSP49357.2023.10095878

DO - 10.1109/ICASSP49357.2023.10095878

M3 - Conference contribution

AN - SCOPUS:85174003974

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Y2 - 4 June 2023 through 10 June 2023

ER -

Liu H, Xu H, Garcia LP, Khong AWH, He Y, Khudanpur S. Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings. Institute of Electrical and Electronics Engineers Inc. 2023. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP49357.2023.10095878