MERLIon CCS Challenge Development and Evaluation Datasets Open Preview (Documentation)

  • Victoria Yi Han Chua (Creator)
  • Suzy Styles (Creator)
  • Suzy J. Styles (Contributor)

Dataset

Description

The inaugural Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge focuses on developing robust language identification and language diarization systems that are reliable for non-standard, accented, spontaneous, code-switched, child-directed speech collected via Zoom. The challenge is a special session at INTERSPEECH 2023. This repository is an open preview containing documentation about the files that can be downloaded in the development and evaluation sets for the two Tasks in the 2023 MERLIon CCS Challenge.

In work arising from this corpus, please cite the dataset: Chua, Victoria Yi Han; Garcia Perera, Leibny Paola; Khudanpur, Sanjeev; Khong, Andy W. H.; Dauwels, Justin; Woon, Fei Ting; Styles, Suzy J, 2023, "Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge", https://doi.org/10.21979/N9/ANXS8Z, DR-NTU (Data), V1
Date made available: 2023
Publisher: DR-NTU (Data)
