A unified framework for text analysis in Chinese TTS

Guohong Fu*, Min Zhang, Guodong Zhou, Kang Kuong Luke

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper presents a robust text analysis system for Chinese text-to-speech synthesis. In this study, a lexicon word or a contiguous run of non-hanzi characters of the same category (e.g. a digit string) is defined as a morpheme, the basic unit from which Chinese words are formed. Based on this definition, three key issues in interpreting real Chinese text, namely lexical disambiguation, unknown word resolution and non-standard word (NSW) normalization, can be unified in a single framework and reformulated as a two-pass tagging task on a sequence of morphemes. Our system consists of four main components: (1) a pre-segmenter for sentence segmentation and morpheme segmentation; (2) a lexicalized HMM-based chunker for identifying unknown words and guessing their part-of-speech categories; (3) an HMM-based tagger for converting orthographic morphemes to their Chinese phonetic representation (viz. pinyin), given their word-formation patterns and part-of-speech information; and (4) a post-processor for interpreting phonetic tags and, where necessary, fine-tuning the pronunciation order of certain special NSWs. An evaluation on a pinyin-annotated corpus built from the Peking University corpus shows that our system correctly interprets most words.
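The morpheme definition in the abstract can be illustrated with a minimal Python sketch of a pre-segmenter. It is not the authors' implementation: it assumes a toy lexicon, forward maximum matching for hanzi spans (falling back to a single character), a coarse character classification that treats only the CJK Unified Ideographs block U+4E00 to U+9FFF as hanzi, and that whitespace is discarded. The names `char_category` and `segment_morphemes` are hypothetical.

```python
from __future__ import annotations


def char_category(ch: str) -> str:
    """Map one character to a coarse class (hypothetical scheme, not the paper's)."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs basic block only
        return "hanzi"
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch.isspace():
        return "space"
    return "symbol"


def segment_morphemes(text: str, lexicon: set[str], max_word_len: int = 4) -> list[tuple[str, str]]:
    """Split text into (morpheme, category) pairs.

    Hanzi spans use forward maximum matching against `lexicon`, falling back
    to a single character; any other run of same-category characters becomes
    one morpheme. Whitespace runs are dropped (an assumption of this sketch).
    """
    morphemes: list[tuple[str, str]] = []
    i, n = 0, len(text)
    while i < n:
        cat = char_category(text[i])
        if cat == "hanzi":
            length = 1  # fallback: a single hanzi is its own morpheme
            # Try the longest candidate first, down to two characters.
            for size in range(min(max_word_len, n - i), 1, -1):
                if text[i:i + size] in lexicon:
                    length = size
                    break
            morphemes.append((text[i:i + length], cat))
            i += length
        else:
            # Extend over a maximal run of characters in the same category.
            j = i + 1
            while j < n and char_category(text[j]) == cat:
                j += 1
            if cat != "space":
                morphemes.append((text[i:j], cat))
            i = j
    return morphemes


print(segment_morphemes("2006年12月在新加坡召开", {"新加坡", "召开"}))
# → [('2006', 'digit'), ('年', 'hanzi'), ('12', 'digit'), ('月', 'hanzi'), ('在', 'hanzi'), ('新加坡', 'hanzi'), ('召开', 'hanzi')]
```

In this example "2006" and "12" both come out as digit morphemes, yet their readings differ: in standard Mandarin a year is usually read digit by digit (èr líng líng liù nián), while a month number is read as a cardinal (shí'èr yuè). Settling such readings is left to the later tagging passes the abstract describes, not to the pre-segmenter.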

Original language: English
Title of host publication: Chinese Spoken Language Processing - 5th International Symposium, ISCSLP 2006, Proceedings
Pages: 200-210
Number of pages: 11
Publication status: Published - 2006
Externally published: Yes
Event: 5th International Symposium on Chinese Spoken Language Processing, ISCSLP 2006 - Singapore, Singapore
Duration: Dec 13 2006 to Dec 16 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4274 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Symposium on Chinese Spoken Language Processing, ISCSLP 2006
Country/Territory: Singapore
City: Singapore
Period: 12/13/06 to 12/16/06

ASJC Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • Chinese TTS
  • Grapheme-to-phoneme conversion
  • Lexical analysis
  • Text analysis
  • Text normalization
