Chinese unknown word identification using class-based LM

Guohong Fu*, Kang Kwong Luke

*Corresponding author for this work

Research output: Contribution to journal / Conference article / Peer-review

13 Citations (Scopus)

Abstract

This paper presents a modified class-based language model (LM) approach to Chinese unknown word identification. In this work, Chinese unknown word identification is treated as a classification problem, and the part-of-speech of each unknown word is defined as its class. Three types of features (contextual class features, a word juncture model, and word formation patterns) are combined within a class-based LM framework to correctly identify unknown words in a sequence of known words. Beyond identification, the class-based LM approach also provides a solution for tagging unknown words. Our experimental results show that most unknown words in Chinese texts can be resolved effectively by the proposed approach.
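As a rough illustration of the idea described in the abstract, the following Python sketch uses a generic class-based bigram model to choose a part-of-speech class for a single unknown word from its neighbouring classes. It is not the authors' model. The toy corpus, the add-one smoothing, the tag names (NR, VV, NN, PN), and the `formation_scores` dictionary, which is a hypothetical stand-in for word formation patterns, are all assumptions. The word juncture model is omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus: sentences as (word, class) pairs, where the class
# is a POS-style tag. Boundary markers <s> and </s> get their own classes.
TOY_CORPUS = [
    [("<s>", "<s>"), ("张三", "NR"), ("买", "VV"), ("书", "NN"), ("</s>", "</s>")],
    [("<s>", "<s>"), ("李四", "NR"), ("看", "VV"), ("报", "NN"), ("</s>", "</s>")],
    [("<s>", "<s>"), ("他", "PN"), ("买", "VV"), ("报", "NN"), ("</s>", "</s>")],
]


def train_class_bigrams(corpus):
    """Count class bigrams and how often each class occurs as a history."""
    bigrams = Counter()
    histories = Counter()
    classes = set()
    for sentence in corpus:
        tags = [tag for _, tag in sentence]
        classes.update(tags)
        for prev, cur in zip(tags, tags[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1
    return bigrams, histories, classes


def class_transition(bigrams, histories, prev, cur, n_classes, alpha=1.0):
    """Add-alpha smoothed estimate of P(cur | prev)."""
    return (bigrams[(prev, cur)] + alpha) / (histories[prev] + alpha * n_classes)


def best_class(prev_class, next_class, candidates, formation_scores,
               bigrams, histories, n_classes):
    """Choose the candidate class c for an unknown word that maximizes
    log P(c | prev) + log P(next | c) + log formation_scores[c]."""
    def score(c):
        return (math.log(class_transition(bigrams, histories, prev_class, c, n_classes))
                + math.log(class_transition(bigrams, histories, c, next_class, n_classes))
                + math.log(formation_scores[c]))
    return max(candidates, key=score)


bigrams, histories, classes = train_class_bigrams(TOY_CORPUS)
# Hypothetical unknown word 王小明 between <s> and the verb 买 (VV);
# formation_scores is a placeholder for a word-formation model.
print(best_class("<s>", "VV", ["NR", "NN"], {"NR": 0.6, "NN": 0.4},
                 bigrams, histories, len(classes)))  # prints NR
```

In this sketch, the contextual class transitions (sentence start followed by a verb) strongly favour the proper-noun class NR. The placeholder formation score only breaks ties between otherwise comparable candidates.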

Original language: English
Pages (from-to): 704-713
Number of pages: 10
Journal: Lecture Notes in Computer Science
Volume: 3248
Publication status: Published - 2005
Externally published: Yes
Event: First International Joint Conference on Natural Language Processing (IJCNLP 2004), Hainan Island, China
Duration: Mar 22 2004 to Mar 24 2004

ASJC Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science
