Chinese unknown word identification as known word tagging

Guo Hong Fu*, Kang Kwong Luke

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

This paper presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). In this work, Chinese unknown word identification is formulated as a tagging task over a sequence of known words, using tags that combine word-formation patterns with parts of speech. Based on the lexicalized HMMs, a statistical tagger is developed that assigns each known word an appropriate tag indicating both its pattern in forming a word and the part of speech of the resulting word. Experimental results on the Peking University corpus indicate that both the lexicalization technique and the introduction of parts of speech help unknown word identification. The experiment on the SIGHAN-PK open test data also shows that our system achieves state-of-the-art performance.
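To illustrate the formulation, the sketch below shows how a tag sequence over known words could be decoded back into words with their parts of speech. It is a minimal sketch under stated assumptions, not the paper's method: the tag scheme (a hypothetical B/M/E/S word-formation pattern joined to a PKU-style part-of-speech label, e.g. `B-nr`) and the function name `assemble_words` are illustrative inventions, and the LHMM tagger that would produce the tags is not shown.

```python
from typing import List, Tuple

# Hypothetical word-formation patterns (assumed, not the paper's tag set):
#   B = known word begins a longer word
#   M = known word lies inside a longer word
#   E = known word ends a longer word
#   S = known word is a complete word on its own
def assemble_words(known_words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge tagged known words into (word, part-of-speech) pairs.

    Each tag has the assumed form "PATTERN-POS", e.g. "B-nr".
    For a multi-piece word, the POS on the closing E tag is used.
    """
    if len(known_words) != len(tags):
        raise ValueError("known_words and tags must have the same length")
    result: List[Tuple[str, str]] = []
    buffer: List[str] = []  # pieces of the word currently being built
    for word, tag in zip(known_words, tags):
        pattern, pos = tag.split("-", 1)
        if pattern == "S":
            if buffer:
                raise ValueError("a word opened with B was not closed with E")
            result.append((word, pos))
        elif pattern == "B":
            if buffer:
                raise ValueError("a word opened with B was not closed with E")
            buffer = [word]
        elif pattern in ("M", "E"):
            if not buffer:
                raise ValueError(f"{pattern} tag without a preceding B tag")
            buffer.append(word)
            if pattern == "E":
                # Close the word: concatenate its known-word pieces.
                result.append(("".join(buffer), pos))
                buffer = []
        else:
            raise ValueError(f"unknown pattern {pattern!r}")
    if buffer:
        raise ValueError("sequence ended inside an unfinished word")
    return result


if __name__ == "__main__":
    # A person name absent from the lexicon, split into three known words.
    words = ["王", "小", "明", "来到", "北京"]
    tags = ["B-nr", "M-nr", "E-nr", "S-v", "S-ns"]
    print(assemble_words(words, tags))
    # → [('王小明', 'nr'), ('来到', 'v'), ('北京', 'ns')]
```

Because every known word carries exactly one tag, identifying an unknown word reduces to choosing the tag sequence; the decoding step itself is deterministic.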

Original language: English
Title of host publication: Proceedings of 2004 International Conference on Machine Learning and Cybernetics
Pages: 2612-2617
Number of pages: 6
Publication status: Published - 2004
Externally published: Yes
Event: Proceedings of 2004 International Conference on Machine Learning and Cybernetics - Shanghai, China
Duration: Aug 26, 2004 to Aug 29, 2004

Publication series

Name: Proceedings of 2004 International Conference on Machine Learning and Cybernetics
Volume: 4

Conference

Conference: Proceedings of 2004 International Conference on Machine Learning and Cybernetics
Country/Territory: China
City: Shanghai
Period: 8/26/04 to 8/29/04

ASJC Scopus Subject Areas

  • General Engineering

Keywords

  • Chinese word segmentation
  • Known word tagging
  • Lexicalized HMMs
  • Unknown word identification
