Chinese text chunking using lexicalized HMMs

Guo Hong Fu*, Rui Feng Xu, Kang Kwong Luke, Qin Lu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger that assigns each known word a proper hybrid tag, which encodes four types of information: word boundary, POS, chunk boundary, and chunk type. In comparison with most previous approaches, our approach is able to integrate different features such as part-of-speech information, chunk-internal cues, and contextual information for text chunking under the framework of HMMs. As a result, the performance of the system can be improved without losing efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that the use of the lexicalization technique can substantially improve the performance of an HMM-based chunking system.
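The decoding step the abstract describes can be illustrated with a minimal Viterbi sketch. This is not the paper's model: the toy below uses a plain first-order HMM with hand-set probabilities over simplified chunk tags (B-NP, I-NP, B-VP) and an English-gloss sentence, whereas the paper's uniformly lexicalized HMM conditions on words directly and uses richer hybrid tags. All parameter values here are hypothetical.

```python
# Minimal sketch of lattice-based Viterbi decoding over chunk tags.
# Assumed toy model: first-order HMM with start, transition, and emission
# probabilities; the paper's uniformly lexicalized model is richer.

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words`."""
    # Each lattice column maps a tag to (best probability, best path so far).
    column = {t: (start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0), [t])
              for t in tags}
    for w in words[1:]:
        column = {
            t: max((column[s][0] * trans_p[s].get(t, 0.0)
                    * emit_p[t].get(w, 0.0),
                    column[s][1] + [t])
                   for s in tags)
            for t in tags
        }
    return max(column.values())[1]

# Hypothetical toy parameters for a three-word example.
tags = ["B-NP", "I-NP", "B-VP"]
start_p = {"B-NP": 0.8, "B-VP": 0.2}
trans_p = {
    "B-NP": {"B-NP": 0.1, "I-NP": 0.6, "B-VP": 0.3},
    "I-NP": {"B-NP": 0.1, "I-NP": 0.4, "B-VP": 0.5},
    "B-VP": {"B-NP": 0.5, "I-NP": 0.1, "B-VP": 0.4},
}
emit_p = {
    "B-NP": {"the": 0.7, "dog": 0.2, "runs": 0.1},
    "I-NP": {"the": 0.1, "dog": 0.7, "runs": 0.2},
    "B-VP": {"the": 0.1, "dog": 0.1, "runs": 0.8},
}

best = viterbi(["the", "dog", "runs"], tags, start_p, trans_p, emit_p)
print(best)  # ['B-NP', 'I-NP', 'B-VP']
```

Because the hybrid tags jointly encode boundary and type information, a single pass of this kind yields the chunking directly from the best tag path.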

Original language: English
Title of host publication: 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005
Pages: 7-12
Number of pages: 6
Publication status: Published - 2005
Externally published: Yes
Event: International Conference on Machine Learning and Cybernetics, ICMLC 2005 - Guangzhou, China
Duration: Aug 18, 2005 - Aug 21, 2005

Publication series

Name: 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005

Conference

Conference: International Conference on Machine Learning and Cybernetics, ICMLC 2005
Country/Territory: China
City: Guangzhou
Period: 8/18/05 - 8/21/05

ASJC Scopus Subject Areas

  • General Engineering

Keywords

  • Base phrase recognition
  • Base phrase structure
  • Lexicalized hidden Markov models (HMMs)
  • Text chunking

