TY - GEN
T1 - Chinese text chunking using lexicalized HMMs
AU - Fu, Guo Hong
AU - Xu, Rui Feng
AU - Luke, Kang Kwong
AU - Lu, Qin
PY - 2005
Y1 - 2005
N2 - This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger that assigns each known word a proper hybrid tag encoding four types of information: word boundary, POS, chunk boundary, and chunk type. Compared with most previous approaches, our approach can integrate different features, such as part-of-speech information, chunk-internal cues, and contextual information, for text chunking within the HMM framework. As a result, system performance can be improved without sacrificing efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that lexicalization can substantially improve the performance of an HMM-based chunking system.
AB - This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger that assigns each known word a proper hybrid tag encoding four types of information: word boundary, POS, chunk boundary, and chunk type. Compared with most previous approaches, our approach can integrate different features, such as part-of-speech information, chunk-internal cues, and contextual information, for text chunking within the HMM framework. As a result, system performance can be improved without sacrificing efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that lexicalization can substantially improve the performance of an HMM-based chunking system.
KW - Base phrase recognition
KW - Base phrase structure
KW - Lexicalized hidden Markov models (HMMs)
KW - Text chunking
UR - http://www.scopus.com/inward/record.url?scp=28444444555&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=28444444555&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:28444444555
SN - 078039092X
SN - 9780780390928
T3 - 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005
SP - 7
EP - 12
BT - 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005
T2 - International Conference on Machine Learning and Cybernetics, ICMLC 2005
Y2 - 18 August 2005 through 21 August 2005
ER -