TY - GEN
T1 - Chinese unknown word identification as known word tagging
AU - Fu, Guo Hong
AU - Luke, Kang Kwong
PY - 2004
Y1 - 2004
N2 - This paper presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). In this work, Chinese unknown word identification is represented as a tagging task on a sequence of known words by introducing word-formation patterns and part-of-speech information. Based on the lexicalized HMMs, a statistical tagger is further developed to assign each known word an appropriate tag that indicates its pattern in forming a word and the part-of-speech of the formed word. The experimental results on the Peking University corpus indicate that the use of the lexicalization technique and the introduction of part-of-speech information are helpful to unknown word identification. The experiment on the SIGHAN-PK open test data also shows that our system can achieve state-of-the-art performance.
KW - Chinese word segmentation
KW - Known word tagging
KW - Lexicalized HMMs
KW - Unknown word identification
UR - http://www.scopus.com/inward/record.url?scp=6344285863&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=6344285863&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:6344285863
SN - 0780384032
T3 - Proceedings of 2004 International Conference on Machine Learning and Cybernetics
SP - 2612
EP - 2617
BT - Proceedings of 2004 International Conference on Machine Learning and Cybernetics
T2 - Proceedings of 2004 International Conference on Machine Learning and Cybernetics
Y2 - 26 August 2004 through 29 August 2004
ER -