Chinese text chunking using lexicalized HMMs

Guo Hong Fu*, Rui Feng Xu, Kang Kwong Luke, Qin Lu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger that assigns each known word a proper hybrid tag, which encodes four types of information: word boundary, POS, chunk boundary, and chunk type. In comparison with most previous approaches, our approach is able to integrate different features such as part-of-speech information, chunk-internal cues, and contextual information for text chunking under the framework of HMMs. As a result, the performance of the system can be improved without losing efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that the use of the lexicalization technique can substantially improve the performance of an HMM-based chunking system.
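The decoding step the abstract describes can be illustrated with a minimal Viterbi sketch. This is not the paper's model: the toy below uses a plain first-order HMM with hand-set probabilities over simplified chunk tags (B-NP, I-NP, B-VP) and an English-gloss sentence, whereas the paper's uniformly lexicalized HMM conditions on words directly and uses richer hybrid tags. All parameter values here are hypothetical.

```python
# Minimal sketch of lattice-based Viterbi decoding over chunk tags.
# Assumed toy model: first-order HMM with start, transition, and emission
# probabilities; the paper's uniformly lexicalized model is richer.

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words`."""
    # Each lattice column maps a tag to (best probability, best path so far).
    column = {t: (start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0), [t])
              for t in tags}
    for w in words[1:]:
        column = {
            t: max((column[s][0] * trans_p[s].get(t, 0.0)
                    * emit_p[t].get(w, 0.0),
                    column[s][1] + [t])
                   for s in tags)
            for t in tags
        }
    return max(column.values())[1]

# Hypothetical toy parameters for a three-word example.
tags = ["B-NP", "I-NP", "B-VP"]
start_p = {"B-NP": 0.8, "B-VP": 0.2}
trans_p = {
    "B-NP": {"B-NP": 0.1, "I-NP": 0.6, "B-VP": 0.3},
    "I-NP": {"B-NP": 0.1, "I-NP": 0.4, "B-VP": 0.5},
    "B-VP": {"B-NP": 0.5, "I-NP": 0.1, "B-VP": 0.4},
}
emit_p = {
    "B-NP": {"the": 0.7, "dog": 0.2, "runs": 0.1},
    "I-NP": {"the": 0.1, "dog": 0.7, "runs": 0.2},
    "B-VP": {"the": 0.1, "dog": 0.1, "runs": 0.8},
}

best = viterbi(["the", "dog", "runs"], tags, start_p, trans_p, emit_p)
print(best)  # ['B-NP', 'I-NP', 'B-VP']
```

Because the hybrid tags jointly encode boundary and type information, a single pass of this kind yields the chunking directly from the best tag path.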

Original language: English
Title of host publication: 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005
Pages: 7-12
Number of pages: 6
Publication status: Published - 2005
Externally published: Yes
Event: International Conference on Machine Learning and Cybernetics, ICMLC 2005 - Guangzhou, China
Duration: Aug 18, 2005 - Aug 21, 2005

Publication series

Name: 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005

Conference

Conference: International Conference on Machine Learning and Cybernetics, ICMLC 2005
Country/Territory: China
City: Guangzhou
Period: 8/18/05 - 8/21/05

ASJC Scopus Subject Areas

  • General Engineering

Keywords

  • Base phrase recognition
  • Base phrase structure
  • Lexicalized hidden Markov models (HMMs)
  • Text chunking

