Abstract
This paper presents a modified class-based language model (LM) approach to Chinese unknown word identification. Chinese unknown word identification is treated as a classification problem in which the part-of-speech of each unknown word is defined as its class. Three types of features, namely contextual class features, a word juncture model, and word formation patterns, are combined in a class-based LM framework to perform unknown word identification on a sequence of known words. In addition to unknown word identification, the class-based LM approach also provides a solution for unknown word tagging. Experimental results show that most unknown words in Chinese texts can be resolved effectively by the proposed approach.
Original language | English |
---|---|
Pages (from-to) | 704-713 |
Number of pages | 10 |
Journal | Lecture Notes in Computer Science |
Volume | 3248 |
DOIs | |
Publication status | Published - 2005 |
Externally published | Yes |
Event | First International Joint Conference on Natural Language Processing (IJCNLP 2004), Hainan Island, China; Duration: Mar 22 2004 → Mar 24 2004 |
ASJC Scopus Subject Areas
- Theoretical Computer Science
- General Computer Science