Automatic expansion of abbreviations in Chinese news text

Guohong Fu*, Kang Kwong Luke, Guo Dong Zhou, Ruifeng Xu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Citations (Scopus)

Abstract

This paper presents an n-gram-based approach to Chinese abbreviation expansion. In this study, we distinguish reduced abbreviations from non-reduced abbreviations, which are created by elimination or generalization. For a reduced abbreviation, a mapping table is compiled to map each short-word in it to a set of long-words, and a bigram-based Viterbi algorithm is then applied to decode an appropriate combination of long-words as its full form. For a non-reduced abbreviation, a dictionary of non-reduced abbreviation/full-form pairs is used to generate its expansion candidates, and a disambiguation technique based on bigram word segmentation is then employed to select a proper expansion. An evaluation on an abbreviation-expanded corpus built from the PKU corpus showed that the proposed system achieved, on average, a recall of 82.9% and a precision of 85.5% across different types of abbreviations in Chinese news text.
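The decoding step for reduced abbreviations can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `viterbi_expand`, the `floor` score for unseen bigrams, the toy mapping table, and all probabilities below are assumptions made up for illustration. Each short-word is mapped to its candidate long-words, and a bigram-scored Viterbi search picks the most probable long-word sequence.

```python
import math

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def viterbi_expand(short_words, mapping, bigram_logp, floor=-20.0):
    """Return the highest-scoring long-word sequence for a list of short-words.

    mapping:      short-word -> list of candidate long-words (assumed format)
    bigram_logp:  (previous word, current word) -> log probability
    floor:        assumed log score for bigrams missing from the table
    """
    def score(prev, cur):
        return bigram_logp.get((prev, cur), floor)

    # best[w] = (log score of the best path ending in word w, that path)
    best = {START: (0.0, [])}
    for sw in short_words:
        # A short-word without an entry is kept unchanged (an assumption).
        candidates = mapping.get(sw, [sw])
        nxt = {}
        for cand in candidates:
            prev_word, (prev_score, prev_path) = max(
                best.items(),
                key=lambda kv: kv[1][0] + score(kv[0], cand),
            )
            nxt[cand] = (prev_score + score(prev_word, cand), prev_path + [cand])
        best = nxt

    # Close the path with the end marker and keep the best complete sequence.
    _, final_path = max(
        ((s + score(w, END), p) for w, (s, p) in best.items()),
        key=lambda t: t[0],
    )
    return final_path


# Toy data: the mapping and the probabilities are invented for this example.
mapping = {"北": ["北京", "北方"], "大": ["大学", "大会"]}
bigram_logp = {
    (START, "北京"): math.log(0.6), (START, "北方"): math.log(0.4),
    ("北京", "大学"): math.log(0.7), ("北京", "大会"): math.log(0.1),
    ("北方", "大学"): math.log(0.2), ("北方", "大会"): math.log(0.1),
    ("大学", END): math.log(0.5), ("大会", END): math.log(0.5),
}

expansion = viterbi_expand(["北", "大"], mapping, bigram_logp)
print("".join(expansion))
```

With these toy values the abbreviation 北大 expands to 北京大学. In a real system the bigram scores would be estimated from a corpus and smoothed rather than replaced with a fixed floor.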

Original language: English
Title of host publication: Information Retrieval Technology - Third Asia Information Retrieval Symposium, AIRS 2006, Proceedings
Publisher: Springer Verlag
Pages: 530-536
Number of pages: 7
ISBN (Print): 3540457801, 9783540457800
Publication status: Published - 2006
Externally published: Yes
Event: 3rd Asia Information Retrieval Symposium, AIRS 2006 - Singapore, Singapore
Duration: Oct 16 2006 to Oct 18 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4182 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd Asia Information Retrieval Symposium, AIRS 2006
Country/Territory: Singapore
City: Singapore
Period: 10/16/06 to 10/18/06

ASJC Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science
