NLP versus IR approaches to fuzzy name searching in digital libraries

Paul Horng Jyh Wu*, Jin Cheon Na, Christopher S.G. Khoo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingChapter

4 Citations (Scopus)

Abstract

Name Search is an important search function in Digital Library systems and various types of information retrieval systems, such as directory search systems, electronic phonebooks and yellow pages. The paper discusses two main approaches to fuzzy name matching - the natural language processing (NLP) approach and the information retrieval (IR) approach - and proposes a hybrid approach. Person names can be considered a (sub-)language, in which case a name search system will be developed using Natural Language Processing apparatus including dictionary, thesaurus and grammatical schema. On the other hand, if names are perceived as (free) text, then an entirely different system may be built incorporating indexing, retrieving, relevance ranking and other Information Retrieval techniques. These two schools of thought, NLP and IR, have somewhat different sets of techniques originating from different theoretical concerns and research traditions. A selective combination of their complementary features is likely to be more effective for fuzzy name matching. Two principles, position attribute identity (PAI) and position transition likelihood (PTL), are proposed to incorporate aspects of both approaches. The two principles have been implemented in an NLP- and IR-hybrid model system called Friendly Name Search (FNS) for real world applications in multilingual directory searches on the Singapore Yellowpages website.

Original languageEnglish
Title of host publicationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
EditorsRachel Heery, Liz Lyon
PublisherSpringer Verlag
Pages145-156
Number of pages12
ISBN (Print)3540230130, 9783540230137
DOIs
Publication statusPublished - 2004
Externally publishedYes

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume3232
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

ASJC Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'NLP versus IR approaches to fuzzy name searching in digital libraries'. Together they form a unique fingerprint.

Cite this