TY - CHAP
T1 - NLP versus IR approaches to fuzzy name searching in digital libraries
AU - Wu, Paul Horng Jyh
AU - Na, Jin Cheon
AU - Khoo, Christopher S.G.
PY - 2004
Y1 - 2004
N2 - Name Search is an important search function in Digital Library systems and various types of information retrieval systems, such as directory search systems, electronic phonebooks and yellow pages. The paper discusses two main approaches to fuzzy name matching - the natural language processing (NLP) approach and the information retrieval (IR) approach - and proposes a hybrid approach. Person names can be considered a (sub-)language, in which case a name search system will be developed using Natural Language Processing apparatus including dictionary, thesaurus and grammatical schema. On the other hand, if names are perceived as (free) text, then an entirely different system may be built incorporating indexing, retrieving, relevance ranking and other Information Retrieval techniques. These two schools of thought, NLP and IR, have somewhat different sets of techniques originating from different theoretical concerns and research traditions. A selective combination of their complementary features is likely to be more effective for fuzzy name matching. Two principles, position attribute identity (PAI) and position transition likelihood (PTL), are proposed to incorporate aspects of both approaches. The two principles have been implemented in an NLP- and IR-hybrid model system called Friendly Name Search (FNS) for real-world applications in multilingual directory searches on the Singapore Yellowpages website.
AB - Name Search is an important search function in Digital Library systems and various types of information retrieval systems, such as directory search systems, electronic phonebooks and yellow pages. The paper discusses two main approaches to fuzzy name matching - the natural language processing (NLP) approach and the information retrieval (IR) approach - and proposes a hybrid approach. Person names can be considered a (sub-)language, in which case a name search system will be developed using Natural Language Processing apparatus including dictionary, thesaurus and grammatical schema. On the other hand, if names are perceived as (free) text, then an entirely different system may be built incorporating indexing, retrieving, relevance ranking and other Information Retrieval techniques. These two schools of thought, NLP and IR, have somewhat different sets of techniques originating from different theoretical concerns and research traditions. A selective combination of their complementary features is likely to be more effective for fuzzy name matching. Two principles, position attribute identity (PAI) and position transition likelihood (PTL), are proposed to incorporate aspects of both approaches. The two principles have been implemented in an NLP- and IR-hybrid model system called Friendly Name Search (FNS) for real-world applications in multilingual directory searches on the Singapore Yellowpages website.
UR - http://www.scopus.com/inward/record.url?scp=35048887198&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35048887198&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-30230-8_14
DO - 10.1007/978-3-540-30230-8_14
M3 - Chapter
AN - SCOPUS:35048887198
SN - 3540230130
SN - 9783540230137
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 145
EP - 156
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
A2 - Heery, Rachel
A2 - Lyon, Liz
PB - Springer Verlag
ER -