Automatic classification of web search results: Product review vs. non-review documents

Tun Thura Thet*, Jin Cheon Na, Christopher S.G. Khoo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

This study seeks to develop an automatic method to identify product review documents on the Web using the snippets (summary information that includes the URL, title, and summary text) returned by a Web search engine. The aim is to allow the user to extend topical search with genre-based filtering or categorization. First, we applied a common machine learning technique, SVM (Support Vector Machine), to investigate which features of the snippets are useful for classification. The best results were obtained using just the title and URL (domain and folder names) of the snippets as phrase terms (n-grams). We then developed a heuristic approach that utilizes domain knowledge constructed semi-automatically, and found that it performs comparably well, with only a small drop in accuracy. A hybrid approach that combines the machine learning and heuristic approaches performs slightly better than the machine learning approach alone.
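
As an illustration of the feature set the abstract reports as most effective (title plus URL domain and folder names, represented as n-grams), the following is a minimal sketch and not the authors' implementation. The tokenization rule, the `title:`/`domain:`/`folder:` prefixes, the bigram limit, and the function names are all assumptions made for this example; the abstract does not specify them. Features of this kind could then be passed to an SVM or matched against a heuristic term list.

```python
import re
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n=2):
    """Return all contiguous word n-grams of length 1..max_n."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def snippet_features(title, url, max_n=2):
    """Build prefixed n-gram features from a snippet's title and URL.

    Assumption: only the domain and the folder names of the URL are used.
    The last path segment (treated here as the file name) and the query
    string are ignored.
    """
    parsed = urlparse(url)
    domain_tokens = tokenize(parsed.netloc)
    # Drop the final path segment, then discard empty segments.
    folders = [seg for seg in parsed.path.split("/")[:-1] if seg]
    folder_tokens = [tok for seg in folders for tok in tokenize(seg)]

    features = set()
    features.update("title:" + g for g in ngrams(tokenize(title), max_n))
    features.update("domain:" + g for g in ngrams(domain_tokens, max_n))
    features.update("folder:" + g for g in ngrams(folder_tokens, max_n))
    return features
```

For a hypothetical snippet with the title "Canon PowerShot Review" and the URL `http://www.example.com/reviews/cameras/a640.html`, the resulting set includes `title:review` and `folder:reviews cameras`, while the file name `a640.html` contributes nothing.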

Original language: English
Title of host publication: Asian Digital Libraries
Subtitle of host publication: Looking Back 10 Years and Forging New Frontiers - 10th International Conference on Asian Digital Libraries, ICADL 2007, Proceedings
Publisher: Springer Verlag
Pages: 65-74
Number of pages: 10
ISBN (Print): 9783540770930
Publication status: Published - 2007
Externally published: Yes
Event: 10th International Conference on Asian Digital Libraries, ICADL 2007 - Hanoi, Viet Nam
Duration: Dec 10 2007 to Dec 13 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4822 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Asian Digital Libraries, ICADL 2007
Country/Territory: Viet Nam
City: Hanoi
Period: 12/10/07 to 12/13/07

ASJC Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • Genre classification
  • Product review documents
  • Snippets
  • Web search results
