TY - GEN
T1 - Filtering product reviews from web search results
AU - Thet, Tun Thura
AU - Na, Jin Cheon
AU - Khoo, Christopher S.G.
PY - 2007
Y1 - 2007
N2 - This study seeks to develop an automatic method to identify product reviews on the Web using the snippets (summary information) returned by search engines. Determining whether a snippet is a review or non-review is a challenging task, since the snippet usually does not contain many useful features for identifying review documents. First, we applied a common machine learning technique, SVM (Support Vector Machine), to investigate which features of snippets are useful for the classification. Then we employed a heuristic approach utilizing domain knowledge and found that the heuristic approach performs as well as the machine learning approach. A hybrid approach that combines the machine learning technique and domain knowledge performs slightly better than the machine learning approach alone.
AB - This study seeks to develop an automatic method to identify product reviews on the Web using the snippets (summary information) returned by search engines. Determining whether a snippet is a review or non-review is a challenging task, since the snippet usually does not contain many useful features for identifying review documents. First, we applied a common machine learning technique, SVM (Support Vector Machine), to investigate which features of snippets are useful for the classification. Then we employed a heuristic approach utilizing domain knowledge and found that the heuristic approach performs as well as the machine learning approach. A hybrid approach that combines the machine learning technique and domain knowledge performs slightly better than the machine learning approach alone.
KW - Genre classification
KW - Product review documents
KW - Snippets
UR - http://www.scopus.com/inward/record.url?scp=37849035663&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=37849035663&partnerID=8YFLogxK
U2 - 10.1145/1284420.1284467
DO - 10.1145/1284420.1284467
M3 - Conference contribution
AN - SCOPUS:37849035663
SN - 9781595937766
T3 - DocEng'07: Proceedings of the 2007 ACM Symposium on Document Engineering
SP - 196
EP - 198
BT - DocEng'07
T2 - DocEng'07: 2007 ACM Symposium on Document Engineering
Y2 - 28 August 2007 through 31 August 2007
ER -