Abstract
This paper aims to distinguish between authentic and fictitious user-generated hotel reviews using a two-step approach. The first step classifies authentic and fictitious reviews by leveraging their possible textual differences. The second step identifies the textual traits that are unique to authentic and fictitious reviews. For this purpose, a ground truth dataset of 1,800 reviews, evenly divided between authentic and fictitious, was created. In the first step, reviews were classified using four forms of textual differences: understandability, level of detail, writing style, and cognition indicators. Classification was performed using voting by average probability among logistic regression, C4.5, Support Vector Machine, JRip, and Random Forest classifiers. Under five-fold cross-validation, the proposed approach outperformed two existing baselines. In the second step, the textual traits unique to authentic and fictitious reviews were identified using the Information Gain and Chi-squared feature selection techniques. A sequential forward feature selection approach was then adopted to identify the top five features that aid the classification of authentic and fictitious reviews. These include the use of nouns, articles, function words, punctuation, and, in particular, exclamation points in reviews. The implications of the results are discussed.
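The two techniques named in the abstract, voting by average probability and Information Gain, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, the toy probabilities, and the assumption that a feature has already been discretized are all hypothetical choices made for illustration.

```python
from math import log2
from statistics import mean


def vote_by_average_probability(probabilities, threshold=0.5):
    """Combine per-classifier estimates of P(fictitious) for one review.

    `probabilities` holds one value per base classifier (hypothetical
    inputs; the threshold of 0.5 is an assumption, not from the paper).
    """
    avg = mean(probabilities)
    label = "fictitious" if avg >= threshold else "authentic"
    return label, avg


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * log2(p)
    return result


def information_gain(feature_values, labels):
    """Information Gain of a discrete feature: H(Y) - sum_v P(v) * H(Y | v).

    Assumes the feature is already discretized (e.g. "many" vs "few"
    exclamation points); continuous counts would need binning first.
    """
    n = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Toy example: five base classifiers score one review.
label, avg = vote_by_average_probability([0.9, 0.6, 0.4, 0.7, 0.8])

# Toy example: one feature separates the classes perfectly, one not at all.
toy_labels = ["fictitious", "fictitious", "authentic", "authentic"]
ig_informative = information_gain([1, 1, 0, 0], toy_labels)
ig_uninformative = information_gain([1, 0, 1, 0], toy_labels)
```

In this toy setting the averaged vote labels the review as fictitious even though one classifier disagrees, and the Information Gain score ranks the perfectly separating feature above the uninformative one, which is the kind of ranking a feature selection step would rely on.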
Original language | English |
---|---|
Title of host publication | 6th International Conference on Computing, Communications and Networking Technologies, ICCCNT 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781479979844 |
Publication status | Published - Jan 29 2016 |
Externally published | Yes |
Event | 6th International Conference on Computing, Communications and Networking Technologies, ICCCNT 2015 - Denton, United States; Duration: Jul 13 2015 → Jul 15 2015 |
Publication series
Name | 6th International Conference on Computing, Communications and Networking Technologies, ICCCNT 2015 |
---|---|
Conference
Conference | 6th International Conference on Computing, Communications and Networking Technologies, ICCCNT 2015 |
---|---|
Country/Territory | United States |
City | Denton |
Period | 7/13/15 → 7/15/15 |
Bibliographical note
Publisher Copyright: © 2015 IEEE.
ASJC Scopus Subject Areas
- Computer Networks and Communications
- Hardware and Architecture
Keywords
- classification algorithms
- data mining
- machine learning
- text analysis