Classification of Tourist Comment Using Word2vec and Random Forest Algorithm

  • Isra Nurul HABIBI, Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Indonesia
  • Abba Suganda GIRSANG, Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Indonesia

Abstract

Text classification assigns sentences to predefined categories. The data classified in this paper are comments from social media, with training data taken from sites that attach a point score to each review, such as tripadvisor.co.id. The word2vec method is used to convert words into numeric vectors so that a machine learning algorithm can be applied to classify the data. Word2vec is an unsupervised method that uses unlabeled data to convert a word into its vector representation, and it can also capture the semantic relationship between words through the distance between their vectors. The goal of this paper is to make it possible to quickly estimate the overall score of a tourist place from comments posted on social media such as Twitter or Instagram. The experiments show that the data without stop-word removal and without eliminating training data give the better F1 score, 0.85.
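As a rough illustration of how word vectors can be compared by distance and combined into one feature vector per comment, consider the minimal sketch below. It is not the authors' implementation: the three-dimensional vectors in `TOY_VECTORS` are hand-written stand-ins for a trained word2vec model, and averaging the word vectors of a comment is an assumed way of building the classifier input, which the abstract does not specify. The helper names `cosine_similarity` and `comment_vector` are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-written for illustration only.
# A trained word2vec model would supply learned vectors with many more dimensions.
TOY_VECTORS = {
    "beach": [0.9, 0.1, 0.0],
    "sea": [0.8, 0.2, 0.1],
    "dirty": [0.0, 0.1, 0.9],
    "beautiful": [0.7, 0.6, 0.0],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; report 0.0
    return dot / (norm_a * norm_b)


def comment_vector(tokens, vectors):
    """Average the vectors of known tokens (assumed pooling step).

    Tokens missing from the vocabulary are skipped. A comment with no
    known tokens maps to the zero vector.
    """
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    # Words used in similar contexts should score closer than unrelated ones.
    print(cosine_similarity(TOY_VECTORS["beach"], TOY_VECTORS["sea"]))
    print(cosine_similarity(TOY_VECTORS["beach"], TOY_VECTORS["dirty"]))
    # One fixed-length feature vector per comment.
    print(comment_vector(["beautiful", "beach", "today"], TOY_VECTORS))
```

Each comment then becomes a fixed-length numeric vector, which is the kind of input a classifier such as Random Forest can be trained on with the review scores as labels.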

Published
2019-04-15
How to Cite
HABIBI, Isra Nurul; GIRSANG, Abba Suganda. Classification of Tourist Comment Using Word2vec and Random Forest Algorithm. Journal of Environmental Management and Tourism, [S.l.], v. 9, n. 8, p. 1725-1732, apr. 2019. ISSN 2068-7729. Available at: <https://journals.aserspublishing.eu/jemt/article/view/3117>. Date accessed: 24 apr. 2024. doi: https://doi.org/10.14505//jemt.v9.8(32).11.