A Sentiment Analyzer for Informal Text in Social Media

Huong Thanh Le¹, Nhan Trong Tran¹
¹ Hanoi University of Science and Technology - No. 1, Dai Co Viet, Hai Ba Trung, Hanoi, Viet Nam

Abstract

This paper introduces an approach to Twitter sentiment analysis, the task of classifying tweets as positive, negative, or neutral. In the preprocessing step, we propose a method to deal with two problems: (i) repeated characters in informal spellings of words (e.g., "goooood"); and (ii) the effect of contrast words in determining sentence polarity. We propose a set of features for this task, and we investigate and select the best-performing classification algorithm among Decision Tree, K-Nearest Neighbors, Support Vector Machine, and a Voting Classifier for the Twitter sentiment analysis problem. Experimental results on the Twitter 2016 test dataset show that our system achieved good results (an F1-score of 63.7%) compared with related research in this field.
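The abstract names the two preprocessing problems but does not spell out the rules used to solve them. The Python sketch below illustrates one common way to handle each, under stated assumptions: (i) runs of three or more identical letters are collapsed to two, with an optional vocabulary lookup that falls back to a single letter when that yields a known word; and (ii) for the contrast words "but", "however" and "yet", only the tokens after the last such word are kept, on the heuristic that the final clause carries the overall polarity. The contrast-word list, the tokenizer, and the function names (`squeeze_repeats`, `focus_after_contrast`, `preprocess`) are hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical contrast-word list (assumption); the paper's actual list is not given.
CONTRAST_WORDS = {"but", "however", "yet"}

# A letter repeated three or more times in a row ("ooo" in "sooo").
# [^\W\d_] matches letters only, so numbers such as "1000" are left alone.
_REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")


def squeeze_repeats(word, vocabulary=None):
    """Normalize an elongated word such as 'goooood'.

    Runs of 3+ identical letters are first cut to two letters ("goooood" -> "good").
    If a vocabulary is given and that form is not in it, the runs are cut to a
    single letter instead when that yields a known word ("sooo" -> "so").
    """
    two = _REPEAT_RE.sub(r"\1\1", word)
    if vocabulary is None or two in vocabulary:
        return two
    one = _REPEAT_RE.sub(r"\1", word)
    return one if one in vocabulary else two


def focus_after_contrast(tokens):
    """Keep only the tokens after the last contrast word.

    Assumed heuristic: in "X but Y", clause Y usually decides the polarity.
    Falls back to the full token list if there is no contrast word or
    nothing follows it.
    """
    last = -1
    for i, tok in enumerate(tokens):
        if tok.lower() in CONTRAST_WORDS:
            last = i
    tail = tokens[last + 1:] if last >= 0 else tokens
    return tail or tokens


def preprocess(tweet, vocabulary=None):
    """Lowercase, tokenize, normalize elongations, then apply the contrast rule."""
    # Simple illustrative tokenizer: words (with apostrophes) or runs of punctuation.
    tokens = re.findall(r"[\w']+|[^\w\s]+", tweet.lower())
    tokens = [squeeze_repeats(t, vocabulary) for t in tokens]
    return focus_after_contrast(tokens)


if __name__ == "__main__":
    vocab = {"the", "food", "was", "so", "good", "service", "bad"}
    print(preprocess("The food was sooooo good but the service was baaaad!!!", vocab))
    # → ['the', 'service', 'was', 'bad', '!!!']
```

Collapsing to two letters first keeps legitimate double letters such as "good" or "cool" intact; the single-letter form is used only when a vocabulary confirms it. Concessive markers such as "although" would need different handling, because the clause carrying the main polarity may come before the marker; this sketch does not cover them.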
