Abstract— With the advancement and growth of web technology, a huge volume of data is available on the web to internet users, and a great deal of new data is generated as well. Social networking sites such as Twitter, Facebook, and Google+ are rapidly gaining popularity because they allow people to share and express their views about topics, hold discussions with different communities, and post messages across the world. There has been a lot of work in the field of sentiment analysis of Twitter data.
This survey focuses mainly on sentiment analysis of Twitter data, which is helpful for analyzing the information in tweets, where opinions are highly unstructured and heterogeneous and are positive, negative, or in some cases neutral. In this paper, we provide a survey and a comparative analysis of existing techniques for opinion mining, such as machine learning and lexicon-based approaches, together with evaluation metrics. We then review research on Twitter data streams that applies machine learning algorithms such as Naive Bayes, Maximum Entropy, and Support Vector Machine.
We also discuss general challenges and applications of sentiment analysis on Twitter.

Keywords— Twitter data; SVM; Naïve Bayes; Algorithms

Introduction

Geetika Gautam and Divakar Yadav (2014) [1] proposed sentiment analysis for the classification of customer reviews, which helps analyze the information in large numbers of tweets where opinions are highly unstructured and are positive, negative, or somewhere in between. They first preprocessed the dataset, then extracted the meaningful adjectives from it, which form the feature vector, then selected the feature vector list, and finally applied the machine learning classification algorithms Naive Bayes, Maximum Entropy, and SVM, along with a WordNet-based semantic orientation method that extracts synonyms and similarity for the content features.
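The pipeline described for [1] (preprocess, keep meaningful adjectives as features, then classify) can be illustrated for its Naive Bayes step with the minimal Python sketch below. It rests on assumptions not taken from [1]: a small hand-listed `ADJECTIVES` set stands in for a part-of-speech tagger, preprocessing only lowercases text and removes URLs and @mentions, and the classifier is multinomial Naive Bayes with add-one smoothing. Maximum Entropy, SVM, and the WordNet step are not shown, and the four training tweets are invented.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for POS tagging: a hand-picked set of adjectives.
# A real pipeline would use a part-of-speech tagger instead.
ADJECTIVES = {"good", "great", "excellent", "bad", "poor", "terrible", "slow", "fast"}


def preprocess(tweet):
    """Lowercase, drop URLs and @mentions, and keep alphabetic tokens."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)
    return re.findall(r"[a-z]+", tweet)


def adjective_features(tweet):
    """Feature vector: the adjectives that occur in the tweet."""
    return [tok for tok in preprocess(tweet) if tok in ADJECTIVES]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in zip(docs, labels):
            self.word_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum of log P(feature | class), assuming independence
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for f in features:
                count = self.word_counts[label][f]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    ("Great phone, excellent battery!", "positive"),
    ("The camera is good and fast", "positive"),
    ("Terrible service, really bad", "negative"),
    ("Poor screen and slow app @support", "negative"),
]
model = NaiveBayes().fit([adjective_features(t) for t, _ in train], [y for _, y in train])
print(model.predict(adjective_features("What a great and fast update")))  # positive
```

Restricting features to adjectives keeps the vocabulary small; the trade-off is that opinion-bearing verbs and nouns ("love", "hate") are ignored.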
Seyed-Ali Bahrainian and Andreas Dengel (2013) [2] note that sentiment analysis (SA) and summarization have recently become the focus of many researchers, because analysis of online text is beneficial and in demand in many different applications. One such application is product-based sentiment summarization of multiple documents, which aims to inform users about the pros and cons of various products. Their paper introduces a novel solution to target-oriented (i.e., aspect-based) sentiment summarization and SA of short informal texts, with a main focus on Twitter posts, known as “tweets”.
They compare different algorithms and methods for SA polarity detection and sentiment summarization.

Go and L. Huang (2009) [3] proposed a solution for sentiment analysis of Twitter data using distant supervision, in which the training data consisted of tweets with emoticons that served as noisy labels. They built models using Naive Bayes, Maximum Entropy (MaxEnt), and Support Vector Machines (SVM).
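Distant supervision as described for [3] can be sketched as a labeling function that turns raw tweets into noisy training pairs. The emoticon lists, the rule of discarding tweets with no emoticon or with conflicting emoticons, and the removal of emoticons from the text (so that a classifier has to learn from the remaining words) are assumptions made for this illustration, not details confirmed by the source.

```python
# Illustrative emoticon lists; the exact sets used in [3] may differ.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(")


def distant_label(tweet):
    """Return (text, label) using emoticons as a noisy label, or None.

    Tweets with no emoticon, or with both positive and negative ones,
    are discarded because their label is ambiguous.
    """
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None
    # Strip the emoticons so a classifier must learn from the other words.
    for e in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        tweet = tweet.replace(e, " ")
    label = "positive" if has_pos else "negative"
    return " ".join(tweet.split()), label


raw_tweets = [
    "I love this phone :)",
    "Stuck in traffic again :(",
    "mixed feelings :) :(",
    "no emoticon here",
]
training_set = [ex for ex in map(distant_label, raw_tweets) if ex is not None]
print(training_set)
# [('I love this phone', 'positive'), ('Stuck in traffic again', 'negative')]
```

The resulting pairs could then be fed to any of the classifiers named above; the labels are noisy because an emoticon does not always reflect the sentiment of the whole tweet.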
The feature space of Go and Huang consisted of unigrams, bigrams, and part-of-speech (POS) tags. They concluded that SVM outperformed the other models and that unigrams were more effective as features.

Barbosa et al. (2010) [4] designed a two-phase automatic sentiment analysis method for classifying tweets. In the first phase, tweets were classified as objective or subjective; in the second phase, the subjective tweets were classified as positive or negative. The feature space included retweets, hashtags, links, punctuation, and exclamation marks, in conjunction with features such as the prior polarity of words and POS tags.

Bifet and Frank (2010) [5] used Twitter streaming data provided by the Firehose API, which delivers all publicly available messages from every user in real time.
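Because a stream like the one described for [5] is unbounded, a classifier for it has to update incrementally, one tweet at a time. The sketch below shows one such learner, a logistic-regression model trained by stochastic gradient descent (SGD), scored with test-then-train (prequential) evaluation, a common protocol for data streams. It is an illustration under assumptions, not the setup of [5]: the hashed unigram features (using `zlib.crc32`), the feature-space size, the learning rate of 0.1, and the toy stream are all chosen here for illustration.

```python
import math
import zlib

N_FEATURES = 2 ** 16  # size of the hashed feature space (illustrative)


def hashed_features(text):
    """Map each lowercase token to an index with a stable hash (hashing trick)."""
    return [zlib.crc32(tok.encode("utf-8")) % N_FEATURES for tok in text.lower().split()]


class OnlineLogisticSGD:
    """Binary logistic regression updated by SGD, one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.lr = learning_rate
        self.w = [0.0] * N_FEATURES
        self.b = 0.0

    def prob(self, idx):
        z = self.b + sum(self.w[i] for i in idx)
        z = max(-30.0, min(30.0, z))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, idx):
        return 1 if self.prob(idx) >= 0.5 else 0

    def learn(self, idx, y):
        # For log-loss, dLoss/dz = p - y; each token occurrence contributes 1 to z.
        g = self.prob(idx) - y
        for i in idx:
            self.w[i] -= self.lr * g
        self.b -= self.lr * g


def prequential_accuracy(model, stream):
    """Test-then-train: score each tweet before the model learns from it."""
    correct = seen = 0
    for text, y in stream:
        idx = hashed_features(text)
        correct += model.predict(idx) == y
        seen += 1
        model.learn(idx, y)
    return correct / seen if seen else 0.0


stream = [("love this great phone", 1), ("hate this awful phone", 0)] * 20
model = OnlineLogisticSGD(learning_rate=0.1)
print(prequential_accuracy(model, stream))
print(model.predict(hashed_features("great love")))
```

The learning rate controls how far each tweet moves the weights: too large a value makes the model overreact to individual tweets, and too small a value makes it adapt slowly, which is why its choice matters for SGD on a stream.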
Bifet and Frank experimented with multinomial naive Bayes, stochastic gradient descent (SGD), and the Hoeffding tree, and concluded that the SGD-based model, when used with an appropriate learning rate, performed better than the others.

Mitali Desai (2016) [6] describes sentiment analysis as the problem of mining sentiments from data available online and categorizing the opinion expressed by an author towards a particular entity into at most three preset categories: positive, negative, and neutral. The paper first presents the sentiment analysis process for classifying highly unstructured data on Twitter, and then discusses various techniques for carrying out sentiment analysis on Twitter data in detail.

Davidov et al. (2010) [7] proposed an approach that uses Twitter user-defined hashtags in tweets as sentiment-type labels, with punctuation, single words, n-grams, and patterns as different feature types, which are then combined into a single feature vector for sentiment classification.
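The idea in [7] of combining several feature types into a single vector, with hashtags supplying the labels, can be sketched as follows together with a nearest-neighbor labeling step. This is a simplified illustration: only single words, word bigrams, and counts of "!" and "?" are used (the pattern features of [7] are omitted), similarity is cosine similarity, neighbors vote by simple majority with k = 3, and the hashtag-labeled tweets are invented. The weighting scheme used in [7] may differ.

```python
import math
import re
from collections import Counter


def feature_vector(text):
    """Combine several feature types into one sparse vector (a Counter)."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    vec = Counter()
    vec.update("w:" + w for w in words)                                # single words
    vec.update("bi:" + a + "_" + b for a, b in zip(words, words[1:]))  # word bigrams
    vec["punct:!"] = lowered.count("!")                                # punctuation
    vec["punct:?"] = lowered.count("?")
    return vec


def hashtag_example(tweet):
    """Use the tweet's first hashtag as its label and drop hashtags from the text."""
    label = re.findall(r"#(\w+)", tweet)[0].lower()
    return feature_vector(re.sub(r"#\w+", " ", tweet)), label


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def knn_label(text, training, k=3):
    """Majority vote among the k training vectors most similar to the text."""
    vec = feature_vector(text)
    nearest = sorted(training, key=lambda ex: cosine(vec, ex[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


training = [hashtag_example(t) for t in [
    "finally on holiday, sun and beach! #happy",
    "great day with friends! #happy",
    "love this sunny weather! #happy",
    "missed the bus again #sad",
    "rainy day and no friends around #sad",
    "lost my keys, worst day #sad",
]]
print(knn_label("sun and beach with friends!", training))  # happy
print(knn_label("missed the bus, worst day", training))    # sad
```

Prefixing each key with its feature type ("w:", "bi:", "punct:") keeps the feature families separate while still letting them share one vector.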
Davidov et al. made use of a k-nearest-neighbor (kNN) strategy to assign sentiment labels, constructing a feature vector for each example in the training and test sets.

Po-Wei Liang et al. (2014) [8] used the Twitter API to collect Twitter data. Their training data falls into three categories (camera, movie, mobile) and is labeled as positive, negative, or non-opinion.
Opinion-bearing tweets were first separated from non-opinion tweets. A unigram Naive Bayes model was implemented, employing the Naive Bayes simplifying independence assumption. They also eliminated useless features using the Mutual Information and chi-square feature selection methods. Finally, the orientation of a tweet, i.e., positive or negative, is predicted.
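The feature elimination step mentioned for [8] can be illustrated by scoring each term against a class with chi-square or mutual information, both computed from a 2x2 table of document counts (term present or absent, tweet in the class or not), and keeping the k highest-scoring terms. The formulas follow the standard textbook formulation; the toy documents, the value of k, and the helper names are hypothetical, and the exact procedure in [8] may differ.

```python
import math


def contingency(docs, labels, term, cls):
    """Return (n11, n10, n01, n00): term present/absent x tweet in class/not."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_class:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for the 2x2 term/class table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def mutual_information(n11, n10, n01, n00):
    """Expected mutual information (in bits) between term and class."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # (joint count, term marginal, class marginal) for each of the four cells
    for n_tc, n_t, n_c in (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ):
        if n_tc:
            mi += n_tc / n * math.log2(n * n_tc / (n_t * n_c))
    return mi


def select_features(docs, labels, cls, score, k):
    """Keep the k terms with the highest score for class `cls`."""
    vocab = sorted({t for tokens in docs for t in tokens})  # sorted for deterministic ties
    ranked = sorted(vocab, key=lambda t: score(*contingency(docs, labels, t, cls)), reverse=True)
    return ranked[:k]


docs = [
    {"love", "camera", "the"}, {"great", "movie", "the"}, {"love", "phone"},
    {"hate", "camera", "the"}, {"awful", "movie", "the"}, {"hate", "phone"},
]
labels = ["positive"] * 3 + ["negative"] * 3
print(select_features(docs, labels, "positive", chi_square, 2))  # ['hate', 'love']
print(select_features(docs, labels, "positive", mutual_information, 2))
```

Both scores reward terms that are informative in either direction, so "hate" ranks as high as "love" for the positive class, while terms spread evenly across classes, such as "the", score zero and would be eliminated.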