
Abstract— Sarcasm is defined as witty language used to convey insults or scorn. It is used for remarks that clearly mean the opposite of what they say, made in order to hurt someone’s feelings or to criticize something in a humorous way. In speech, sarcasm is easy to recognize from pitch of voice, gestures, facial expressions, etc. In textual data, however, sarcasm is difficult to detect because these cues are absent. Sentiment analysis is used to determine someone’s opinion of, or attitude towards, a particular event, company, etc.

Sarcasm is a type of sentiment, but it is used to taunt, insult or make fun of someone. Various algorithms have been proposed to detect sarcasm based on different features, domains and types of sarcasm. We propose a Hadoop-based framework that captures real-time tweets, processes them, and uses a hybrid algorithm to identify sarcastic sentiment efficiently. The hybrid approach considers lexical and hyperbole features to improve the performance of the system by increasing accuracy, precision and F-score.

Keywords— Big data, Hadoop, MapReduce, Sentiment analysis, Sarcasm detection

I. INTRODUCTION

Nowadays, most people use Twitter, Facebook and other microblogging sites.

They share their opinions and feelings on particular topics through comments and reviews. The volume of data generated daily is very large, so it is important to analyse this data to extract useful information. Sentiment analysis mines such data for opinions through text analytics; an opinion can be positive, negative or neutral. Twitter has become one of the biggest platforms for people to express opinions, share their thoughts and stay updated about organizations, events, etc. The volume of data collected is therefore huge, i.e., big data, and processing it requires a framework that manages the entire pipeline.

Nowadays, people use sarcasm in daily life. Sarcasm means saying the opposite of what one means; it is used to make fun of others, to annoy someone, or to show anger. Detecting it is therefore important for the accuracy of the system.

Sentiment can be positive, negative or neutral. A tweet with positive sentiment may be genuinely positive or sarcastic; likewise, a tweet with negative sentiment may be genuinely negative or sarcastic. Ignoring sarcasm degrades sentiment analysis and may reverse the polarity of a sentence.

It is therefore important to detect sarcasm for accurate sentiment analysis of any company or organization. The online Oxford dictionary¹ defines sarcasm as “the use of irony to make or convey contempt”. Collins dictionary² defines it as “mocking, contemptuous, or ironic language intended to convey scorn or insult”. According to the Macmillan English dictionary³, sarcasm is “the activity of saying or writing the opposite of what you mean, or of speaking in a way intended to make someone else feel stupid or show them that you are angry”. Nowadays, many researchers are working in this field.

Every day a huge amount of data is generated, and analysing it to extract information takes time. Different algorithms and approaches have been proposed to detect sarcasm accurately, but only limited accuracy has been achieved. This has made sarcasm detection an attractive research area, with the goal of improving system accuracy. Several difficulties in detecting sarcasm make it a particularly interesting task.

For example, “Wow, there is a huge amount of discount.” reads as a compliment. However, consider the following sentence: “Wow, there is a huge amount of discount but I don’t buy anything.” This sentence makes clear that the person did not mean what he or she said.

Even human readers can find such sarcasm difficult to detect. Several features can be used to detect sarcasm efficiently, and Bharti et al. [1] describe different types of features for this purpose. First, lexical features detect sarcasm in text-only data using unigram, bigram and n-gram parameters; bigrams and n-grams have more impact on sentiment analysis than unigrams. Second, hyperbole features are used to emphasize the meaning of a text.
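As an illustration of the lexical feature, the sketch below counts the unigrams, bigrams and trigrams of a tweet. It is a minimal sketch, assuming lower-cased whitespace tokenisation; the function names `ngrams` and `lexical_features` and the default `max_n = 3` are hypothetical choices, not taken from Bharti et al. [1].

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of a token list (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_features(text: str, max_n: int = 3) -> Counter:
    """Count unigrams up to max_n-grams of a lower-cased tweet.

    Whitespace tokenisation is a simplifying assumption; a real system
    would use a tweet-aware tokenizer.
    """
    tokens = text.lower().split()
    features: Counter = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features
```

For “I love being ignored” this yields four unigrams, three bigrams and two trigrams, so a bigram such as ("love", "being") can be weighted by a classifier independently of the single word "love".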

Within hyperbole, interjection words have a greater tendency to indicate sarcasm, so they play an important role in detecting it. Other hyperbole features, such as punctuation marks, quotes and intensifiers, are used to improve the performance of the system.

For example, “excellent marks” has a stronger impact than “good marks”, so intensifiers make sarcasm easier to detect. Third, pragmatic features express emotions more accurately using smileys, emoticons and replies. It is therefore necessary to identify which type of feature is present so that the appropriate algorithm can be applied. In our research, we hybridize two features, lexical and hyperbole, to improve the accuracy of the system.
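The hyperbole markers listed above (interjections, intensifiers, punctuation marks and quotes) can be counted with simple lexicon lookups. This is a minimal sketch: the two word sets are hypothetical toy lexicons standing in for full resources such as the interjection corpus [11], and only straight double quotes are counted.

```python
import re

# Hypothetical toy lexicons; a real system would load complete word lists.
INTERJECTIONS = {"wow", "oh", "yeah", "yay", "aha"}
INTENSIFIERS = {"very", "so", "really", "totally", "absolutely"}


def hyperbole_features(text: str) -> dict[str, int]:
    """Count simple hyperbole markers in a tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "interjections": sum(w in INTERJECTIONS for w in words),
        "intensifiers": sum(w in INTENSIFIERS for w in words),
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        # Number of pairs of straight double quotes.
        "quotes": text.count('"') // 2,
        "starts_with_interjection": int(bool(words) and words[0] in INTERJECTIONS),
    }
```

These counts are intended as inputs to a rule or classifier; on their own they do not decide whether a tweet is sarcastic.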

Negation words affect sentiment analysis and must be considered in sarcasm detection because they reverse the polarity of a sentence.
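A common way to account for negation is to flip the polarity of sentiment words that fall within a short window after a negation word. The sketch below assumes hypothetical toy lexicons and a window of three tokens; neither the window size nor the word lists come from this paper.

```python
# Hypothetical toy lexicons for illustration only.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "hate", "sad", "terrible"}
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't"}


def polarity_with_negation(tokens: list[str], window: int = 3) -> int:
    """Sum word polarities (+1 / -1), flipping the sign of every token that
    falls within `window` tokens after a negation word."""
    score = 0
    negate_left = 0  # tokens still inside the current negation scope
    for tok in (t.lower() for t in tokens):
        if tok in NEGATIONS:
            negate_left = window
            continue
        value = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if negate_left > 0:
            value = -value
            negate_left -= 1
        score += value
    return score
```

With this rule, "this is good" scores +1 while "this is not good" scores -1, so the negation reverses the polarity as described above.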

We also consider a negation feature to improve the precision of sarcasm detection. MapReduce, a parallel programming model for building reliable, cost-effective and flexible applications, is used to reduce execution time. There are different types of sarcasm: (1) contradiction between positive sentiment and negative situation, for example, “I feel great being ignored”; (2) contradiction between negative sentiment and positive situation, for example, “I hate the New Zealand team because it always wins”; (3) tweets that start with an interjection word, for example, “Wow, there is a huge amount of discount but I don’t buy anything!!”; (4) likes and dislikes contradiction; (5) tweets contradicting universal facts; (6) tweets containing positive sentiment with an antonym pair; and (7) tweets contradicting time-dependent facts. There are many challenges in detecting sarcasm.

Twitter is used as the dataset for sarcasm detection. Twitter limits messages to 140 characters, which creates more ambiguity. Tweets also contain uncommon words, slang and abbreviations of an informal nature, which make sarcasm detection difficult. There is no predefined structure for sarcasm. Sarcasm is easy to detect if a #sarcasm tag is present, either at the end or in the middle of a tweet, but detection becomes difficult when no #sarcasm tag is available. Joshi et al. [2] highlighted three main challenges: i) the identification of common knowledge, ii) the intent to ridicule, and iii) the speaker-listener (or reader in the case of written text) context.
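Because the #sarcasm tag is such a strong cue, a pipeline can use it as a label (as Riloff et al. [6] do for their positive instances) and then strip it from the text, so that a classifier must rely on the remaining words. A minimal sketch, assuming the tag is written as #sarcasm in any letter case; the function name `split_sarcasm_label` is hypothetical.

```python
import re

# Matches #sarcasm anywhere in a tweet, case-insensitively,
# but not longer tags such as #sarcasmo.
SARCASM_TAG = re.compile(r"#sarcasm\b", re.IGNORECASE)


def split_sarcasm_label(tweet: str) -> tuple[str, bool]:
    """Return (tweet text without the tag, whether the tag was present)."""
    has_tag = bool(SARCASM_TAG.search(tweet))
    cleaned = SARCASM_TAG.sub("", tweet)
    # Collapse the whitespace left behind by the removed tag.
    return " ".join(cleaned.split()), has_tag
```

For example, `split_sarcasm_label("I love Mondays #sarcasm")` gives `("I love Mondays", True)`; tweets without the tag come back unchanged with `False`.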

The objectives of our system are:
1. To study the different approaches available for sarcasm detection.
2. To study the different features and types of sarcasm available for detection.
3. To propose a modified approach for efficient sarcasm detection.
4. To improve the accuracy of sentiment analysis and reduce execution time.

II. RELATED WORK

Many approaches are available for sarcasm detection, and different authors consider various features and approaches to improve the accuracy of the system.

There are two main approaches: (1) machine learning and (2) rule-based. The machine learning approach builds a model through a statistical process to predict, arrange or classify data. The rule-based approach instead exploits semantic, syntactic and stylistic properties of sentences in a language, such as phrase patterns and lexical and structural attributes, to analyse the sentiment of a sentence. Bouazizi and Ohtsuki [3] proposed a supervised machine learning approach. They focus on the importance of their proposed feature set for detecting sarcasm; for each feature they identified a different set of parameters, trained on the data set and tested them. Sentiment, punctuation, syntactic, semantic and pattern-based features are used to train the classifier.

For classification, random forest, maximum entropy, SVM and naïve Bayes are used. Rajadesingan et al. [4] address the difficult task of sarcasm detection on Twitter by leveraging behavioral aspects of users who express sarcasm. They employ theories from behavioral and psychological studies to construct a behavioral modeling framework, SCUBA (Sarcasm Classification Using a Behavioral modeling Approach). It considers different forms of sarcasm: sarcasm as a contrast of sentiments, as a complex form of expression, as a means of conveying emotion, as a possible function of familiarity, and as a form of written expression.

Tungthamthiti et al. [5] use concept-level knowledge to identify a contradiction between sentiment and situation. For example, “I love going to work on holidays” contains the positive sentiment word love, but the sentence is actually sarcastic. Concept-level knowledge indicates that holidays are a relaxed situation while work is a stressful one, so a contradiction is present and the sentence is considered sarcastic. They also focus on coherency, i.e., the correlation among sentences, when a tweet contains multiple sentences.

Bharti et al. [1] proposed algorithms for different types of sarcasm and also considered lexical and interjection features to detect sarcasm. They captured and processed real-time tweets using Apache Flume and Hive under the Hadoop framework, proposed one set of algorithms to detect sarcasm in tweets under the Hadoop framework, and proposed another set of algorithms to detect sarcasm in tweets. Riloff et al. [6] proposed a bootstrapping algorithm that automatically learns phrases corresponding to positive sentiments and phrases corresponding to negative situations. They use tweets that contain a sarcasm hashtag as positive instances for the learning process.

They use the learned lists of sentiment and situation phrases to recognize sarcasm in new tweets by identifying contexts that contain a positive sentiment in close proximity to a negative situation phrase.

Clews and Kuzma [7] apply string matching against positive sentiment and interjection lexicons to test whether the presence of both can be used to classify content as sarcastic. Because they focus only on positive sentiment, which would suggest a negative feeling, tweets that contained negative sentiment (and therefore a positive feeling) were ignored. Additionally, interjections are not unique to sarcastic texts; many tweets contain them simply because the author wishes to enhance the expressed sentiment. Vijayalaksmi and Senthilrajan [8] combined several semi-supervised algorithms, namely lexical analysis with an n-gram approach, knowledge extraction, a contrast approach, an emoticon-based approach and a hyperbole approach, into a new rule-based hybrid approach for sarcasm detection. However, developing the dictionaries for these algorithms takes more time, and their study ignored sarcasm detection in languages other than English, repeated tweets, and empty or single-letter/single-word tweets. Different authors have proposed different approaches for detecting sarcasm efficiently. PBLGA (parsing-based lexicon generation algorithm) generates a lexicon that is used to check for sarcasm; a contradiction between sentiment and situation has a high probability of being classified as sarcastic.

Another algorithm, IWS (interjection word start), identifies sarcasm in sentences that start with interjection words such as wow, oh or yeah. Table I compares the individual algorithms with existing state-of-the-art algorithms on parameters such as precision, recall and F-score.

Table I. Comparison of individual algorithms with state-of-the-art algorithms

III. PROPOSED SYSTEM

A. Data

In this study, we consider Twitter data for sarcasm detection. Tweets are therefore retrieved through the Twitter API (input). Twitter provides several APIs: the Search API searches for and retrieves tweets matching a keyword, the Streaming API fetches live tweets in real time, and the REST API retrieves tweets from Twitter's database. The retrieved tweets are then stored in Hadoop's HDFS file system for further processing.

B. Preprocessing of Data

Tweet preprocessing is required to remove noisy data that does not help decision-making in sentiment analysis. Tweets contain extra information such as URLs, which point to more information about a topic or to an image, and @user mentions; neither is necessary for detecting sarcasm, so both are noise for this task. Removing this noisy data improves the performance of the system.

C. Part of Speech Tagging

P.O.S. tagging (part-of-speech tagging) takes the words of a text (corpus) as input and assigns the corresponding part of speech to each word as output, based on its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence or paragraph. After P.O.S. tagging, all phrases are stored in a parse file (PF), which is given as input to our proposed algorithm.
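The input side of this pipeline can be sketched in a few lines: noise removal as described in subsection B, parsing of tagger output written in the word|TAG notation used in this section, and the word/tag counts (WT, TT, T) of the tag-assignment algorithm of Bharti et al. [9]. This is a minimal sketch: the regular expressions, the function names and the assumption that every token contains a "|" separator are illustrative, not part of the original system.

```python
import re
from collections import Counter

URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")


def clean_tweet(text: str) -> str:
    """Remove URLs and @user mentions, which are noise for sarcasm detection."""
    text = URL.sub("", text)
    text = MENTION.sub("", text)
    return " ".join(text.split())


def parse_tagged(tagged: str) -> list[tuple[str, str]]:
    """Split tagger output in word|TAG format into (word, tag) pairs."""
    pairs = []
    for item in tagged.split():
        word, _, tag = item.rpartition("|")  # assumes every token has a "|"
        pairs.append((word, tag))
    return pairs


def count_tags(corpus: list[list[tuple[str, str]]]) -> tuple[Counter, Counter, Counter]:
    """Build WT (word, tag), TT (previous tag, current tag) and T (tag)
    counts; "$" stands for the start of a sentence."""
    wt, tt, t = Counter(), Counter(), Counter()
    for sentence in corpus:
        previous = "$"
        for word, current in sentence:
            tt[(previous, current)] += 1
            t[current] += 1
            wt[(word, current)] += 1
            previous = current
    return wt, tt, t
```

For the tagged sentence "I|PRP love|VBP being|VBG ignored|VBN", `count_tags` records the bigram tag pairs ("$", "PRP"), ("PRP", "VBP"), ("VBP", "VBG") and ("VBG", "VBN") once each.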

For example, for “I love being ignored”, P.O.S. tagging produces: I|PRP love|VBP being|VBG ignored|VBN. After a part of speech has been assigned to each word, the tags must be separated so that the first tag, second tag and remaining tags can be identified; this separation is useful for identifying sarcasm in interjection-related tweets. Bharti et al. [9] proposed the following algorithm for assigning tags to words.

Algorithm: P.O.S. Tagging
Data: dataset := annotated corpus
Result: WT := dictionary counting each (word, tag) pair in the corpus
        TT := dictionary counting each bigram tag pair
        T := dictionary counting the occurrences of each tag
for sentence in corpus do
    for word in sentence do
        if word == first word then
            previous tag = $
        else
            previous tag = POS tag of previous word
        end if
        current tag = POS tag of current word
        TT[previous tag, current tag]++
        T[current tag]++
        WT[word, current tag]++
    end for
end for

D. Sentiment analysis of phrase

After P.O.S. tagging, sentiment analysis of each phrase can be performed. This requires a positive ratio and a negative ratio. The positive ratio is the number of positive words in a phrase divided by the total number of words in the phrase; the negative ratio is the number of negative words in the phrase divided by the total number of words in the phrase.

Intensifiers have a strong impact on sarcasm detection; for example, “fantastic weather” has a stronger impact than “good weather”. When an intensifier is present, a rule-based pattern is applied to find the polarity of the word. The sentiment score is calculated as:

$$\text{Sentiment score} = PR - NR, \qquad PR = \frac{PWP}{TWP}, \qquad NR = \frac{NWP}{TWP}$$

where PR = positive ratio, NR = negative ratio, PWP = number of positive words per phrase, NWP = number of negative words per phrase, and TWP = total words in the phrase.
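The sentiment score formula translates directly into code. The sketch below assumes hypothetical toy lexicons (the paper does not name the lexicon it uses) and omits the intensifier rule, whose exact pattern is not specified here.

```python
# Hypothetical toy lexicons; the paper does not specify its word lists.
POSITIVE = {"love", "great", "fantastic", "good"}
NEGATIVE = {"ignored", "hate", "bad", "terrible"}


def sentiment_score(phrase: str) -> float:
    """Sentiment score = PR - NR, with PR = PWP / TWP and NR = NWP / TWP."""
    words = phrase.lower().split()
    twp = len(words)                          # total words in phrase
    if twp == 0:
        return 0.0
    pwp = sum(w in POSITIVE for w in words)   # positive words in phrase
    nwp = sum(w in NEGATIVE for w in words)   # negative words in phrase
    return pwp / twp - nwp / twp
```

Scored as a single phrase, “I love being ignored” gets 1/4 - 1/4 = 0.0, which illustrates why the proposed approach scores sentiment phrases (“I love”) and situation phrases (“being ignored”) separately before looking for a contradiction.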

E. Feature based composite approach

The feature based composite approach (FBCA) using MapReduce is our proposed algorithm, explained in Section IV. It combines two features, lexical and hyperbole, and uses MapReduce for faster execution. Punctuation and negation features are also considered to improve the precision of the system.

After the proposed algorithm has run, each tweet is known to be sarcastic or not; this step performs the actual sarcasm detection.

F. Compare precision with individual approach

We compute precision and compare it with that of the individual approaches to identify the improvement achieved by our proposed approach. Precision is the fraction of retrieved sarcastic tweets that are relevant; in other words, the number of tweets correctly classified as sarcastic divided by the total number of tweets classified as sarcastic. To compute precision, the true positive (tp), true negative (tn), false positive (fp) and false negative (fn) counts are used.

Here, the positive class is sarcastic. A true positive is a tweet that is actually positive and is detected as positive; a true negative is a tweet that is actually negative and is detected as negative; a false positive is a tweet that is actually negative but is detected as positive; and a false negative is a tweet that is actually positive but is detected as negative. A confusion matrix is built from these counts after the proposed algorithm runs. After all steps have been performed, the output is shown as a graph comparing the individual algorithms with the proposed algorithm.
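The evaluation step can be sketched as follows: count tp, tn, fp and fn with "sarcastic" as the positive class, then derive precision = tp / (tp + fp), recall = tp / (tp + fn) and their harmonic mean, the F-score. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(actual: list[bool], predicted: list[bool]) -> dict[str, int]:
    """Count tp, tn, fp, fn with sarcastic (True) as the positive class."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif not a and not p:
            counts["tn"] += 1
        elif p:        # predicted sarcastic, actually not sarcastic
            counts["fp"] += 1
        else:          # actually sarcastic, predicted not sarcastic
            counts["fn"] += 1
    return counts


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F-score (0.0 whenever a denominator is zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For instance, 8 true positives with 2 false positives and 2 false negatives give precision, recall and F-score of 0.8 each.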

IV. PROPOSED ALGORITHM

FBCA (Feature based Composite Approach)
Input: Tweet corpus, interjection corpus, P.O.S. tag file (TF), parse file (PF)
Output: Classification of tweets as sarcastic or not sarcastic
Notation: A: adjective, V: verb, R: adverb, N: noun, UH: interjection, T: tweets, C: corpus, t: tag, TWT: tweet-wise tag, FT: first tag, INT: immediate next tag, NT: next tag, SF: sentiment file, sf: situation file, PSF: positive sentiment file, NSF: negative sentiment file, psf: positive situation file, nsf: negative situation file, SC: sentiment score, E: more than two exclamation marks, ISC: interjection sarcastic count, IF: interjection file, TWP: tweet-wise phrase
Initialisation: TF = ∅, SF = ∅, sf = ∅, PSF = ∅, NSF = ∅, psf = ∅, nsf = ∅, count = 0, flag = 0

for T in C do
    take FT, INT, NT from TWT
    if UH in T then
        if FT = UH && INT = (ADJ || ADV) && NT = E then
            tweet is sarcastic; increment ISC
            store tweet into IF
        else if FT = UH && NT = ((ADV + ADJ) || (ADJ + N) || (ADV + V)) then
            tweet is sarcastic; increment ISC
            store tweet into IF
        else
            tweet is not sarcastic
        end if
        for T in IF do
            k = find_parse(T)
            PF ← PF ∪ k
        end for
    else
        for TWP in PF do
            k = find_subset(TWP)
            if k = NP || ADJP || (NP + VP) then
                SF ← SF ∪ k
            else if k = VP || (ADVP + VP) || (VP + ADVP) || (ADJP + VP) || (VP + NP) || (VP + ADVP + ADJP) || (V + ADJP + NP) || (ADV + ADJP + NP) then
                sf ← sf ∪ k
            end if
        end for
        for P in SF do
            SC = sentiment_score(P)
            if SC > 0.0 then PSF ← PSF ∪ P
            else if SC < 0.0 then NSF ← NSF ∪ P
            else neutral sentiment phrase
            end if
        end for
        for P in sf do
            SC = sentiment_score(P)
            if SC > 0.0 then psf ← psf ∪ P
            else if SC < 0.0 then nsf ← nsf ∪ P
            else neutral situation phrase
            end if
        end for
        count = 0; flag = 0
        while words in tweet do
            if word ∈ PSF && count == 0 then
                count = 1; check nsf; continue
            end if
            if word ∈ nsf && count == 1 then
                flag = True; break
            else if word ∈ NSF && count == 0 then
                count = 2; check psf; continue
            end if
            if word ∈ psf && count == 2 then
                flag = True; break
            end if
        end while
        if flag == True then
            given tweet is sarcastic
        else
            given tweet is not sarcastic
        end if
    end if
end for

FBCA detects sarcastic tweets using lexical and hyperbole features. It first checks tweets that contain interjection words, following Lunando and Purwarianti [10], who state that "if the text is using interjection words, the text has more tendencies to be classified into sarcastic". The inputs are therefore an interjection corpus [11], used to find the interjection words in a tweet; a P.O.S. tag file, which stores the first tag, immediate next tag and next tag of each tweet as produced by the P.O.S. tagging algorithm; and a parse file, which stores the phrases generated for each tweet by the TextBlob [12] tool.

The output of the proposed algorithm is a classification of each tweet as sarcastic or not sarcastic. FBCA focuses on interjection words and on the number of exclamation marks to detect sarcasm. If the first tag is an interjection, the immediate next tag is an adverb or an adjective, and the next tag is an exclamation mark sequence, the tweet is classified as sarcastic. Likewise, if the first tag is an interjection and the next tags form an adverb followed by an adjective, an adjective followed by a noun, or an adverb followed by a verb, the tweet is classified as sarcastic. Interjection-related tweets are also stored in IF (interjection file) to extract more sentiment and situation phrases; a larger set of sentiment and situation phrases makes sarcasm easier to detect in further analysis. If no interjection words are present, a rule-based pattern is applied to create the sentiment and situation files.
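The two interjection rules can be sketched as a check on a tagged tweet. This sketch makes several assumptions not fixed by the text: tags follow the Penn Treebank set (UH, JJ*, RB*, NN*, VB*), interjections are recognised by the UH tag rather than by a lookup in the interjection corpus, punctuation other than exclamation runs has already been removed, E means three or more "!" in one token, and in the second rule the "next tags" are read as the pair (INT, NT). All function names are hypothetical.

```python
def _in_class(tag: str, prefix: str) -> bool:
    """True if a Penn Treebank tag is in a word class, e.g. JJR is in JJ."""
    return tag.startswith(prefix)


def is_exclamation_run(token: str) -> bool:
    """E in the FBCA notation: more than two exclamation marks."""
    return len(token) > 2 and set(token) == {"!"}


def interjection_rule(tagged: list[tuple[str, str]]) -> bool:
    """Return True if a tagged tweet fires either interjection rule.

    Rule 1: FT = UH, INT is an adjective or adverb, and NT is E.
    Rule 2: FT = UH and (INT, NT) is ADV+ADJ, ADJ+N or ADV+V.
    """
    if len(tagged) < 3 or tagged[0][1] != "UH":
        return False
    _, int_tag = tagged[1]
    nt_word, nt_tag = tagged[2]
    if (_in_class(int_tag, "JJ") or _in_class(int_tag, "RB")) and is_exclamation_run(nt_word):
        return True
    allowed = (("RB", "JJ"), ("JJ", "NN"), ("RB", "VB"))
    return any(_in_class(int_tag, a) and _in_class(nt_tag, b) for a, b in allowed)
```

For example, `[("Wow", "UH"), ("great", "JJ"), ("!!!", ".")]` fires the first rule and `[("Oh", "UH"), ("really", "RB"), ("nice", "JJ")]` fires the second, while a tweet that does not start with an interjection is passed on to the sentiment/situation analysis.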

Sentiment and situation phrases are used to determine whether a contradiction between them is present. A contradiction between a positive sentiment and a negative situation, or between a negative sentiment and a positive situation, is identified by applying rule-based patterns. A phrase is stored in the sentiment file if it is a noun, an adjective, or a noun followed by a verb. A phrase is stored in the situation file if it is a verb; an adverb followed by a verb; a verb followed by an adverb; an adjective followed by a verb; a verb followed by a noun; a verb followed by an adverb followed by an adjective; a verb followed by an adjective followed by a noun; or an adverb followed by an adjective followed by a noun. The positive sentiment, negative sentiment, positive situation and negative situation files are then created using the sentiment scores of the phrases.
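The routing of phrases into the four files and the contradiction check can be sketched together. This is a simplified sketch: it assumes each phrase already carries its chunk labels (as a parser such as TextBlob might produce) and its sentiment score, and it works on phrases in tweet order rather than on individual words. The `Phrase` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Chunk-label patterns copied from the FBCA listing.
SENTIMENT_PATTERNS = {("NP",), ("ADJP",), ("NP", "VP")}
SITUATION_PATTERNS = {
    ("VP",), ("ADVP", "VP"), ("VP", "ADVP"), ("ADJP", "VP"), ("VP", "NP"),
    ("VP", "ADVP", "ADJP"), ("V", "ADJP", "NP"), ("ADV", "ADJP", "NP"),
}


@dataclass
class Phrase:
    text: str
    pattern: tuple[str, ...]  # chunk labels, assumed to come from a parser
    score: float              # sentiment score PR - NR of the phrase


def route(phrase: Phrase) -> Optional[str]:
    """Name the file (PSF, NSF, psf or nsf) a phrase goes to, or None."""
    if phrase.pattern in SENTIMENT_PATTERNS:
        positive_file, negative_file = "PSF", "NSF"
    elif phrase.pattern in SITUATION_PATTERNS:
        positive_file, negative_file = "psf", "nsf"
    else:
        return None
    if phrase.score > 0:
        return positive_file
    if phrase.score < 0:
        return negative_file
    return None  # neutral phrase: not processed further


def has_contradiction(phrases: list[Phrase]) -> bool:
    """True if a positive sentiment phrase is later followed by a negative
    situation phrase, or a negative sentiment phrase by a positive one."""
    wanted: Optional[str] = None  # situation file that completes a contradiction
    for phrase in phrases:
        label = route(phrase)
        if wanted is None and label == "PSF":
            wanted = "nsf"
        elif wanted is None and label == "NSF":
            wanted = "psf"
        elif wanted is not None and label == wanted:
            return True
    return False
```

With “I love” tagged (NP, VP) and scored positive, followed by “being ignored” tagged (VP,) and scored negative, `has_contradiction` returns True; two positive phrases in a row do not trigger it.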

If a contradiction between sentiment and situation is present, the tweet is classified as sarcastic; otherwise it is not. Phrases that carry no sentiment are treated as neutral and are not processed further.

We use MapReduce to reduce execution time, because constructing the sentiment and situation files is time-consuming and the task must therefore be parallelized. In the map phase, interjection-related tweets are detected and classified as sarcastic, and the sentiment and situation files are created using rule-based patterns. In the reduce phase, the positive sentiment, negative sentiment, positive situation and negative situation files are created using sentiment scores, and each tweet is checked for sarcasm. Finally, the results of the map phase are combined to give the total number of sarcastic tweets detected.
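The map, shuffle and reduce steps can be simulated in plain Python to show how the final count of sarcastic tweets is assembled. This is a simplified sketch rather than the FBCA job itself: `classify` is a hypothetical stand-in for the full per-tweet decision, the splits run sequentially here, and on a real cluster each map task would run in parallel (for example through Hadoop Streaming or a Java MapReduce job).

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_phase(tweets: Iterable[str],
              classify: Callable[[str], bool]) -> Iterator[tuple[str, int]]:
    """Mapper: emit one (label, 1) pair per tweet."""
    for tweet in tweets:
        yield ("sarcastic" if classify(tweet) else "not_sarcastic", 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group mapper output by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: sum the counts for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(tweets: list[str], classify: Callable[[str], bool],
            n_splits: int = 2) -> dict[str, int]:
    """Run the job sequentially over n_splits input splits."""
    splits = [tweets[i::n_splits] for i in range(n_splits)]
    emitted = [pair for split in splits for pair in map_phase(split, classify)]
    return reduce_phase(shuffle(emitted))
```

Because the reducer only sums counts, the result does not depend on how the tweets are split across map tasks, which is what makes the work safe to parallelize. A label that no tweet receives simply does not appear in the output.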

V. CONCLUSION

Sarcasm detection is a challenging task because sarcasm has no predefined structure. Researchers are improving the accuracy of sarcasm detection by proposing different algorithms. In this paper, we proposed an algorithm that combines lexical and hyperbole features to detect sarcasm. It considers three types of sarcasm: (i) contradiction between positive sentiment and negative situation, (ii) contradiction between negative sentiment and positive situation, and (iii) occurrence of interjection words. The proposed algorithm also considers punctuation-related features to improve precision.

In the proposed algorithm, constructing the sentiment and situation files takes time, so the Hadoop framework is used to reduce execution time. We consider two features and hybridize them to improve the accuracy of the system. In future work, we will consider emoticons for sarcasm detection: a contradiction between the text and an emoticon indicates sarcasm.

Extending the proposed algorithm to other languages also remains an area for future research.

References

[1] S. K. Bharti, K. S. Babu, and S. K. Jena, “Parsing-based sarcasm sentiment recognition in Twitter data,” in 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Paris, 2015, pp. 1373-1380.
[2] A. Joshi, P. Bhattacharyya, and M. J. Carman, “Automatic sarcasm detection: A survey,” Feb. 2016. [Online]. Available: https://arxiv.org/abs/1602.03426
[3] M. Bouazizi and T. Otsuki Ohtsuki, “A pattern-based approach for sarcasm detection on Twitter,” IEEE Access, vol. 4, pp. 5477-5488, 2016.
[4] A. Rajadesingan, R. Zafarani, and H. Liu, “Sarcasm detection on Twitter: A behavioral modeling approach,” in Proceedings of the 8th ACM International Conference on Web Search and Data Mining (WSDM 2015), 2015, pp. 97-106.
[5] P. Tungthamthiti, K. Shirai, and M. Mohd, “Recognition of sarcasm in tweets based on concept level sentiment analysis and supervised learning approaches,” in 28th Pacific Asia Conference on Language, Information and Computation (PACLIC 2014), 2014, pp. 404-413.
[6] E. Riloff, A. Qadir, P. Surve, L. De Silva, N. Gilbert, and R. Huang, “Sarcasm as contrast between a positive sentiment and negative situation,” in Proceedings of EMNLP 2013, pp. 704-714.
[7] P. Clews and J. Kuzma, “Rudimentary lexicon based method for sarcasm detection,” International Journal of Academic Research and Reflection, vol. 5, no. 4, pp. 24-33, 2017.
[8] N. Vijayalaksmi and A. Senthilrajan, “A hybrid approach for sarcasm detection of social media data,” International Journal of Scientific and Research Publications (IJSRP), vol. 7, no. 5, May 2017.
[9] S. K. Bharti, B. Vachha, R. K. Pradhan, K. S. Babu, and S. K. Jena, “Sarcastic sentiment detection in tweets streamed in real time: A big data approach,” Digital Communications and Networks, vol. 2, no. 3, pp. 108-121.
[10] E. Lunando and A. Purwarianti, “Indonesian social media sentiment analysis with sarcasm detection,” pp. 195-198, doi: 10.1109/ICACSIS.2013.6761575.
[11] Enchanted Learning. http://www.enchantedlearning.com/Home.html
[12] TextBlob: Simplified Text Processing, TextBlob 0.15.0 documentation. http://textblob.readthedocs.io/en/dev/