=Paper=
{{Paper
|id=Vol-2600/short7
|storemode=property
|title=A Study in Practical Solutions to Sarcasm Detection with Machine Learning and Knowledge Engineering Techniques
|pdfUrl=https://ceur-ws.org/Vol-2600/short7.pdf
|volume=Vol-2600
|authors=Chia Zheng Lin,Michal Ptaszynski,Masui Fumito,Gniewosz Leliwa,Michal Wroczynski
|dblpUrl=https://dblp.org/rec/conf/aaaiss/LinPMLW20
}}
==A Study in Practical Solutions to Sarcasm Detection with Machine Learning and Knowledge Engineering Techniques==
Chia Zheng Lin*, Michal Ptaszynski*, Masui Fumito*, Gniewosz Leliwa**, Michal Wroczynski**
*Graduate School of Computer Science, Kitami Institute of Technology, Japan
**Samurai Labs, Poland
{chiazhenglin}@gmail.com, {ptaszynski,f-masui}@cs.kitami-it.ac.jp, {gniewosz.leliwa, michal.wroczynski}@samurailabs.ai
Abstract

In this paper we tackle the problem of sarcasm detection with the use of machine learning and knowledge engineering techniques. Sarcasm detection is considered a complex and challenging task in Natural Language Processing and has been studied by various researchers in the past decade. To get a grasp on the present state of the art in sarcasm detection, we review the important previous research in this field, with a focus on text-based sarcasm detection in English texts. In the proposed method, we compare various dataset preprocessing techniques on the proposed Deep Convolutional Neural Network model. As a result, the most specific, or least preprocessed, dataset ranked as the one with the highest performance. However, we observed that some level of data preprocessing could become useful in the task of sarcasm detection.

Introduction

Sarcasm, often used together or interchangeably with irony, is considered an important component of human communication, recognized as one of the most prominent and pervasive devices of figurative and creative language, used widely from ancient religious texts to modern times (Ghosh and Veale 2017).

Van Hee (2017) suggested the important implications of irony and sarcasm for Natural Language Processing (NLP) tasks, which aim to explain the constructs of human language, and their large potential in the domain of text mining. In recent years, there has been an increasing interest in, especially, automatic sarcasm detection and classification, which have been widely studied as a type of sentiment analysis task (detecting whether a sentence conveys a positive or negative connotation, or in this case: sarcastic or non-sarcastic). In particular, Kumar et al. (2017) surveyed some representative work in the related area and categorized most of the popular approaches into three types, namely, rule-based, statistical, and deep learning-based approaches. We analyse some of that research in the next section.

Researchers' interest in analysing this profound type of figurative and creative use of language grew along with the dramatic increase in the everyday use of social media over the past decade. In particular, Twitter has become one of the most popular venues for people to express their opinions, share their thoughts, report real-time events, etc. Moreover, the huge amount of data has drawn the interest of companies for the purpose of studying the opinions of people towards different products, facilities and events. It has been suggested that the nature of tweets makes them the most suitable for studying sarcasm detection approaches (Bouazizi and Otsuki 2016).

However, the lack of empirical investigation into optimal approaches for sarcasm detection is a serious oversight in many related studies carried out throughout the years. Importantly, there have been no studies comparing the differences in the preprocessing and manipulation of the dataset to improve the results of detection.

To contribute to dealing with the above-mentioned problems, in this paper we investigate the variations in sarcasm detection results caused by differences in applied preprocessing techniques typically used in NLP research but not applied before in works focusing on sarcasm detection. To do that most effectively, we firstly review previous related research on text-based sarcasm detection in English tweets, describe the implemented dataset preprocessing techniques, and discuss the results of an experiment performed to compare the preprocessing techniques implemented on the dataset. As a result, we managed to observe the impact contributed by hashtags and labels related to sarcasm.

Finally, Ptaszynski et al. (2010), in their research on developing an expert system for Internet Patrol, pointed out that, especially with regard to the increased popularity of SNS, sarcasm has often been used in personal attacks, such as cyberbullying, and concluded that sarcasm detection is one of the important problems in cyberbullying detection. Therefore, as one of the practical applications, in this research we will verify how effective sarcasm detection is in the detection of cyberbullying.

Copyright (c) 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Research Background

The word sarcasm originates from the Ancient Greek word sarkasmós and means "to tear flesh, bite the lip in rage, sneer." According to the Oxford dictionary (2019), sarcasm is a
way of using words that are the opposite of what one means in order to be unpleasant to somebody or to make fun of them. They also describe irony as the use of words that say the opposite of what you really mean, often as a joke.

The relationship between irony and sarcasm has been confused in many studies. In the literature, two types of irony are widely considered: verbal irony and situational irony. While situational irony involves an incongruence between two situations, verbal irony, although applying verbal, or semantic, incongruence, is a statement in which the meaning that a speaker employs is sharply different from the meaning that is ostensibly expressed. Hence, verbal irony is considered different from situational irony in that it is produced intentionally by the speaker.

When it comes to sarcasm, Van Hee (2017) defines it as a form of verbal irony that has an aggressive tone, is directed at someone or something, and is used intentionally. Hence the terms "irony" and "sarcasm" are used interchangeably in many related studies. In this study, we decided not to focus on distinguishing between sarcasm and irony, and instead use the general term "sarcasm" throughout the paper.

Previous Research

The spoken dialogue system of Tepperman (2006) used a feature extraction approach for sarcasm detection as a subtask, by which they introduced sarcasm detection into the scene of Natural Language Processing. One study by Davidov (2010) utilized tweets and Amazon reviews for text-based sarcasm detection, and Tsur (2010) proposed one of the first attempts to use feature engineering and statistical classifiers to detect sarcasm.

A number of studies have sought to detail the recent trend in sarcasm detection approaches, which can roughly be classified into three types: rule-based, statistical, and deep-learning approaches (Kumar, Somani, and Bhattacharyya 2017; Barbieri 2017). Rule-based approaches attempt to identify irony through specific evidence which can be captured in terms of rules that rely on indicators of sarcasm. Barbieri (2017) argued that rule-based approaches, which require no training, mostly rely on lexical information and do not perform as well as statistical approaches. Riloff (2013) aimed to recognize positive words in negative sentences while presenting a bootstrapping algorithm that automatically learns the rules from certain situations.

Most of the early works on sarcasm detection applied statistical approaches, which varied in terms of features and learning algorithms, and were basically composed of two phases in which data were converted into feature vectors before being classified with a machine learning algorithm. Some of the most often used algorithms include Support Vector Machines (SVM) and Naïve Bayes. One of the first attempts in this approach by Tsur (2010) compiled a set of sarcastic patterns composed of common combinations of words extracted from sarcastic examples. Gonzalez-Ibanez (2011) composed a model with three pragmatic features: positive emoticons, negative emoticons, and users' tagging. Reyes (2013) proposed another model based on four features: signatures, unexpectedness, style and polarity, and emotional scenarios.

Deep Learning approaches were successfully brought into the scene of sarcasm detection when Amir (2016) used standard binary classification with a Convolutional Neural Network (CNN), while Poria (2016) implemented a combination of CNNs trained on different tasks. Popular Deep Learning algorithms include CNN (LeCun et al. 1998) and Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber 1997). Ghosh and Veale (2017) proposed a network model composed of a CNN followed by an LSTM network, which outperformed many other models at that time. They utilized the CNN to reduce frequency variation through convolutional filters and to extract discriminating word sequences as a composite feature map for the LSTM layer. The output of the LSTM layer was then passed to a fully connected Deep Neural Network (DNN) layer, producing a higher-order feature set based on the LSTM output.

Following the Semantic Evaluation 2018 international workshop Task 3: Irony Detection in English Tweets (2018), which received submissions from 43 teams worldwide for the binary classification Task A, deep learning algorithms were further explored and optimized for irony detection tasks. The best-ranked system, submitted by team THU NGN (2018), consisted of a densely connected LSTM network with a multi-task learning strategy. Another system from one of the top teams, NTUA-SLP (2018), which used an ensemble of two bi-directional LSTM network-based models, achieved comparable results. The submissions represented a variety of neural network-based approaches and other popular classification algorithms including SVM, Random Forest, and Naïve Bayes (Van Hee, Lefever, and Hoste 2018). Overall, approaches with ensemble learners were the current trend for tackling the challenges in sarcasm detection.

Proposed Method

Dataset Preprocessing

In the majority of recent studies applying machine learning methods to text classification, the datasets are usually used in their most basic form, namely, represented as tokens (words, punctuation, etc.), despite a wide variety of knowledge-based NLP systems (e.g., stemmers, part-of-speech taggers, etc.) capable of initial preprocessing of datasets, thus providing more informative features to ML algorithms. Therefore, in this research we performed additional preprocessing of the dataset to verify the usefulness of knowledge-based systems in ML.

For the implemented dataset, each tweet was first transformed into lowercase and emojis were represented with their corresponding labels (e.g., :smileyface:) using Emoji for Python (2019). All tagged users (e.g., @user123) and URLs (e.g., http://google.com/) appearing in the text were replaced with specific neutral labels, such as "_tagged_" and "_url_". The first dataset preprocessing technique used in this study is shown below.

1. Only basic preprocessing.

To verify the depth of dependence of sarcasm detection on hashtags, all of the hashtags (e.g., #sarcasm) in the next 5 versions of the dataset shown below were replaced with a general label, e.g., "_hashtag_".
2. URLs, tagged users and hashtags replaced with labels.

Furthermore, we applied the knowledge-based tools for language processing provided by NLTK (2019).

3. Stemming of all words using the Porter Stemmer (2019).
4. Stopwords removal with the NLTK built-in Stopwords Filtering Tool.
5. Stemming of all words after stopwords removal.
6. PoS tagging using the NLTK Universal Part-of-Speech Tagset.

Finally, the last dataset, 7, had its social media markers, such as hashtags, URLs, and tagged users, removed instead of being replaced with labels.

7. Tagged users, URLs, and hashtags removed.

Below are three examples of a tweet: with hashtags (dataset 1), with hashtags replaced with labels (dataset 2), and with hashtags removed (dataset 7).

monday morning is my favorite! #sarcasm
monday morning is my favorite! _hashtag_
monday morning is my favorite!

Feature Weighting

A traditional weight calculation scheme was applied to all versions of the dataset. In particular, we used term frequency with inverse document frequency (tf*idf). Term frequency tf(t,d) refers here to the traditional raw frequency, which is the number of times a term t (word, token) occurs in a document d. Inverse document frequency idf(t,D) is the logarithm of the total number of documents |D| divided by the number of documents n_t containing the term t. Finally, tf*idf refers to the term frequency multiplied by the inverse document frequency, as in equation 1.

idf(t, D) = log(|D| / n_t)    (1)

Applied Classifier

Based on our previous work (Chia, Ptaszynski, and Masui 2019; Ptaszynski, Eronen, and Masui 2017), in this study we propose to use Convolutional Neural Networks (CNN), as they yielded the best results for classifying tweets without ironic hashtags when compared to other classifiers.

CNNs are a type of feed-forward artificial neural network, an improved neural network model originally designed for image recognition. CNN performance has proved useful in various classification tasks, including sentence classification and NLP (Kim 2014; Ptaszynski, Eronen, and Masui 2017).

In the proposed CNN we applied Rectified Linear Units (ReLU) as the neuron activation function, a piece-wise linear function that outputs the input directly if it is positive and zero otherwise. We also applied dropout regularization. The CNN consisted of two hidden convolutional layers, containing 20 and 100 feature maps, respectively, with both layers having a 5x5 patch size and 2x2 max-pooling.

Evaluation Experiment

Dataset Description

The dataset used in this research was the publicly available sarcasm detection dataset collected by Ghosh and Veale (2017). It consists of 51,189 tweets (24,453 sarcastic tweets and 26,736 non-sarcastic tweets), in which the sarcastic tweets were automatically collected from Twitter using the users' self-declaration of sarcasm/irony with sarcastic and ironic hashtags (e.g., #irony, #sarcasm) and annotated for confirmation. All seven dataset versions were implemented with different data preprocessing methods.

Experiment Setup

All seven separate versions of the dataset (represented with various preprocessing techniques) were analysed in the experiment using the proposed CNN method in the setting of a 10-fold cross-validation procedure. The results were calculated using the standard balanced F-score (F1), which is the harmonic mean of Precision and Recall.

Results and Discussion

Table 1 shows the summary of all results from the 7 datasets with different preprocessing techniques applied. Dataset 1, which is the dataset with all the hashtags included, yielded an F1 score of 0.997. Compared to our previous work (Chia, Ptaszynski, and Masui 2019), which tested a smaller dataset of only 4,618 tweets and attained an F1 score of 0.844 with similar settings (hashtags included), this shows a significant increase in the performance of the CNN model with the increase of the size of the dataset. This suggests that the model is tied to the size of the implemented dataset and the number of extracted features.

The results of dataset 1 (hashtags included) also enhance our understanding of the impact of hashtags, which make a great difference in sarcasm and irony detection, especially in Twitter messages. However, due to the natural characteristics of deliberate sarcastic hashtags on Twitter, classification of tweets with hashtags included does not contribute much to the study of sarcasm detection from a linguistic point of view. However, as the results show, hashtags can be a very useful practical means of handling sarcasm detection with high performance.

While the remaining datasets were stripped of their hashtags (replaced with labels), dataset 2 had no further preprocessing, while datasets 3 to 6 were further processed with different methods. Interestingly, dataset 2 still attained the highest F1 score among all the datasets without hashtags included. This discovery highlights the importance of linguistic features in irony detection and shows that an increase in data preprocessing does not always provide better results. This is due to the oversimplification of the data, with many vital and important features manipulated or removed, while classification tasks such as irony detection depend heavily on them.

However, the further preprocessed datasets have their own value despite attaining lower F1 scores. From our observation of the attributes extracted from their confusion matrices in Table 1, their true positive rates are higher than that of dataset 2, which scored the highest F1 score among those datasets. Dataset 5, which implemented both stemming and stop-word removal, obtained the highest true positive rate with only 290 false positives. This shows that the implementation of further data preprocessing is crucial to the sensitivity of the data.
Data set True Positive False Positive False Negative True Negative F-score
1 With hashtags 24355 98 72 26664 0.997
2 Without hashtags 24055 398 5068 21668 0.898
3 Stemming applied 24013 440 5172 21564 0.895
4 Stopwords removed 24009 444 5183 21553 0.895
5 Stemming and Stopwords removed 24163 290 5590 21146 0.892
6 PoS Tagging applied 23904 549 5171 21565 0.893
7 Hashtags, URL, tagged users removed 16509 7944 8677 18059 0.665
Table 1: Results from seven datasets with different preprocessing.
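Error-feature counts of the kind reported in Table 2 can be obtained by tallying token occurrences over the misclassified tweets while skipping function words. A minimal sketch; the example tweets and the ignore list here are illustrative only, not the actual data:

```python
from collections import Counter

def top_error_features(misclassified_tweets, n=6,
                       ignore=frozenset({"is", "my", "a", "the"})):
    """Count token occurrences across misclassified tweets, skipping
    function words that contribute little to classification."""
    counts = Counter(
        token
        for tweet in misclassified_tweets
        for token in tweet.split()
        if token not in ignore
    )
    return counts.most_common(n)

# Hypothetical misclassified tweets (illustrative only)
errors = [
    "i love mondays _hashtag_",
    "great just great _hashtag_",
    "love this weather _hashtag_",
]
print(top_error_features(errors, n=3))
```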
Dataset 1 occ Dataset 2 occ Dataset 7 occ
#sarcasm 71 _hashtag_ 5445 love 1656
sarcasm 60 _tagged_ 1639 like 1216
_tagged_ 51 love 413 not 1211
love 22 great 275 good 752
great 8 not 245 great 709
not 8 best 133 hate 488
Table 2: Top 6 error feature occurrences for datasets 1, 2 and 7 (occ = occurrence)

Finally, for the last dataset, 7, which had all of its social media markers, such as tagged users (e.g., @user123), URLs, and hashtags, completely removed, Table 1 shows that the result dropped significantly to an F1 score of 0.665 compared to the other datasets. This case shows the impact on the classification of the labels, which were supposed to be neutral. Compared to dataset 2, which had the social media markers replaced with labels, the significant increase in false negatives shows that the presence of the labels contributes heavily to the precision of the classification.

Error Analysis

Table 2 shows the occurrences of the top 6 error features extracted from dataset 1 (with hashtags), dataset 2 (hashtags replaced with labels), and dataset 7 (hashtags, URLs, and tagged users removed), after removing prepositions, conjunctions, and pronouns, which do not contribute much to the classification. For dataset 1, the error feature which occurred the most was #sarcasm, followed by the word sarcasm. This shows that even sarcastic hashtags cannot help the model achieve 100% sensitivity.

For the dataset 2 results in the second column, the label _hashtag_ appeared 5,445 times out of the 5,466 misclassified instances (99.62%). Next is the label _tagged_, which appeared 1,639 times, while the remaining words, such as "love", "great", "not", and "best", are popular errors in all 3 implemented datasets. As previously noticed, the supposedly neutral labels in fact contribute heavily to the precision of the classification. Therefore, removing them does not improve the results.

The evidence so far provides further support for the hypothesis that deliberate sarcastic hashtags play a significant role in sarcasm detection in tweets. Taken together, these results also suggest that the hashtag is the product of authors who understand that their sarcastic phrases alone may not be sufficient for the audience to figure out the intended irony or sarcasm. However, these findings do not completely solve general sarcasm detection, nor do they redefine sarcasm or irony in textual communication, especially on social network services.

Application in Automatic Cyberbullying Detection

Although the amount of research on sarcasm and irony detection grows each year, practical implementation of such models has not been widely discussed. Ptaszynski et al. (2010) mention that sarcasm poses a problem in cyberbullying (CB) detection. Therefore, aiming to improve their expert system for automated Internet Patrol, we propose a practical implementation of sarcasm detection in cyberbullying detection.

To quantify the extent to which such a model would be useful, we applied the model trained on sarcastic dataset 2 and tested it on the cyberbullying detection dataset provided by Ptaszynski et al. (2018), which consists of 12,728 data samples. The result attained an F-score of 0.889, which is comparable to the result of dataset 2 with an F-score of 0.898 above. Interestingly, it was also much higher than models trained on purely cyberbullying-related data (Ptaszynski et al. 2018). This observation shows the prevalence of sarcasm in cyberbullying, and proves the practical applicability of sarcasm detection in other tasks.

Conclusion

In this paper, to find practical solutions for sarcasm detection on Twitter, we compared various dataset preprocessing methods and observed the impact of the preprocessed labels. We firstly reviewed previous related works on text-based sarcasm detection, where we covered various types of systems, such as rule-based, statistical, or deep learning-based. Next, we compared datasets with various preprocessing on the proposed CNN model.

The first dataset, with hashtags included, scored an F1 of 0.9965, thus proving the dependence on hashtags in sarcasm detection. Next, the dataset with the least preprocessing ranked the highest among all datasets without hashtags included. However, we observed that data preprocessing is still crucial to the sensitivity of the data. Lastly, this research serves as a base for future studies on the application of sarcasm detection in other tasks, such as cyberbullying detection.

In the future, we also plan to further improve the proposed method with more diverse features and test it on larger datasets, also with other preprocessing techniques. We will also focus on optimizing the feature extraction and the classifier model.
References

Amir, I.; Wallace, B. C.; Lyu, H.; Carvalho, P.; and Silva, M. J. 2016. Modelling context with user embeddings for sarcasm detection in social media. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics.

Barbieri, F. 2017. Machine learning methods for understanding social media communication: Modeling irony and emojis. Department DTIC.

Baziotis, C.; Athanasiou, N.; Papalampidi, P.; Kolovou, A.; Paraskevopoulos, G.; Ellinas, M.; and Potamianos, A. 2018. NTUA-SLP at SemEval-2018 Task 3: Tracking ironic tweets using ensembles of word and character level attentive RNNs. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018). Association for Computational Linguistics.

Bird, S.; Loper, E.; and Klein, E. 2019. Natural Language Toolkit. https://www.nltk.org/.

Bouazizi, M., and Otsuki, T. 2016. A pattern-based approach for sarcasm detection on Twitter. IEEE Access.

Chia, Z. L.; Ptaszynski, M.; and Masui, F. 2019. Exploring machine learning techniques for irony detection. In Proceedings of the 33rd Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2019). Japanese Society for Artificial Intelligence.

Davidov, D.; Tsur, O.; and Rappoport, A. 2010. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the 14th Conference on Computational Natural Language Learning. Association for Computational Linguistics.

Ghosh, A., and Veale, T. 2017. Fracking sarcasm using neural network. In Proceedings of NAACL-HLT 2016. Association for Computational Linguistics.

Gonzalez-Ibanez, R.; Muresan, S.; and Wacholder, N. 2011. Identifying sarcasm in Twitter: A closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

Hee, C. V. 2017. Can machines sense irony? Exploring automatic irony detection on social media. Ghent University.

Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735-1780.

Kim, T., and Wurster, K. 2019. Emoji for Python. https://pypi.org/project/emoji/.

Kim, Y. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.

Kumar, L.; Somani, A.; and Bhattacharyya, P. 2017. Approaches for computational sarcasm detection: A survey. ACM CSUR.

LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. In Proceedings of the IEEE.

Oxford. 2019. Oxford Learner's Dictionary. https://www.oxfordlearnersdictionaries.com/.

Poria, S.; Cambria, E.; Hazarika, D.; and Vij, P. 2016. A deeper look into sarcastic tweets using deep convolutional neural networks. COLING 2016.

Porter, M. 2019. The Porter stemming algorithm. https://tartarus.org/martin/PorterStemmer/.

Ptaszynski, M.; Dybala, P.; Matsuba, T.; Masui, F.; Rzepka, R.; Araki, K.; and Momouchi, Y. 2010. In the service of online order: Tackling cyber-bullying with machine learning and affect analysis. International Journal of Computational Linguistics Research. Hokkaido University.

Ptaszynski, M.; Leliwa, G.; Piech, M.; and Smywinski-Pohl, A. 2018. Cyberbullying detection - Technical report 2/2018, Department of Computer Science, AGH University of Science and Technology.

Ptaszynski, M.; Eronen, J. K. K.; and Masui, F. 2017. Learning deep on cyberbullying is always better than brute force. In IJCAI 2017 3rd Workshop on Linguistic and Cognitive Approaches to Dialogue Agents (LaCATODA 2017).

Reyes, A.; Rosso, P.; and Veale, T. 2013. A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation.

Riloff, E.; Qadir, A.; Surve, P.; Silva, L. D.; Gilbert, N.; and Huang, R. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013).

Tepperman, J. 2006. "Yeah right": Sarcasm recognition for spoken dialogue systems. In Interspeech 2006. ICSLP.

Tsur, O.; Davidov, D.; and Rappoport, A. 2010. ICWSM - A great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media. Association for the Advancement of Artificial Intelligence.

Van Hee, C.; Lefever, E.; and Hoste, V. 2018. SemEval-2018 Task 3: Irony detection in English tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018). Association for Computational Linguistics.

Wu, C.; Wu, F.; Wu, S.; Liu, J.; Yuan, Z.; and Huang, Y. 2018. THU NGN at SemEval-2018 Task 3: Tweet irony detection with densely connected LSTM and multi-task learning. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018). Association for Computational Linguistics.