<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bi-ISCA: Bidirectional Inter-Sentence Contextual Attention Mechanism for Detecting Sarcasm in User Generated Noisy Short Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Prakamya Mishra</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saroj Kaushik</string-name>
          <email>saroj.kaushik@snu.edu.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kuntal Dey</string-name>
          <email>kuntal.dey@accenture.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Accenture Technology Labs</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Shiv Nadar University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Many online comments on social media platforms are hateful, humorous, or sarcastic. The sarcastic nature of these comments (especially the short ones) alters their actual implied sentiments, which leads to misinterpretations by existing sentiment analysis models. A lot of research has already been done to detect sarcasm in text using user-based, topical, and conversational information, but not much work has been done on using inter-sentence contextual information for the same. This paper proposes a new deep learning architecture that uses a novel Bidirectional Inter-Sentence Contextual Attention mechanism (Bi-ISCA) to capture inter-sentence dependencies for detecting sarcasm in user-generated short text using only the conversational context. The proposed deep learning model demonstrates the capability to capture explicit, implicit, and contextual incongruous words &amp; phrases responsible for invoking sarcasm. Bi-ISCA generates results comparable to the state-of-the-art on two widely used benchmark datasets for the sarcasm detection task (Reddit and Twitter). To the best of our knowledge, none of the existing models use an inter-sentence contextual attention mechanism to detect sarcasm in user-generated short text using only the conversational context.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Sentiment analysis is one of the most important natural
language processing (NLP) applications. Its goal is to identify,
extract, quantify, and study subjective information. The
sudden rise in the usage of social media platforms as a means of
communication has led to a vast amount of data being shared
between their users on a wide range of topics. This type of data
is very helpful to several organizations for analyzing the
sentiments of people towards products, movies, political events, etc.
Understanding the unique intricacies of human language
remains one of the most important open NLP problems
of this time. Humans regularly use sarcasm as a crucial part
of day-to-day conversations when venting, arguing, or
engaging on social media platforms. Sarcastic remarks
on these platforms pose problems for existing sentiment
analysis systems in identifying the true intentions of their users.</p>
      <p>The Cambridge Dictionary describes sarcasm as an irony
conveyed hilariously or amusingly to criticize something.
Sarcasm may not show criticism on the surface but instead might
carry a criticizing implied meaning. This figurative aspect of
sarcasm makes it difficult to detect in modern micro
texts [Ghosh and Veale, 2016]. Several linguistic studies have
analyzed different aspects of sarcasm. The kind of
response a comment evokes has been considered a
major indicator of sarcasm [Eisterhold et al., 2006]. [Wilson,
2006] states that circumstantial incongruity between a
comment and its corresponding contextual information plays an
important role in implying sarcasm.</p>
      <p>Previous research works have used rule-based,
statistical, and deep-learning-based methods for detecting sarcasm.
The use of contextual information like conversational
context, author personality features, or prior knowledge of the
topic has proved to be very useful. [Khattri et al., 2015]
used sentiments of the author’s historical tweets as context.
[Rajadesingan et al., 2015] used personality features like the
author’s familiarity with Twitter, language (structure and word
usage), and the author’s familiarity with sarcasm (history of
previous sarcastic tweets) for consolidating context. [Bamman
and Smith, 2015] explored the use of historical terms, topics,
and sentiments along with profile information as the author’s
context. They also exploited the use of conversational context,
like the immediately preceding tweets in the thread. [Joshi et al.,
2015] demonstrated that concatenating the preceding comment
with the target comment in a discussion forum led to an
increase in the precision score.</p>
      <p>Overall, in recent years a lot of work has been done to use
different types of contextual information for sarcasm detection,
but none of it has used inter-sentence dependencies. In
this paper, we propose a novel Bidirectional Inter-Sentence
Contextual Attention mechanism (Bi-ISCA) based deep
learning neural network for sarcasm detection. The main
contributions of this paper can be summarized as follows:
• We propose a new deep learning architecture that uses a
novel Bidirectional Inter-Sentence Contextual Attention
mechanism (Bi-ISCA) for detecting sarcasm in short texts
(short texts are more difficult to analyze due to the shortage
of contextual information).
• Bi-ISCA focuses on using only the conversational
contextual comment/tweet for detecting sarcasm rather than
any other topical/personality-based features, as
using only the contextual information enriches the model’s
ability to capture syntactic and semantic textual
properties responsible for invoking sarcasm.
• We also explain model behavior and predictions by
visualizing attention maps generated by Bi-ISCA, which
helps in identifying significant parts of the sentences
responsible for invoking sarcasm.</p>
      <p>The rest of the paper is organized as follows. Section 2
describes the related work. Section 3 explains the
proposed model architecture for detecting sarcasm. Section 4
describes the datasets used, the pre-processing pipeline, and
training details for reproducibility. Experimental results
are explained in section 5, and section 6 illustrates model
behavior and predictions by visualizing attention maps. Finally,
we conclude in section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>A diverse spectrum of approaches has been used to detect
sarcasm. Recent sarcasm detection approaches have mainly
focused either on machine-learning-based approaches
that leverage explicitly declared relevant features,
or on neural-network-based deep learning
approaches that do not require handcrafted features. Also, the
recent advances in using deep learning for performing natural
language processing tasks have led to a promising increase in
the performance of these sarcasm detection systems.</p>
      <p>A lot of research has been done using bag of words as
features. However, to improve performance, scholars started
to explore the use of several other semantic and syntactical
features like punctuations [Tsur et al., 2010]; emotion marks
and intensifiers [Liebrecht et al., 2013]; positive verbs and
negative phrases [Riloff et al., 2013]; polarity skip grams
[Reyes et al., 2013]; synonyms &amp; ambiguity [Barbieri et al.,
2014]; implicit and explicit incongruity-based [Joshi et al.,
2015]; sentiment flips [Rajadesingan et al., 2015]; affect-based
features derived from multiple emotion lexicons [Farías et al.,
2016].</p>
      <p>Every day an enormous amount of short text data is
generated by users on popular social media platforms like Twitter
(www.twitter.com) and Reddit (www.reddit.com). Easy accessibility of such data sources has
enticed researchers to use them for extracting user-based and
discourse-based features. [Hazarika et al., 2018] utilized
contextual information by building user embeddings for capturing
indicative behavioral traits. These user embeddings
incorporated personality features along with the author’s writing style
(using historical posts). They also used discourse comments
along with background cues and topical information for
detecting sarcasm. They performed their experiments on the largest
Reddit dataset, SARC [Khodak et al., 2018]. Many works have only
used the target text for classification purposes, where a target
text is a textual unit that has to be classified as sarcastic or
not. Simply using gated recurrent units (GRU) [Cho et al.,
2014] or long short term memory (LSTM) [Hochreiter and
Schmidhuber, 1997] does not capture the in-between interactions
of word pairs, which makes it difficult to model contrast and
incongruity. [Tay et al., 2018] were able to solve this problem
by looking in between word pairs using a multi-dimensional
intra-attention recurrent network. They focused on modeling
the intra-sentence relationships among the words. [Kumar et
al., 2020] exploited the use of a multi-head attention
mechanism [Vaswani et al., 2017], which could capture dependencies
between different representation subspaces at different
positions. Their model consisted of a word encoder for generating
new word representations by summarizing comment
contextual information in a bidirectional manner. On top of that, they
used multi-head attention for focusing on different contexts
of a sentence, and in the end, a simple multi-layer perceptron
was used for classification.</p>
      <p>There has not been much work done on conversation-dependent
(comment and reply) approaches for sarcasm detection.
[Ghaeini et al., 2018] proposed a model that not only used
information from the target utterance but also used its
conversational context to perceive sarcasm. They aimed to detect
sarcasm by just using the sequences of sentences, without any
extra knowledge about the user and topic. They combined the
predictions from utterance-only and conversation-dependent
parts for generating the final prediction, which was able to
capture the words responsible for delivering sarcasm. [Ghosh and
Veale, 2017] also modeled conversational context for sarcasm
detection. They also attempted to derive what parts of the
conversational context triggered a sarcastic reply. Their proposed
model used sentence embeddings created by taking an average
of word embeddings, and a sentence-level attention mechanism
was used to generate attention-induced representations of both
the context and the response, which were later concatenated and
used for classification.</p>
      <p>Among all the previous works, [Ghaeini et al., 2018] and
[Ghosh and Veale, 2017] share similar motives of detecting
sarcasm using only the conversational context. However, we
introduce a novel Bidirectional Inter-Sentence Contextual
Attention mechanism (Bi-ISCA) for detecting sarcasm. Unlike
previous works, our work considers short texts, in which
sarcasm is far more challenging to detect than in
long texts, as long texts provide much more contextual
information.</p>
    </sec>
    <sec id="sec-3">
      <title>Model</title>
      <p>This section will introduce the proposed Bi-ISCA:
Bidirectional Inter-Sentence Contextual Attention based neural
network for sarcasm detection (as shown in Figure 1). Sarcasm
detection is a binary classification task that tries to predict
whether a given comment is sarcastic or not. The proposed
model uses comment-reply pairs for detecting sarcasm. The
input to the model is represented by $U = [W^u_1, W^u_2, \ldots, W^u_n]$
and $V = [W^v_1, W^v_2, \ldots, W^v_n]$, where $U$ represents the
comment sentence and $V$ represents the reply sentence (both
sentences padded to a length of $n$). Here, $W^u_i, W^v_j \in \mathbb{R}^d$ are
$d$-dimensional word embedding vectors. The objective is
to predict the label $y$, which indicates whether the reply to the
corresponding comment was sarcastic or not.</p>
      <sec id="sec-3-1">
        <title>3.1 Intra-Sentence Word Encoder Layer</title>
        <p>The primary purpose of this layer is to summarize
intra-sentence contextual information from both directions in both
the sentences (comment &amp; reply) using Bidirectional Long
Short Term Memory networks (Bi-LSTM). A Bi-LSTM
[Schuster and Paliwal, 1997] processes information in both
directions using a forward LSTM [Hochreiter and
Schmidhuber, 1997] $\overrightarrow{h}$, which reads the sentence $S = [w_1, w_2, \ldots, w_n]$
from $w_1$ to $w_n$, and a backward LSTM $\overleftarrow{h}$, which reads the
sentence from $w_n$ to $w_1$. Hidden states from both LSTMs are
added to get the final hidden state representation of each word.
So the hidden state representation of the $t$th word ($h_t$) can be
represented by the sum of the $t$th hidden representations of the
forward and backward LSTMs ($\overrightarrow{h_t}, \overleftarrow{h_t}$), as shown in equations 1 and 2.</p>
        <p>$\overrightarrow{h_t} = \overrightarrow{LSTM}(w_t, \overrightarrow{h_{t-1}}); \quad \overleftarrow{h_t} = \overleftarrow{LSTM}(w_t, \overleftarrow{h_{t-1}})$ (1)
$h_t = \overleftarrow{h_t} + \overrightarrow{h_t}$ (2)</p>
        <p>This Intra-Sentence Word Encoder Layer consists of
two independent Bidirectional LSTMs, one for the comment
($BiLSTM_c$) and one for the reply ($BiLSTM_r$). Apart from the hidden
states, both these Bi-LSTMs also generate separate (forward
&amp; backward) final cell states, represented by $\overrightarrow{C}$ &amp; $\overleftarrow{C}$. The
comment sentence $U$ is given as an input to $BiLSTM_c$ and
the reply sentence $V$ is given as an input to $BiLSTM_r$. The
outputs of both Bi-LSTMs are represented by equations
3 and 4.</p>
        <p>$\overrightarrow{C^u}, h^u, \overleftarrow{C^u} = BiLSTM_c(U)$ (3)
$\overrightarrow{C^v}, h^v, \overleftarrow{C^v} = BiLSTM_r(V)$ (4)</p>
        <p>Here, $\overrightarrow{C^u}, \overrightarrow{C^v} \in \mathbb{R}^d$ are the final cell states of the
forward LSTMs corresponding to $BiLSTM_c$ &amp; $BiLSTM_r$;
$\overleftarrow{C^u}, \overleftarrow{C^v} \in \mathbb{R}^d$ are the final cell states of the backward
LSTMs corresponding to $BiLSTM_c$ &amp; $BiLSTM_r$; $h^u = [h^u_1, h^u_2, \ldots, h^u_n]$
and $h^v = [h^v_1, h^v_2, \ldots, h^v_n]$ are the hidden
state representations of $BiLSTM_c$ &amp; $BiLSTM_r$
respectively, where $h^u_i, h^v_j \in \mathbb{R}^d$ and $h^u, h^v \in \mathbb{R}^{n \times d}$.</p>
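        <p>To make this layer concrete, the following is a minimal PyTorch sketch of the encoder (an illustrative reconstruction, not the authors’ released implementation; the module name, the batch-first layout, and the use of torch.nn.LSTM are our assumptions). It returns the summed per-word hidden states $h$ and the final forward/backward cell states that Bi-ISCA consumes later.</p>
        <preformat>
import torch
import torch.nn as nn

class IntraSentenceEncoder(nn.Module):
    """Bi-LSTM word encoder (sketch): sums the forward and backward hidden
    states per word (equations 1-2) and exposes the final forward/backward
    cell states (equations 3-4)."""

    def __init__(self, emb_dim: int):
        super().__init__()
        # Two independent copies of this module play the roles of
        # BiLSTM_c (comment) and BiLSTM_r (reply).
        self.bilstm = nn.LSTM(emb_dim, emb_dim,
                              bidirectional=True, batch_first=True)

    def forward(self, x):                  # x: (batch, n, d) word embeddings
        out, (h_n, c_n) = self.bilstm(x)   # out: (batch, n, 2d)
        d = x.size(-1)
        h = out[..., :d] + out[..., d:]    # h_t = forward h_t + backward h_t
        c_fwd, c_bwd = c_n[0], c_n[1]      # final cell states, each (batch, d)
        return c_fwd, h, c_bwd
</preformat>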
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Bi-ISCA: Bidirectional Inter-Sentence</title>
      </sec>
      <sec id="sec-3-3">
        <title>Contextual Attention Mechanism</title>
        <p>Sarcasm is context-dependent in nature. Even humans
sometimes have a hard time understanding sarcasm without
any contextual information. The hidden states
generated by the two Bi-LSTMs ($BiLSTM_c$ &amp; $BiLSTM_r$)
capture the intra-sentence bidirectional contextual information
in the comment &amp; reply respectively, but fail to capture the
inter-sentence contextual information between them. This paper
introduces a novel Bidirectional Inter-Sentence Contextual
Attention mechanism (Bi-ISCA) for capturing the inter-sentence
contextual information between the two sentences.</p>
        <p>Bi-ISCA uses the hidden state representations of $U$ &amp; $V$ along
with the auxiliary sentence’s cell state representations ($\overrightarrow{C}$ &amp; $\overleftarrow{C}$)
to capture the inter-sentence contextual information. At
first, the attention mechanism computes four sets of attention
scores, namely $\alpha^{\overrightarrow{C^u}}, \alpha^{\overleftarrow{C^u}}, \alpha^{\overrightarrow{C^v}}, \alpha^{\overleftarrow{C^v}} \in \mathbb{R}^n$. These sets
of inter-sentence attention scores are used to generate new
inter-sentence contextualized hidden representations. Then
($\alpha^{\overrightarrow{C^u}}, \alpha^{\overleftarrow{C^u}}$) are calculated using the hidden state
representations of $BiLSTM_r$ along with the forward and backward
final states ($\overrightarrow{C^u}, \overleftarrow{C^u}$) of $BiLSTM_c$ (as shown in equations
5 &amp; 6); similarly, ($\alpha^{\overrightarrow{C^v}}, \alpha^{\overleftarrow{C^v}}$) are calculated using the hidden
state representations of $BiLSTM_c$ along with the forward
and backward final states ($\overrightarrow{C^v}, \overleftarrow{C^v}$) of $BiLSTM_r$ (as shown
in equations 7 &amp; 8). In the equations below, ($\cdot$) represents a
dot product between two vectors.</p>
        <p>$\alpha^{\overrightarrow{C^u}} = [\alpha^{\overrightarrow{C^u}}_1, \alpha^{\overrightarrow{C^u}}_2, \ldots, \alpha^{\overrightarrow{C^u}}_n]; \quad \alpha^{\overrightarrow{C^u}}_i = \overrightarrow{C^u} \cdot h^v_i$ (5)
$\alpha^{\overleftarrow{C^u}} = [\alpha^{\overleftarrow{C^u}}_1, \alpha^{\overleftarrow{C^u}}_2, \ldots, \alpha^{\overleftarrow{C^u}}_n]; \quad \alpha^{\overleftarrow{C^u}}_i = \overleftarrow{C^u} \cdot h^v_i$ (6)
$\alpha^{\overrightarrow{C^v}} = [\alpha^{\overrightarrow{C^v}}_1, \alpha^{\overrightarrow{C^v}}_2, \ldots, \alpha^{\overrightarrow{C^v}}_n]; \quad \alpha^{\overrightarrow{C^v}}_i = \overrightarrow{C^v} \cdot h^u_i$ (7)
$\alpha^{\overleftarrow{C^v}} = [\alpha^{\overleftarrow{C^v}}_1, \alpha^{\overleftarrow{C^v}}_2, \ldots, \alpha^{\overleftarrow{C^v}}_n]; \quad \alpha^{\overleftarrow{C^v}}_i = \overleftarrow{C^v} \cdot h^u_i$ (8)</p>
        <p>In the next step, the above-calculated sets of inter-sentence
attention scores ($\alpha^{\overrightarrow{C^u}}, \alpha^{\overleftarrow{C^u}}$) are multiplied back with the
hidden state representations of $BiLSTM_r$ to generate two new
sets of hidden representations $h^{\overrightarrow{C^u}}_v, h^{\overleftarrow{C^u}}_v \in \mathbb{R}^{n \times d}$ of the
reply sentence, namely reply contextualized on comment
(forward) &amp; reply contextualized on comment (backward)
respectively (as shown in equations 9 &amp; 10). Similarly, ($\alpha^{\overrightarrow{C^v}}, \alpha^{\overleftarrow{C^v}}$)
are multiplied back with the hidden state representations of
$BiLSTM_c$ to generate two new sets of hidden representations
$h^{\overrightarrow{C^v}}_u, h^{\overleftarrow{C^v}}_u \in \mathbb{R}^{n \times d}$ of the comment sentence, namely comment
contextualized on reply (forward) &amp; comment contextualized
on reply (backward) respectively (as shown in equations 11
&amp; 12). In the equations below, ($\times$) represents multiplication
between a scalar and a vector.</p>
        <p>$h^{\overrightarrow{C^u}}_v = [h^{\overrightarrow{C^u}}_{v,1}, h^{\overrightarrow{C^u}}_{v,2}, \ldots, h^{\overrightarrow{C^u}}_{v,n}]; \quad h^{\overrightarrow{C^u}}_{v,i} = \alpha^{\overrightarrow{C^u}}_i \times h^v_i$ (9)
$h^{\overleftarrow{C^u}}_v = [h^{\overleftarrow{C^u}}_{v,1}, h^{\overleftarrow{C^u}}_{v,2}, \ldots, h^{\overleftarrow{C^u}}_{v,n}]; \quad h^{\overleftarrow{C^u}}_{v,i} = \alpha^{\overleftarrow{C^u}}_i \times h^v_i$ (10)
$h^{\overrightarrow{C^v}}_u = [h^{\overrightarrow{C^v}}_{u,1}, h^{\overrightarrow{C^v}}_{u,2}, \ldots, h^{\overrightarrow{C^v}}_{u,n}]; \quad h^{\overrightarrow{C^v}}_{u,i} = \alpha^{\overrightarrow{C^v}}_i \times h^u_i$ (11)
$h^{\overleftarrow{C^v}}_u = [h^{\overleftarrow{C^v}}_{u,1}, h^{\overleftarrow{C^v}}_{u,2}, \ldots, h^{\overleftarrow{C^v}}_{u,n}]; \quad h^{\overleftarrow{C^v}}_{u,i} = \alpha^{\overleftarrow{C^v}}_i \times h^u_i$ (12)</p>
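        <p>As a minimal sketch of equations 5-12 (our illustrative reconstruction; the function name, batching, and use of torch.einsum are assumptions), the scores are dot products between one sentence’s final cell states and the other sentence’s hidden states, and they rescale those hidden states position by position.</p>
        <preformat>
import torch

def bi_isca_half(c_fwd_u, c_bwd_u, h_v):
    """Contextualize reply hidden states h_v (batch, n, d) on the comment's
    final cell states c_fwd_u / c_bwd_u (batch, d): equations 5, 6, 9, 10.
    Calling it with (c_fwd_v, c_bwd_v, h_u) gives equations 7, 8, 11, 12."""
    # alpha_i = C . h_i  ->  (batch, n) attention scores (eqs. 5-6)
    alpha_fwd = torch.einsum('bd,bnd->bn', c_fwd_u, h_v)
    alpha_bwd = torch.einsum('bd,bnd->bn', c_bwd_u, h_v)
    # scale each hidden state by its scalar score -> (batch, n, d) (eqs. 9-10)
    h_v_fwd = alpha_fwd.unsqueeze(-1) * h_v
    h_v_bwd = alpha_bwd.unsqueeze(-1) * h_v
    return h_v_fwd, h_v_bwd
</preformat>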
      </sec>
      <sec id="sec-3-4">
        <title>3.3 Integration and Final Prediction</title>
        <p>The proposed model uses Convolutional Neural Networks
(CNN) [Lecun et al., 1998] for capturing location-invariant
local features from the newly obtained contextualized
hidden representations $h^{\overrightarrow{C^v}}_u, h^{\overleftarrow{C^v}}_u, h^{\overrightarrow{C^u}}_v, h^{\overleftarrow{C^u}}_v$. Four independent
CNN blocks ($CNN_1, CNN_2, CNN_3, CNN_4$) are used,
corresponding to each of the newly obtained contextualized
hidden representations. Each CNN block consists of two
convolutional layers. Both convolutional layers consist of $k$ filters
of height $h$. The role of these filters is to detect particular
features at different locations of the input. The output $c^l_i$ of
the $l$th layer consists of $k^l$ feature maps of height $h$. The $i$th
feature map ($c^l_i$) is calculated as:</p>
        <p>$c^l_i = b^l_i + \sum_{j=1}^{k^{l-1}} K^l_{i,j} * c^{l-1}_j$ (13)</p>
        <p>In the above equation, $b^l_i$ is a bias matrix and $K^l_{i,j}$ is a filter
connecting the $j$th feature map of layer $(l-1)$ to the $i$th feature
map of layer $l$. The output of each convolutional layer is
passed through an activation function $f$. The proposed model
uses LeakyReLU as its activation function.</p>
        <p>$f(x) = \begin{cases} x, &amp; \text{for } x \geq 0 \\ ax, &amp; \text{for } x &lt; 0 \end{cases}; \quad a \in \mathbb{R}$ (14)</p>
        <p>For each of the CNN blocks, the corresponding
contextualized hidden representations are first concatenated ($\oplus$) and
then given as input. The outputs of all the CNN blocks are
flattened ($F_1, F_2, F_3, F_4 \in \mathbb{R}^{dk}$) and concatenated to generate
a new vector ($p \in \mathbb{R}^{4dk}$), where $d$ represents the dimension of
the hidden representation and $k$ represents the number of
convolutional filters used. This concatenated vector $p$ is then given
as input to a dense layer having $4dk$ neurons, which is followed
by the final sigmoid prediction layer.</p>
        <p>$F_1 = CNN_1([h^{\overrightarrow{C^v}}_{u,1} \oplus h^{\overrightarrow{C^v}}_{u,2} \oplus \ldots \oplus h^{\overrightarrow{C^v}}_{u,n}])$ (15)
$F_2 = CNN_2([h^{\overleftarrow{C^v}}_{u,1} \oplus h^{\overleftarrow{C^v}}_{u,2} \oplus \ldots \oplus h^{\overleftarrow{C^v}}_{u,n}])$ (16)
$F_3 = CNN_3([h^{\overrightarrow{C^u}}_{v,1} \oplus h^{\overrightarrow{C^u}}_{v,2} \oplus \ldots \oplus h^{\overrightarrow{C^u}}_{v,n}])$ (17)
$F_4 = CNN_4([h^{\overleftarrow{C^u}}_{v,1} \oplus h^{\overleftarrow{C^u}}_{v,2} \oplus \ldots \oplus h^{\overleftarrow{C^u}}_{v,n}])$ (18)
$p = [F_1 \oplus F_2 \oplus F_3 \oplus F_4]$ (19)
$\hat{y} = \sigma(Wp + b)$ (20)
$W \in \mathbb{R}^{4dk}; \quad b \in \mathbb{R}$ (21)</p>
        <p>The proposed model uses binary cross-entropy as the
training loss function, as shown in equation 22. Here $\mathcal{L}$ is the
cost function, $\hat{y}_i \in \mathbb{R}$ represents the output of the proposed
model, $y_i \in \mathbb{R}$ represents the true label, and $N \in \mathbb{N}$ represents
the number of training samples.</p>
        <p>$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$ (22)</p>
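        <p>A sketch of this integration stage follows (illustrative only; the 1-D convolution orientation, padding, and the exact flattening behind equations 15-18 are not fully specified in the paper, so the shapes below are our assumptions).</p>
        <preformat>
import torch
import torch.nn as nn

class CNNBlock(nn.Module):
    """Two stacked 1-D convolutions with LeakyReLU (equations 13-14), then
    flattening (a sketch; the stride and padding choices are assumptions)."""

    def __init__(self, d: int, k: int = 64, h: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(d, k, kernel_size=h, padding='same'),
            nn.LeakyReLU(0.3),
            nn.Conv1d(k, k, kernel_size=h, padding='same'),
            nn.LeakyReLU(0.3),
        )

    def forward(self, x):        # x: (batch, n, d) contextualized states
        # Conv1d expects (batch, channels, length); flatten the feature maps.
        return self.net(x.transpose(1, 2)).flatten(1)

def predict(reps, blocks, dense, out_layer):
    """reps: the four contextualized representations; equations 15-20 (sketch)."""
    p = torch.cat([blk(r) for blk, r in zip(blocks, reps)], dim=1)  # eq. 19
    return torch.sigmoid(out_layer(dense(p)))                       # eq. 20
</preformat>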
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation Setup</title>
      <sec id="sec-4-1">
        <title>4.1 Dataset</title>
        <p>This paper focuses on detecting sarcasm in user-generated
short text using only the conversational context. Social media
platforms like Reddit and Twitter are widely used by users for
posting opinions and replying to others’ opinions. They have
proved to be a great source for extracting conversational
data. So the experiments were conducted on two publicly
available benchmark datasets (Reddit &amp; Twitter) used for the
sarcasm detection task. Both datasets consist of comment
and reply pairs.</p>
        <p>SARC [Khodak et al., 2018] (https://nlp.cs.princeton.edu/SARC/2.0/) is the largest
dataset available for sarcasm detection, containing millions
of sarcastic/non-sarcastic comment-reply pairs from the
social media site Reddit. This dataset was generated by scraping
comments from Reddit containing the \s (sarcasm) tag. It
contains replies, their parent comment (acts as context), and a
label that shows whether the reply was sarcastic or non-sarcastic
with respect to its parent comment. To compare the
performance of the model on a different, more recent dataset, the proposed
model was also evaluated on the Twitter dataset provided in the
FigLang 2020 workshop (sites.google.com/view/figlang2020) [Ghosh et al., 2020] for the
"sarcasm detection shared task". This dataset consists of
sarcastic/non-sarcastic tweets and their corresponding contextual parent
tweets. The sarcastic tweets were collected using hashtags
like #sarcasm, #sarcastic, and #irony; similarly, non-sarcastic
tweets were collected using hashtags like #happy, #sad, and
#hate. This dataset sometimes contains more than one
contextual parent tweet, so in those cases, all of the contextual tweets
are considered independently with the target tweet.</p>
        <p>In both datasets, replies are the target comment/tweet to
be classified as sarcastic/non-sarcastic, and their
corresponding parent comment/tweet acts as context. Both datasets
contain comments/tweets of varying lengths, but because
this paper only focuses on detecting sarcasm in short text,
only the short comment/reply pairs were used. Comment/reply
sentences of length (number of words) less than 20 and 40 were used
in the case of the SARC and Twitter datasets respectively. In
both cases, the balanced datasets contain equal proportions
of sarcastic/non-sarcastic comment/reply pairs, and the
imbalanced datasets maintain a 20:80 ratio (approximately) between
sarcastic and non-sarcastic comment/reply pairs. Testing was
done on 10% of the dataset and the rest was used for
training; 10% of the training set was used for validation purposes.
Statistics of both datasets are shown in Table 1.</p>
        <p>[Table 1: training/testing set statistics for the Reddit (balanced &amp; imbalanced) and Twitter (balanced) datasets.]</p>
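        <p>As a sketch, the length filtering and splits described above could be implemented as follows (illustrative only; the tuple layout and the sequential carving of the splits are our assumptions, not the authors’ code).</p>
        <preformat>
def make_splits(pairs, max_len):
    """Keep only short comment/reply pairs, then carve out a 10% test set
    and use 10% of the remaining training data for validation (sketch)."""
    short = [(c, r, y) for c, r, y in pairs
             if len(c.split()) &lt; max_len and len(r.split()) &lt; max_len]
    n_test = len(short) // 10
    test, rest = short[:n_test], short[n_test:]
    n_val = len(rest) // 10
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# max_len = 20 for SARC (Reddit) and 40 for the Twitter dataset
</preformat>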
      </sec>
      <sec id="sec-4-3">
        <title>4.2 Data Preprocessing</title>
        <p>The preprocessing of the textual data was done by first
lowercasing all the sentences and separating punctuation from the
words. We do not remove the stop-words because we believe
that stop-words sometimes play a major role in making a
sentence sarcastic, e.g., "is it?" and "am I?". The problem with
social media platforms is that users use a lot of abbreviations,
shortened words, and slang words like "IMO" for "in my
opinion", "lmk" for "let me know", "fr" for "for", etc. These words
are challenging to take care of in NLP tasks, particularly
in the automatic discovery of flexible word usages. So to solve
this problem, these words are converted to their corresponding
full forms using abbreviation/slang word dictionaries obtained
from Urban Dictionary (https://www.urbandictionary.com/). After this, all the sentences were
tokenized into a list of words. The proposed model has a fixed
input size for both comment and reply, but not all the sentences
were of the same length. So all the sentences were padded
to the length of the longest sentence (20 in the case of the
Reddit dataset and 40 in the case of the Twitter dataset). Word
embeddings are used to give semantically-meaningful dense
representations to the words. Word-based embeddings are
constructed using contextual words whereas character-based
embeddings are constructed from character n-grams of the
words. In contrast to word-based embeddings, character-based
embeddings solve the problem of out-of-vocabulary words and
perform better in the case of infrequent words by creating
word embeddings based only on their spellings. So for
generating proper representations for words we have used FastText,
a character-based word embedding. This not only gives
words better representations compared to a word-based model
but also incorporates slang/shortened/infrequent words (which
commonly appear on social media platforms).</p>
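        <p>A minimal sketch of this preprocessing pipeline is shown below (illustrative; the regular expression, the tiny slang dictionary, and the &lt;pad&gt; token are assumptions standing in for the full Urban Dictionary lookup).</p>
        <preformat>
import re

# Stand-in for the abbreviation/slang dictionary built from Urban Dictionary.
SLANG = {"imo": "in my opinion", "lmk": "let me know", "fr": "for"}

def preprocess(sentence: str, max_len: int):
    """Lowercase, split punctuation off words, expand slang, tokenize,
    and pad to a fixed length; stop-words are deliberately kept."""
    s = sentence.lower()
    s = re.sub(r"([!?.,;:'\"])", r" \1 ", s)                 # separate punctuation
    expanded = " ".join(SLANG.get(t, t) for t in s.split())  # expand slang words
    tokens = expanded.split()[:max_len]
    return tokens + ["&lt;pad&gt;"] * (max_len - len(tokens))
</preformat>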
      </sec>
      <sec id="sec-4-4">
        <title>4.3 Training Details</title>
        <p>We have used macro-averaged F1 (F1) and accuracy (Acc) scores
as the evaluation metrics, as is standard for the sarcasm
detection task. We have also reported Precision (P) and Recall
(R) scores for the Twitter dataset as well as for the
Reddit dataset (wherever available). Hyperparameter tuning
was used to find optimum values of the hyperparameters. The
FastText embeddings used were of size d = 30 and were
trained for 30 iterations with window sizes of 3 and 5 in the case
of the SARC and Twitter datasets respectively. The numbers of
filters in all the convolutional blocks were [64, 64], with heights
[2, 2]. The learning optimizer used is Adam with an initial
learning rate of 0.01. The value of α in all the LeakyReLU
layers was set to 0.3. All the models were trained for 20
epochs. L2 regularization set to $10^{-2}$ is applied to all the
feed-forward connections, along with early stopping with a
patience of 5 to avoid overfitting. The mini-batch size
was tuned amongst {100, 500, 1000, 2000, 3000, 4000}, and it
was observed that mini-batch sizes of 2000 and 500 gave the best
performance for the SARC and Twitter datasets respectively.</p>
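        <p>The reported configuration can be wired up roughly as follows (a sketch; applying weight decay through the optimizer is a simplification, since the paper applies L2 regularization to the feed-forward connections specifically).</p>
        <preformat>
import torch

EPOCHS, PATIENCE = 20, 5                     # early stopping on validation loss
BATCH_SIZE = {"SARC": 2000, "Twitter": 500}  # best mini-batch sizes after tuning

def make_optimizer(model: torch.nn.Module):
    """Adam with the reported initial learning rate and L2 strength."""
    return torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-2)

criterion = torch.nn.BCELoss()               # binary cross-entropy (equation 22)
</preformat>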
        <p>The recent success of transformer-based language models
has led to their wide usage in sentiment analysis tasks. They
are known for generating high-quality, high-dimensional word
representations (768-dimensional for BERT). Their main
drawback is that they require high processing power and memory
to train. The above-mentioned configuration of the proposed
model generates ≈1120K trainable parameters, and increasing
either the embedding size or the number of tokens in a
sentence led to an exponential increase in the number of trainable
parameters. So due to computational resource limitations, we
limited our experiments to lower-dimensional word
embeddings.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>Table 2: performance results on the SARC dataset (balanced and imbalanced).
Models | Balanced (Acc / F1 / P / R) | Imbalanced (Acc / F1 / P / R)
CNN-SVM [Poria et al., 2016] †⋆ | 68.0 / 68.0 / – / – | 69.0 / 79.0 / – / –
AMR [Ghaeini et al., 2018] ‡ | 69.5 / 69.5 / 74.8 / 69.7 | – / – / – / –
[Ghosh and Veale, 2017] ‡ | – / 67.8 / 68.2 / 67.9 | – / – / – / –
CUE-CNN [Amir et al., 2016] †⋆ | 70.0 / 69.0 / – / – | 73.0 / 81.0 / – / –
MHA-BiLSTM [Kumar et al., 2020] † | – / 77.5 / 72.6 / 83.0 | – / 56.8 / 60.3 / 53.7
CASCADE [Hazarika et al., 2018] ‡⋆ | 77.0 / 77.0 / – / – | 79.0 / 86.0 / – / –
CASCADE (only discourse features) ‡ | 68.0 / 66.0 / – / – | 68.0 / 78.0 / – / –
Bi-ISCA (this paper) ‡ | 72.3 / 75.7 / 74.2 / 77.6 | 71.9 / 74.4 / 73.0 / 75.8
Δ w.r.t. CASCADE (only discourse features) | 4.3 ↑ / 9.7 ↑ / – / – | 3.9 ↑ / 3.6 ↓ / – / –
† uses only the target sentence; ‡ uses context along with the target sentence;
⋆ uses personality-based features.</p>
      <p>Bi-ISCA focuses on only using the contextual
comment/tweet for detecting sarcasm rather than using any other
topical/personality-based features. Using only the contextual
information enriches the model’s ability to capture syntactical
and semantical textual properties responsible for invoking
sarcasm in any type of conversation. Table 2 reports performance
results on the SARC dataset. For comparison purposes, the
F1 score (F1), Accuracy score (Acc), Precision (P), and Recall (R)
were used.</p>
      <p>
        When compared with the existing works, Bi-ISCA was able
to outperform all the models (only ‡) that use only
conversational context for sarcasm detection
        <xref ref-type="bibr" rid="ref11 ref13 ref18 ref30 ref9">(Improvement of Δ 7.9%
in F1 score when compared to [Ghosh and Veale, 2017]; Δ
6.2% in F1 score and Δ 2.8% in accuracy when compared to
AMR [Ghaeini et al., 2018])</xref>
        , and was even able to perform
better than the models (†⋆) that use personality-based features
along with the target sentence for detecting sarcasm
        <xref ref-type="bibr" rid="ref1 ref1 ref10 ref10 ref24 ref24 ref8 ref8">(improvement of Δ 7.7% in F1 and Δ 4.3% in accuracy score when
compared to CNN-SVM [Poria et al., 2016]; Δ 6.7% in F1
score and Δ 2.3% in accuracy when compared to CUE-CNN
[Amir et al., 2016])</xref>
        . MHA-BiLSTM [Kumar et al., 2020]
had a Δ 1.8% higher F1 score in the balanced dataset but
Bi-ISCA was able to show drastic improvement of Δ 17.6%
in the imbalanced dataset, which demonstrated the ability of
Bi-ISCA to handle class imbalance.
      </p>
      <p>The current state-of-the-art on the SARC dataset is achieved
by CASCADE. Even though CASCADE uses personality-based
features and contextual information along with large
sentences of average length ≈55-62 (very large compared to
our dataset, which gives it the advantage of using a lot
more contextual information), Bi-ISCA was able to achieve
an F1 score comparable to it (despite using relatively short
text). In comparison with the CASCADE variant that only uses
discourse-based features, Bi-ISCA performed drastically better, with an
increase of Δ 9.7% in F1 and Δ 4.3% in accuracy score for
the balanced dataset.</p>
      <p>Bi-ISCA clearly demonstrated its capability to robustly
handle an imbalance in the dataset, although it was unable to
outperform both the CASCADE models. This slightly poorer
performance on the imbalanced dataset can be explained by
the length of sentences used by CASCADE, which is
significantly (≈5 times) greater than that of the ones on which Bi-ISCA
was tested. Longer sentences result in increased contextual
information, which improves performance, especially in the
case of imbalance, where a little extra information can lead to a
drastic increase in performance.</p>
      <p>[Table 3: performance comparison on the Twitter (FigLang 2020) dataset. Models compared: Baseline (LSTM with attention); BERT-Large+BiLSTM+SVM [Baruah et al., 2020]; BERT+CNN+LSTM [Srivastava et al., 2020]; RoBERTa+LSTM [Kumar and Anand, 2020]; RoBERTa-Large [Dong et al., 2020]; RoBERTa + Multi-Initialization Ensemble [Jaiswal, 2020]; BERT + BiLSTM + NeXtVLAD + Context Ensemble + Data Augmentation [Lee et al., 2020]; and Bi-ISCA (this paper).]</p>
      <p>The attention scores generated by the attention mechanism
make the proposed model highly interpretable. Table 4
showcases the distribution of the attention scores over four sarcastic
(correctly predicted by Bi-ISCA) comment-reply pairs from
the SARC dataset. Not only was the proposed model able to
correctly detect sarcasm in these pairs of sentences, but it was also
able to correctly identify the words responsible for the contextual,
explicit, or implicit incongruity which invokes sarcasm.</p>
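      <p>Table 4-style visualizations can be produced by rendering the attention scores of equations 5-8 as a heatmap over the tokens; a minimal matplotlib sketch is given below (illustrative only; the figure size and colormap are arbitrary choices).</p>
      <preformat>
import matplotlib.pyplot as plt

def plot_attention(tokens, scores, title="Bi-ISCA attention"):
    """Render one sentence's per-token attention scores as a one-row heatmap."""
    fig, ax = plt.subplots(figsize=(max(4, len(tokens) * 0.6), 1.5))
    ax.imshow([scores], aspect="auto", cmap="Reds")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=45, ha="right")
    ax.set_yticks([])
    ax.set_title(title)
    plt.tight_layout()
    plt.show()
</preformat>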
      <p>For example, in Pair 1, Bi-ISCA correctly identified
explicitly incongruous words like "amazing" and "force" in the reply
sentence, which were responsible for the sarcastic nature of
the reply. Interestingly, the word "traumatized" in the parent
comment also had a high attention weight value, which shows
that the proposed attention mechanism was able to learn the
contextual incongruity between opposite sentiment words
like "traumatized" &amp; "amazing" in the comment-reply pair.
Pair 2 demonstrates the model’s ability to capture words
responsible for invoking sarcasm by making sentences implicitly
incongruous. Sarcasm due to implicit incongruity is usually
the toughest to perceive. Despite this, Bi-ISCA was able to
give high attention weights to words like "announces" and
"crashes &amp; security holes". Not only this, but the proposed
intra-sentence attention mechanism was also able to learn a
link between "microsoft" and "m" (slang for Microsoft)
without having any prior knowledge related to slang. Pair 3 is
also an example of an explicitly and contextually incongruous
comment-reply pair, where the model was successfully able
to capture opposite sentiment words &amp; phrases like "blind
drunk", "cautious" and "behind the wheel" that made the reply
sarcastic in nature. Pair 4 is an example of sarcasm due to
implicit incongruity between the words "pause" &amp; "watch",
and simultaneously contextual incongruity between "reported"
&amp; "enjoyable", both of which were successfully captured by
Bi-ISCA.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper, we introduce a novel Bidirectional
Inter-Sentence Contextual Attention mechanism based model (Bi-ISCA) for
detecting sarcasm. The proposed model not only was able to
capture both intra- and inter-sentence dependencies but was also
able to achieve state-of-the-art results in detecting sarcasm
in user-generated short text using only the conversational
context. Further investigation of attention maps illustrated
Bi-ISCA’s ability to capture explicitly, implicitly, and
contextually incongruous words &amp; phrases responsible for invoking
sarcasm. The success of the proposed model is due
to the use of character-based embeddings that take care of
slang/shortened &amp; out-of-vocabulary words, Bi-LSTMs that
capture intra-sentence dependencies between words in the
same sentence, and Bi-ISCA, which captures inter-sentence
dependencies between words of different sentences.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Amir et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Silvio</given-names>
            <surname>Amir</surname>
          </string-name>
          , Byron C Wallace, Hao Lyu, Paula Carvalho, and Silva Mário J.
          <article-title>Modelling context with user embeddings for sarcasm detection in social media</article-title>
          .
          <source>Proceedings of the Conference on Natural Language Learning (CoNLL)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Bamman and Smith</source>
          , 2015]
          <string-name>
            <given-names>David</given-names>
            <surname>Bamman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Noah A</given-names>
            <surname>Smith.</surname>
          </string-name>
          <article-title>Contextualized sarcasm detection on twitter</article-title>
          .
          <source>In Ninth International AAAI Conference on Web and Social Media</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Barbieri et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Horacio Saggion, and
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Ronzano</surname>
          </string-name>
          .
          <article-title>Modelling sarcasm in twitter, a novel approach</article-title>
          .
          <source>In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          , pages
          <fpage>50</fpage>
          -
          <lpage>58</lpage>
          , Baltimore, Maryland,
          <year>June 2014</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Baruah et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Arup</given-names>
            <surname>Baruah</surname>
          </string-name>
          ,
          <string-name>
            <surname>Kaushik Das</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ferdous Barbhuiya</surname>
            , and
            <given-names>Kuntal</given-names>
          </string-name>
          <string-name>
            <surname>Dey</surname>
          </string-name>
          .
          <article-title>Context-aware sarcasm detection using BERT</article-title>
          .
          <source>In Proceedings of the Second Workshop on Figurative Language Processing</source>
          , pages
          <fpage>83</fpage>
          -
          <lpage>87</lpage>
          , Online,
          <year>July 2020</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Cho et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart van Merriënboer,
          <string-name>
            <surname>Caglar Gulcehre</surname>
            , Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and
            <given-names>Yoshua</given-names>
          </string-name>
          <string-name>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          , Doha, Qatar,
          <year>October 2014</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Dong et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Xiangjue</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Changmao</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jinho D.</given-names>
            <surname>Choi</surname>
          </string-name>
          .
          <article-title>Transformer-based context-aware sarcasm detection in conversation threads from social media</article-title>
          .
          <source>In Proceedings of the Second Workshop on Figurative Language Processing</source>
          , pages
          <fpage>276</fpage>
          -
          <lpage>280</lpage>
          , Online,
          <year>July 2020</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Eisterhold et al.,
          <year>2006</year>
          ]
          <string-name>
            <given-names>Jodi</given-names>
            <surname>Eisterhold</surname>
          </string-name>
          , Salvatore Attardo, and
          <string-name>
            <given-names>Diana</given-names>
            <surname>Boxer</surname>
          </string-name>
          .
          <article-title>Reactions to irony in discourse: evidence for the least disruption principle</article-title>
          .
          <source>Journal of Pragmatics</source>
          ,
          <volume>38</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1239</fpage>
          -
          <lpage>1256</lpage>
          ,
          <year>2006</year>
          .
          <article-title>Focus-on Issue: Discourse and Conversation</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Farías et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Delia</given-names>
            <surname>Irazú Hernández Farías</surname>
          </string-name>
          , Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <article-title>Irony detection in twitter: The role of affective content</article-title>
          .
          <source>ACM Trans. Internet Technol</source>
          .,
          <volume>16</volume>
          (
          <issue>3</issue>
          ),
          <year>July 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Ghaeini et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Reza</given-names>
            <surname>Ghaeini</surname>
          </string-name>
          , Xiaoli
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fern</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Prasad</given-names>
            <surname>Tadepalli</surname>
          </string-name>
          .
          <article-title>Attentional multi-reading sarcasm detection</article-title>
          .
          <source>CoRR</source>
          , abs/1809.03051,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>[Ghosh and Veale</source>
          , 2016]
          <string-name>
            <given-names>Aniruddha</given-names>
            <surname>Ghosh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tony</given-names>
            <surname>Veale</surname>
          </string-name>
          .
          <article-title>Fracking sarcasm using neural network</article-title>
          .
          <source>In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          , pages
          <fpage>161</fpage>
          -
          <lpage>169</lpage>
          , San Diego, California, June 2016.
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>[Ghosh and Veale</source>
          , 2017]
          <string-name>
            <given-names>Aniruddha</given-names>
            <surname>Ghosh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tony</given-names>
            <surname>Veale</surname>
          </string-name>
          .
          <article-title>Magnets for sarcasm: Making sarcasm detection timely, contextual and very personal</article-title>
          .
          <source>In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>482</fpage>
          -
          <lpage>491</lpage>
          , Copenhagen, Denmark,
          <year>September 2017</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Ghosh et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Debanjan</given-names>
            <surname>Ghosh</surname>
          </string-name>
          , Avijit Vajpayee, and
          <string-name>
            <given-names>Smaranda</given-names>
            <surname>Muresan</surname>
          </string-name>
          .
          <article-title>A report on the 2020 sarcasm detection shared task</article-title>
          .
          <source>In Proceedings of the Second Workshop on Figurative Language Processing</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          , Online,
          <year>July 2020</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Hazarika et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Devamanyu</given-names>
            <surname>Hazarika</surname>
          </string-name>
          , Soujanya Poria, Sruthi Gorantla, Erik Cambria, Roger Zimmermann, and
          <article-title>Rada Mihalcea. CASCADE: Contextual sarcasm detection in online discussion forums</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>1837</fpage>
          -
          <lpage>1848</lpage>
          ,
          Santa Fe
          , New Mexico, USA,
          <year>August 2018</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[Hochreiter and Schmidhuber</source>
          , 1997]
          <article-title>Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>[Jaiswal</source>
          , 2020]
          <string-name>
            <given-names>Nikhil</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          .
          <article-title>Neural sarcasm detection using conversation context</article-title>
          .
          <source>In Proceedings of the Second Workshop on Figurative Language Processing</source>
          , pages
          <fpage>77</fpage>
          -
          <lpage>82</lpage>
          , Online,
          <year>July 2020</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Joshi et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Joshi</surname>
          </string-name>
          , Vinita Sharma, and
          <string-name>
            <given-names>Pushpak</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          .
          <article-title>Harnessing context incongruity for sarcasm detection</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)</source>
          , pages
          <fpage>757</fpage>
          -
          <lpage>762</lpage>
          , Beijing, China,
          <year>July 2015</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Khattri et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Anupam</given-names>
            <surname>Khattri</surname>
          </string-name>
          , Aditya Joshi, Pushpak Bhattacharyya, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Carman</surname>
          </string-name>
          .
          <article-title>Your sentiment precedes you: Using an author's historical tweets to predict sarcasm</article-title>
          .
          <source>In Proceedings of the 6th workshop on computational approaches to subjectivity, sentiment and social media analysis</source>
          , pages
          <fpage>25</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Khodak et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Mikhail</given-names>
            <surname>Khodak</surname>
          </string-name>
          , Nikunj Saunshi, and
          <string-name>
            <given-names>Kiran</given-names>
            <surname>Vodrahalli</surname>
          </string-name>
          .
          <article-title>A large self-annotated corpus for sarcasm</article-title>
          .
          <source>In Proceedings of the Linguistic Resource and Evaluation Conference (LREC)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>[Kumar and Anand</source>
          , 2020]
          <string-name>
            <given-names>Amardeep</given-names>
            <surname>Kumar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vivek</given-names>
            <surname>Anand</surname>
          </string-name>
          .
          <article-title>Transformers on sarcasm detection with context</article-title>
          .
          <source>In Proceedings of the Second Workshop on Figurative Language Processing</source>
          , pages
          <fpage>88</fpage>
          -
          <lpage>92</lpage>
          , Online,
          <year>July 2020</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Kumar et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. T.</given-names>
            <surname>Narapareddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Aditya Srikanth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Malapati</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. B. M.</given-names>
            <surname>Neti</surname>
          </string-name>
          .
          <article-title>Sarcasm detection using multi-head attention based bidirectional lstm</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>8</volume>
          :
          <fpage>6388</fpage>
          -
          <lpage>6397</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Lecun et al.,
          <year>1998</year>
          ]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lecun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Haffner</surname>
          </string-name>
          .
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>86</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Lee et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Hankyol</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Youngjae</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gunhee</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Augmenting data for sarcasm detection with unlabeled conversation context</article-title>
          .
          <source>In Proceedings of the Second Workshop on Figurative Language Processing</source>
          , pages
          <fpage>12</fpage>
          -
          <lpage>17</lpage>
          , Online,
          <year>July 2020</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Liebrecht et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Christine</given-names>
            <surname>Liebrecht</surname>
          </string-name>
          , Florian Kunneman, and Antal van den Bosch.
          <article-title>The perfect solution for detecting sarcasm in tweets #not</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          , pages
          <fpage>29</fpage>
          -
          <lpage>37</lpage>
          , Atlanta, Georgia,
          <year>June 2013</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Poria et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Soujanya</given-names>
            <surname>Poria</surname>
          </string-name>
          , Erik Cambria, Devamanyu Hazarika, and
          <string-name>
            <given-names>Prateek</given-names>
            <surname>Vij</surname>
          </string-name>
          .
          <article-title>A deeper look into sarcastic tweets using deep convolutional neural networks</article-title>
          .
          <source>In Proceedings of COLING</source>
          <year>2016</year>
          ,
          <source>the 26th International Conference on Computational Linguistics: Technical Papers</source>
          , pages
          <fpage>1601</fpage>
          -
          <lpage>1612</lpage>
          , Osaka, Japan,
          <year>December 2016</year>
          .
          <article-title>The COLING 2016 Organizing Committee</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Rajadesingan et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Ashwin</given-names>
            <surname>Rajadesingan</surname>
          </string-name>
          , Reza Zafarani, and Huan Liu.
          <article-title>Sarcasm detection on Twitter: A behavioral modeling approach</article-title>
          .
          <source>In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15</source>
          , pages
          <fpage>97</fpage>
          -
          <lpage>106</lpage>
          , New York, NY, USA,
          <year>2015</year>
          .
          <article-title>Association for Computing Machinery</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Reyes et al.,
          <year>2013</year>
          ] Antonio Reyes, Paolo Rosso, and
          <string-name>
            <given-names>Tony</given-names>
            <surname>Veale</surname>
          </string-name>
          .
          <article-title>A multidimensional approach for detecting irony in Twitter</article-title>
          .
          <source>Language resources and evaluation</source>
          ,
          <volume>47</volume>
          (
          <issue>1</issue>
          ):
          <fpage>239</fpage>
          -
          <lpage>268</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Riloff et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Ellen</given-names>
            <surname>Riloff</surname>
          </string-name>
          , Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and
          <string-name>
            <given-names>Ruihong</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Sarcasm as contrast between a positive sentiment and negative situation</article-title>
          .
          <source>In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL</source>
          , pages
          <fpage>704</fpage>
          -
          <lpage>714</lpage>
          . ACL,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Schuster and Paliwal,
          <year>1997</year>
          ]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schuster</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Paliwal</surname>
          </string-name>
          .
          <article-title>Bidirectional recurrent neural networks</article-title>
          .
          <source>IEEE Transactions on Signal Processing</source>
          ,
          <volume>45</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [Srivastava et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Himani</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , Vaibhav Varshney, Surabhi Kumari, and
          <string-name>
            <given-names>Saurabh</given-names>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>A novel hierarchical BERT architecture for sarcasm detection</article-title>
          .
          <source>In Proceedings of the Second Workshop on Figurative Language Processing</source>
          , pages
          <fpage>93</fpage>
          -
          <lpage>97</lpage>
          , Online,
          <year>July 2020</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [Tay et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Tay</surname>
          </string-name>
          , Anh Tuan Luu, Siu Cheung Hui, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Su</surname>
          </string-name>
          .
          <article-title>Reasoning with sarcasm by reading in-between</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>1010</fpage>
          -
          <lpage>1020</lpage>
          , Melbourne, Australia,
          <year>July 2018</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [Tsur et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Oren</given-names>
            <surname>Tsur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Davidov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ari</given-names>
            <surname>Rappoport</surname>
          </string-name>
          .
          <article-title>ICWSM - a great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews</article-title>
          .
          <source>In Fourth International AAAI Conference on Weblogs and Social Media</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [Vaswani et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.
          <article-title>Attention is all you need</article-title>
          . In I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and R. Garnett, editors,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pages
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          . Curran Associates, Inc.,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [Wilson,
          <year>2006</year>
          ]
          <string-name>
            <given-names>Deirdre</given-names>
            <surname>Wilson</surname>
          </string-name>
          .
          <article-title>The pragmatics of verbal irony: Echo or pretence?</article-title>
          <source>Lingua</source>
          ,
          <volume>116</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1722</fpage>
          -
          <lpage>1743</lpage>
          ,
          <year>2006</year>
          .
          <article-title>Language in Mind: A Tribute to Neil Smith on the Occasion of his Retirement</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>