<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bidirectional Dilated LSTM with Attention for Fine-grained Emotion Classification in Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Annika M Schoene</string-name>
          <email>amschoene@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander P Turn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Hull</institution>
          ,
          <addr-line>Cottingham Road, Hull HU6 7RX</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a novel approach for fine-grained emotion classification in tweets using a Bidirectional Dilated LSTM (BiDLSTM) with attention. Conventional LSTM architectures can face problems when classifying long sequences, which is problematic for tweets, where crucial information is often attached to the end of a sequence, e.g. an emoticon. We show that by adding a bidirectional layer, dilations and an attention mechanism to a standard LSTM, our model overcomes these problems and is able to maintain complex data dependencies over time. We present experiments with two datasets, the 2018 WASSA Implicit Emotions Shared Task and a new dataset of 240,000 tweets. Our BiDLSTM with attention achieves a test accuracy of up to 81.97%, outperforming competitive baselines by up to 10.52% on both datasets. Finally, we evaluate our model against a human benchmark on the same task.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Recurrent Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        There has been a surge of interest in the field of sentiment analysis in recent
years, which is likely due to the growing number of social media users, who
increasingly express their opinions, beliefs and attitudes in online posts towards
a range of different topics, events and products [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]. Most sentiment analysis
approaches to date focus on polarity detection [
        <xref ref-type="bibr" rid="ref17 ref3">17, 3</xref>
        ] but neglect the classification
of more fine-grained emotion categories, such as Ekman's six basic emotions
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Fine-grained emotion detection has promising applicability in a number of
domains, including detecting cyber-bullying [
        <xref ref-type="bibr" rid="ref55">55</xref>
        ] or identifying potential mental
health issues in social media posts [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ].
      </p>
      <p>
        The majority of current approaches to sentiment analysis rely on deep
learning algorithms [
        <xref ref-type="bibr" rid="ref59">59</xref>
        ], such as recurrent neural networks (RNN) [
        <xref ref-type="bibr" rid="ref25 ref47">47, 25</xref>
        ] and
convolutional neural networks (CNN) [
        <xref ref-type="bibr" rid="ref11 ref12">12, 11</xref>
        ]. While tweets have previously been
categorised as short sequences or sentence-level sentiment analysis [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], we
argue that this should no longer be the case, especially since Twitter increased
its allowed character limit from 140 to 280 [
        <xref ref-type="bibr" rid="ref49">49</xref>
        ]. As such, tweets now also
face problems with classifying long sequences, similar to other natural language
processing tasks [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        In this paper we propose the use of Dilated RNNs (DRNNs) for emotion
classification from tweets. DRNNs introduce skip connections into a standard RNN
to increase the range of temporal dependencies that can be modelled.
Experiments on sequence classification for language modelling on the Penn Treebank,
pixel-by-pixel MNIST classification and speaker identification from audio [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
have shown that DRNNs outperform competitive baselines such as standard LSTM/GRU
architectures as well as more specialised models. We expect that the same
advantages can be observed for tweets. We extend the originally proposed DRNN
with an embedding layer, a bidirectional layer and an attention mechanism, and
apply it to the classification of six basic emotion categories: anger, fear, disgust,
surprise, joy and sadness. Figure 1 shows an example of a tweet.
      </p>
      <p>We therefore hypothesise that by using dilated recurrent neural networks
we can take advantage of the increased sequence length of tweets and avoid
information loss over time. Another reason for the good performance of dilated
recurrent skip connections is that they maintain a better balance of memory over
a larger period of time compared to standard RNNs. We believe that using a
similar structure, treating tweets as longer sequences rather than as a
short-sequence problem, will enable us to achieve better classification
accuracies.</p>
      <p>We experiment with two datasets: the 2018 WASSA Implicit Emotions Shared
Task dataset, which contains 153,383 tweets and can be considered an established
benchmark, and a new, larger dataset of 240,000 tweets that we collected
using the same six emotion categories. We find that on both datasets, DLSTMs
with attention perform better than standard LSTM or CNN architectures, as
well as any of the submissions to the WASSA shared task, achieving an accuracy
of up to 71.45%. We find that the BiDLSTMs with attention are particularly
beneficial for the longest sequences in our datasets and that the addition of
word embeddings, a bidirectional layer and an attention mechanism further
increases performance.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recently, deep learning methods for sentiment and emotion classification have
become the predominant technique. For example, [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] developed a soft
attention-based LSTM with CNN for sarcasm detection. Work conducted by [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] uses a
deep CNN with a multi-kernel classifier to extract features of short sequences
for multi-modal sentiment analysis and shows that this increases accuracy. [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] use
a BiLSTM for a range of different text classification tasks, including sentiment
analysis. In their experiments they show that using a single-layer BiLSTM with
pretrained word embeddings and trained with cross-entropy loss achieves
competitive results compared to more complex learning models. Most recently the
Implicit Emotions Shared Task (IEST) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] used tweets, where the winning
model, named 'Amobee', was able to outperform the baseline score significantly
by achieving an accuracy of 71.45% [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ]. Amobee is a bidirectional GRU with
an additional attention mechanism inspired by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and additional hidden layers.
It has been reasoned that the model's success is due to its specific type of
transfer learning. The baseline model for this shared task was established using
a maximum entropy classifier with L2 regularisation, reaching an F1 score of
59.1% on the test data. Recurrent neural networks have become the
predominant neural network across a range of sentiment analysis and emotion
detection tasks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Similarly, almost half of the submissions to the annual
SemEval shared task [
        <xref ref-type="bibr" rid="ref27 ref29 ref39">39, 27, 29</xref>
        ] used some form of neural networks. At the same
time, the majority of approaches to detect sentiment continue to focus on
polarity detection [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], including approaches to identifying sentiment on social media
such as Twitter [
        <xref ref-type="bibr" rid="ref30 ref39">39, 30</xref>
        ] or longer texts such as reviews or blogs [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. This is
limiting for real-world applications, such as mental state detection, customer
reviews and advertising, where fine-grained emotions can add substantial
value.
      </p>
      <p>
        Approaches that have attempted more fine-grained classification are mostly
based on Ekman's six basic emotions [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], anger, fear, disgust, surprise, joy and
sadness, or Plutchik's eight basic emotions [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ], who extended [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]'s basic
emotions with Trust and Anticipation. For example, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] apply Gated Recurrent
Neural Networks (GRNNs) to classify tweets collected based on hashtags carrying
emotions into [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] emotion categories. Research conducted by [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] used hashtags
that contain emotion words based on Plutchik's eight basic emotions to show
that user-labelled hashtags used as annotations are consistent with those
annotated by trained judges. Furthermore, a new lexicon based on the same Twitter
corpus is introduced. [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] introduces a Topic Sentiment Model (TSM), which can
capture both topics and sentiment. The model is based on Probabilistic Latent
Semantic Indexing (pLSI) and utilises an online sentiment retrieval service to
induce prior knowledge to the model. Research by [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ] uses distant supervision
and a lexicon to label tweets for Plutchik's eight basic emotions [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] and then
classify them. Work conducted by [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] also investigated eight basic emotions in
online discourse. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used the whole taxonomy of Plutchik's emotions to analyse
chat messages.
      </p>
      <p>
        Work on sentiment classification from social media has additionally explored
the occurrence of emoticons and their influence on sentiment classification [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
[
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] conducted research distinguishing happiness and sadness in emoticons.
Similarly, [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] have shown that the usage of both hashtags and emoticons can be
beneficial and contribute to more accurate classification of tweets.
Motivation There are a number of challenges that have to be taken into
account when using recurrent neural networks to learn longer sequences, which
include but are not limited to: (1) maintaining mid- and short-term memory
is problematic when memorising long-term dependencies [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and (2) vanishing
and exploding gradients [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. Therefore it could be argued that there is
a need for a more specialised learning model which can overcome these
challenges. [
        <xref ref-type="bibr" rid="ref52">52</xref>
        ] introduce a dilated LSTM as part of a reinforcement learning task,
where the learning model has one dilated recurrent layer with fixed dilations.
Work by [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] introduced a Dilated RNN by using dilated skip connections. The
dilated LSTM alleviates the problem of learning long sequences; however, not
every word in a sequence has the same meaning or importance. Therefore we
extend this network with (1) an embedding layer, (2) a bidirectional layer and (3) an
attention mechanism. The full architecture of the Bidirectional Dilated LSTM
(BiDLSTM) with attention is shown in Figure 2.
LSTM architecture Our primary model is the Long Short-Term Memory (LSTM),
given its suitability for language and time-series data [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. We feed into the LSTM
an input sequence x = (x_1, ..., x_N) of words in a tweet, alongside a label y ∈ Y
denoting an emotion from any of the six basic emotion categories. The LSTM
learns to map inputs x to an output y via a hidden representation h_t, which is
computed recursively from an activation function:

h_t = f(h_{t-1}, x_t),   (1)

where t denotes a time-step. During training, we minimise a loss function, in our
case categorical cross-entropy:

L(x, y) = -(1/N) Σ_{n ∈ N} x_n log y_n.   (2)
      </p>
      <p>
        Standard LSTMs manage their weight updates through a number of gates
that determine the amount of information that should be retained and forgotten
at each time step. In particular, we distinguish an 'input gate' i that decides how
much new information to add at each time-step, a 'forget gate' f that decides
what information not to retain, and an 'output gate' o that determines the output.
More formally, and following the definition by [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], this leads us to update our
hidden state h as follows (where σ refers to the logistic sigmoid function, c is
the 'cell state', W are weight matrices and b are bias terms):

i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)   (3)
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)   (4)
c_t = f_t c_{t-1} + i_t tanh(W_xc x_t + W_hc h_{t-1} + b_c)   (5)
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_t + b_o)   (6)
h_t = o_t tanh(c_t)   (7)
      </p>
      <p>
        The standard LSTM definition solves some of the problems that vanilla RNNs
have, such as the vanishing gradient problem [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], but it still has some
shortcomings when learning long-term dependencies. One of them is due to the
cell state of the LSTM: the cell state is changed by adding some function of
the inputs, so when we backpropagate and take the derivative of c_t with respect to
c_{t-1}, the added term disappears and less information travels through
the layers of the model. This shortcoming can be addressed through the
use of dilations and skip connections in the dilated LSTM.
      </p>
      <p>
        Embedding and bidirectional layer Each tweet i contains words w_it, where w_it, t ∈
[1, T] represents the t-th word in tweet i. We utilise GloVe word embeddings
trained on 2 billion tweets, as developed by [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], in our 200-dimensional
embedding layer. We then use a bidirectional LSTM to obtain information from both
directions of each word in order to capture contextual information. The
bidirectional LSTM incorporates a forward LSTM h→_t(i), which reads each tweet
from w_i1 to w_iT, and a backward LSTM h←_t(i), which reads the words in each tweet
from w_iT to w_i1, where x_it represents the word vectors in the embedding matrix W_e:

x_it = W_e w_it, t ∈ [1, T]
h→_t(i) = LSTM→(x_it), t ∈ [1, T]
h←_t(i) = LSTM←(x_it), t ∈ [1, T]
      </p>
      <p>We then concatenate the outputs of the forward hidden state h→_t and the
backward hidden state h←_t into an output o, which allows us to utilise all information
available in each tweet. The output o is then fed into the Dilated LSTM.</p>
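      <p>
        As an illustration, a single LSTM time-step implementing the gate updates above can be
sketched in NumPy as follows. This is a minimal sketch rather than the implementation used in
our experiments: the stacked weight layout is an assumption, and the peephole terms (W_ci, W_cf,
W_co) are omitted for brevity.
      </p>
      <p>
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time-step (peephole terms omitted).
    W: input weights (4H x D), U: recurrent weights (4H x H), b: bias (4H,).
    The four H-sized slices of z are the input gate, forget gate,
    candidate cell state and output gate, in that order."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])           # input gate
    f = sigmoid(z[H:2*H])         # forget gate
    g = np.tanh(z[2*H:3*H])       # candidate cell state
    o = sigmoid(z[3*H:4*H])       # output gate
    c = f * c_prev + i * g        # new cell state
    h = o * np.tanh(c)            # new hidden state
    return h, c
```
      </p>
      <p>
        In practice these updates are handled internally by the framework's LSTM cell; the sketch
only makes the gate equations concrete.
      </p>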
      <p>
        Dilated LSTM Layer For our implementation of a Dilated LSTM, we follow
the implementation of recurrent skip connections with exponentially increasing
dilations in a multi-layered learning model - as proposed by [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] - as it allows
LSTMs to better learn input sequences and their dependencies. This means that
temporal and complex data dependencies are learned on different layers. The
most important part of this architecture is the dilated recurrent skip connection
in the LSTM cell, where c_t(l) is the cell in layer l at time t:

c_t(l) = LSTM(o_t(l), c_{t-s(l)}(l)),

where s(l) is the skip length (dilation) of layer l and o_t(l) is the input to layer l at time t.
The exponentially increasing dilations across layers are inspired by [
        <xref ref-type="bibr" rid="ref51">51</xref>
        ];
s(l) denotes the dilation of the l-th layer, where M is the dilation base and L the
number of layers:
      </p>
      <p>
        s(l) = M^(l-1),   l = 1, ..., L.
As outlined by [
        <xref ref-type="bibr" rid="ref10">10</xref>
          ] there are two main benefits to stacking exponentially dilated
recurrent layers: (1) it enables different layers to focus on different temporal
resolutions and (2) it reduces the length of paths between nodes at different
time-steps, which enables the network to learn more complex long-term dependencies.
Exponentially increasing dilations therefore shorten any given sequence length
at the different layers.
      </p>
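      <p>
        As a concrete illustration of this schedule, the dilations and the resulting per-layer
effective sequence lengths can be computed as follows. This is a sketch; with M = 2 and L = 2 it
reproduces the dilations [1, 2] and the sub-sequence lengths used in our experiments.
      </p>
      <p>
```python
import math

def dilations(M, L):
    """Dilation schedule s(l) = M**(l-1) for layers l = 1..L."""
    return [M ** (l - 1) for l in range(1, L + 1)]

def effective_lengths(seq_len, M, L):
    """Length of the sub-sequence each dilated layer processes: a skip
    connection of dilation s splits the input into s interleaved strands
    of ceil(seq_len / s) steps each."""
    return [math.ceil(seq_len / s) for s in dilations(M, L)]
```
      </p>
      <p>
        For example, a capped sequence of 40 steps yields per-layer lengths of 40 and 20, and a
full-length sequence of 102 steps yields 102 and 51.
      </p>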
      <p>
        Attention Layer The attention mechanism was first introduced by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but has
since been used in a number of different tasks, including machine translation [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ],
sentence pairs detection [
        <xref ref-type="bibr" rid="ref58">58</xref>
        ], neural image captioning [
        <xref ref-type="bibr" rid="ref56">56</xref>
        ] and action recognition
[
        <xref ref-type="bibr" rid="ref45">45</xref>
        ].
      </p>
      <p>
        Our implementation of the attention mechanism is inspired by [
        <xref ref-type="bibr" rid="ref57">57</xref>
          ], using
attention to find the words that are most important to the meaning of a tweet.
We use the output of the dilated LSTM as the direct input to the attention layer,
where O denotes the output of the final layer L of the Dilated LSTM at time t+1. The
attention for each word w in a tweet i is computed as follows, where h_iw is the
hidden representation of the dilated LSTM output, α_iw represents the normalised
attention weights measuring the importance of each word, and t_i is the corresponding
tweet vector:

u_iw = tanh(O + b_w)
α_iw = exp(h_iw^T u_iw) / Σ_t exp(h_iw^T u_iw)
t_i = Σ_w α_iw h_iw
      </p>
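      <p>
        The attention computation can be sketched in NumPy as a softmax over per-word scores
followed by a weighted sum over hidden states. This is an illustrative sketch under the
definitions above; the shapes and the bias value are assumptions.
      </p>
      <p>
```python
import numpy as np

def attention_pool(H_words, b_w):
    """H_words: (T, d) hidden states from the dilated LSTM for one tweet.
    Returns the tweet vector t_i and the attention weights alpha."""
    u = np.tanh(H_words + b_w)                    # per-word score representation
    scores = np.sum(H_words * u, axis=1)          # h_iw^T u_iw for each word
    scores = scores - scores.max()                # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum() # normalised attention weights
    t_i = (alpha[:, None] * H_words).sum(axis=0)  # weighted sum of states
    return t_i, alpha
```
      </p>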
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>We present the datasets used and our baselines, and discuss objective and
subjective results.</p>
      <sec id="sec-3-1">
        <title>Data</title>
        <p>
          We work with the following datasets:
- The WASSA Implicit Emotions Shared Task (IEST) [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] data consists of
155,383 tweets and is based on [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]'s six basic emotions.
- The Ekman Emotion Keyword (EEK) data, a collection of 240,000 tweets
that we collected between September 2017 and December 2018.1
        </p>
        <p>
          Both datasets were collected using the Twitter API [
          <xref ref-type="bibr" rid="ref50">50</xref>
          ]; a list of keywords
and synonyms was specified for automatic data collection from Twitter. See
Table 2 for the keywords that we used, following [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] and using Ekman's six basic
emotions. After the initial data collection, we filtered the tweets to those marked in
the language tab as "English" and removed any duplicates. Then we used the text
processing library developed by [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], to anonymise usernames and mask URLs.
Afterwards we used a dictionary containing all emotion keywords listed in Table
2 and replaced existing keywords in all tweets with the term [keyword]. Finally
each tweet was assigned a label based on the emotion category its keyword
belonged to (see Figure 1). For our experiments we use 80% of the data for
training, 10% for validation and the remaining 10% for testing.
1 The dataset will be released to the research community upon request and in
accordance with the Twitter API guidelines [
          <xref ref-type="bibr" rid="ref50">50</xref>
          ].
        </p>
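        <p>
          The keyword masking and label assignment described above can be sketched as follows.
This is a minimal illustration; the keyword-to-emotion mapping shown is a small assumed subset
of Table 2, not the full keyword list.
        </p>
        <p>
```python
import re

# Assumed subset of the emotion keywords in Table 2 (illustrative only).
KEYWORDS = {
    "angry": "anger", "furious": "anger",
    "scared": "fear", "afraid": "fear",
    "happy": "joy", "joyful": "joy",
    "sad": "sadness", "disgusted": "disgust", "surprised": "surprise",
}

def mask_and_label(tweet):
    """Replace the first matching emotion keyword with '[keyword]' and
    return the masked tweet with its emotion-category label."""
    for word, emotion in KEYWORDS.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, tweet, flags=re.IGNORECASE):
            masked = re.sub(pattern, "[keyword]", tweet, flags=re.IGNORECASE)
            return masked, emotion
    return tweet, None
```
        </p>
        <p>
          For example, "I am so happy about this!" is masked to "I am so [keyword] about this!"
and labelled Joy.
        </p>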
      </sec>
      <sec id="sec-3-2">
        <title>Baselines</title>
        <p>
          Similarly to [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] we use a maximum entropy classifier with L2 regularisation
to establish the baselines for our datasets. All baselines are evaluated in
two conditions:
Capped length, where we cap the length of any sequence to 40, in accordance
with the WASSA IEST challenge winners.
        </p>
        <p>Full length, where we use the average full uncapped length of a sequence
(maximum 103). Our intuition is that this condition will particularly reveal the
advantages of the skip connections.</p>
        <p>
          For the DLSTM, BiDLSTM and BiDLSTM with attention, we established the
number of dilations empirically. There are two dilated layers, with the dilations
increasing exponentially starting at 1, i.e. dilations [1, 2]. This means that each sub-LSTM for
the capped sequences has the following sequence lengths [Dilation 1 = 40, Dilation
2 = 20], with a total of 20 hidden units per layer, whilst each sub-LSTM for the
full-length sequences has the following sequence lengths [Dilation 1 = 102, Dilation 2
= 51].
        </p>
        <p>
          We evaluate our BiDLSTM with attention against the following baselines:
- DLSTM: a dilated LSTM with hierarchically stacked dilations and
hyper-parameters: learning rate: 0.001, batch size: 128, optimizer: Adam, dropout:
0.5.
- BiDLSTM: a two-layer bidirectional dilated LSTM with a three-layer
LSTM, hierarchically stacked dilations and the same hyper-parameters as
the DLSTM.
- BiLSTM: a BiLSTM with 2 layers and the following hyper-parameters:
learning rate: 0.001, batch size: 128, optimizer: Adam, dropout: 0.5. This
model is similar to recent work by [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ], who used a single-layer BiLSTM to
classify the IMDb movie review dataset into positive and negative reviews.
- BiLSTM with attention: a BiLSTM with attention and the following
hyper-parameters: learning rate: 0.001, batch size: 128, optimizer: Adam,
dropout: 0.5. This model is similar to recent work by [
          <xref ref-type="bibr" rid="ref43 ref7">7, 43</xref>
          ].
- CNN: a CNN with 2-D convolutions and two fully connected layers, filter
sizes of 1 and 2, 102 filters, and a ReLU activation function. This learning model is similar
to recent work by [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
- CNN-LSTM: we follow the implementation of the learning model by [
          <xref ref-type="bibr" rid="ref53">53</xref>
          ],
using a CNN that feeds into an LSTM. This model was used to predict
valence/arousal ratings in textual data.
        </p>
        <p>
          Finally, we compare our model against the winner of the 2018 WASSA IEST
shared task, called Amobee [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ]. All experiments were conducted using TensorFlow
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>We benchmark the BiDLSTM with attention against a number of different neural
networks, using both vanilla neural networks and more specialised neural
networks that have been used in sentiment analysis tasks. We compare results at
two different sequence lengths and use four different metrics for evaluation: test
set accuracy, precision, recall and F1-score.</p>
      <p>Capped Sequences Tables 3 and 4 show the results for capped sequence lengths
on the IEST and EEK datasets respectively.</p>
      <p>Learning Model Test Acc. Precision Recall F1-score
Max Entropy 58.4 0.59 0.57 0.58
CNN 43.17 0.44 0.42 0.43
CNN LSTM 55.42 0.56 0.54 0.55
BI LSTM 49.47 0.50 0.48 0.49
BI LSTM attention 58.60 0.60 0.56 0.58
DLSTM 56.44 0.57 0.55 0.56
BiDLSTM 67.96 0.68 0.67 0.67
Amobee - - - 71.45
BiDLSTM attention 72.83 0.74 0.71 0.72</p>
      <p>It can be seen that the vanilla CNN and BiLSTM fall just short of the baselines
established for this task. The CNN-LSTM and DLSTM architectures both
outperform their vanilla predecessors. The BiLSTM with attention and BiDLSTM
surpass the baselines but fall short of the winning model of the IEST task for
both datasets. The BiDLSTM with attention outperforms all
previous models on the capped sequence length by over 14.43%, and the IEST
baseline by 11.24%. The results for the capped sequence length using
the IEST dataset (Table 3) show that our proposed model surpasses the 'Amobee'
model's result, although only marginally. We hypothesise that the reason
the DLSTM, BiDLSTM and BiDLSTM with attention either fall short of
the baselines or only marginally surpass them is that the models are not able
to take full advantage of the full sequence length.</p>
      <p>Long sequences Table 5 shows the results for the IEST dataset using full-length
sequences, and Table 6 shows the results for the full length on the EEK
dataset. Similarly to the results for the capped sequence length, the CNN and
BiLSTM fall short of the established baselines and only the CNN-LSTM improves
on them, whereas for the long sequences the DLSTM,
BiLSTM with attention and BiDLSTM surpass the baselines of both datasets. The
BiDLSTM with attention outperforms all models on the full-length sequences
by over 20.36% on the EEK dataset and the IEST baseline by 18.47%. These
results show that incorporating contextual information through the bidirectional
layer and using attention to focus on the most important words in a tweet
enhances the dilated LSTM's ability to cope with longer sequences. This confirms
that using more specialised learning models such as the DLSTM, BiDLSTM
and BiDLSTM with attention allows us to better capture information in longer
sequences.</p>
      <p>Learning Model Test Acc. Precision Recall F1-score
Max Entropy 58.4 0.59 0.57 0.58
CNN 43.95 0.44 0.43 0.43
CNN LSTM 56.15 0.57 0.55 0.56
BI LSTM 51.73 0.52 0.51 0.51
BI LSTM attention 58.79 0.59 0.58 0.58
DLSTM 60.27 0.61 0.59 0.60
BiDLSTM 69.01 0.71 0.67 0.69
BiDLSTM attention 78.76 0.79 0.78 0.78</p>
      <sec id="sec-4-1">
        <title>Evaluation of Prediction Labels</title>
        <p>
          In order to evaluate the performance of each model, we have set aside 5,000
tweets per dataset that have not been used during training or testing previously.
We then use the pretrained models to establish which labels are hardest to
predict for each network. We compare the best-performing learning model with
human performance. For this we used Amazon Mechanical Turk [
          <xref ref-type="bibr" rid="ref48">48</xref>
          ], where each
tweet was annotated by three different annotators for the six emotion categories,
yielding 15,000 annotations per dataset. All emotion words were replaced with
the term '[Keyword]', a sample tweet can be seen in Figure 3.
        </p>
        <p>We use confusion matrices to visualise the quality of label output for our
learning model on both datasets. Figures 4 and 5 both show the confusion
matrices for the BiDLSTM with attention. They show that for both
datasets Joy was the most accurately predicted emotion, whilst Anger (61.96%) was
often misclassified. Furthermore, they show that Anger is most often confused
with Disgust in both datasets.</p>
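        <p>
          For reference, confusion matrices such as those reported here can be computed with a
simple routine like the following sketch (illustrative only; not the code used to produce
Figures 4 and 5).
        </p>
        <p>
```python
import numpy as np

EMOTIONS = ["anger", "fear", "disgust", "surprise", "joy", "sadness"]

def confusion_matrix(true_labels, pred_labels, classes=EMOTIONS):
    """Counts of tweets per (true class, predicted class) pair.
    Rows index the true class, columns the predicted class."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        cm[idx[t], idx[p]] += 1
    return cm
```
        </p>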
        <p>Furthermore we have also looked at each emotion in both datasets in order
to gain better insight into how well each emotion is classified by the proposed
learning model. We use precision, recall and F1-score as our evaluation metrics
for both test datasets. Table 7 shows the per-emotion results on the IEST
dataset using the full sequence length, where the best-performing emotion is
Joy and the emotion Anger is most often misclassified. Table 8 shows the
per-label results for the EEK dataset using the full sequence length, confirming
that the same emotions, Joy and Anger, are the most and least likely to be
accurately classified.</p>
        <p>Type Precision Recall F1-score
Anger 0.69 0.76 0.72
Fear 0.69 0.83 0.75
Disgust 0.83 0.75 0.79
Sadness 0.76 0.78 0.77
Joy 0.90 0.75 0.82
Surprise 0.84 0.78 0.81</p>
        <p>Average 0.79 0.78 0.78</p>
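        <p>
          The per-emotion precision, recall and F1 scores above follow the standard definitions
and can be derived from a confusion matrix as in the following sketch (an illustrative helper;
rows are assumed to index true classes and columns predicted classes).
        </p>
        <p>
```python
def prf1(cm, k):
    """Precision, recall and F1 for class index k of confusion matrix cm
    (rows: true class, columns: predicted class)."""
    tp = cm[k][k]
    fp = sum(cm[r][k] for r in range(len(cm))) - tp   # predicted k, truly other
    fn = sum(cm[k][c] for c in range(len(cm))) - tp   # truly k, predicted other
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```
        </p>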
        <p>Afterwards we looked at the results of the human annotation for the same
test datasets. Figures 6 and 7 show the confusion matrices for the human
annotators. Each confusion matrix shows the number of correctly and falsely predicted
labels as percentages. We found that for both datasets evaluated by humans,
the most commonly correctly annotated emotion was Joy, with 37.70% in
the IEST and 41.80% in the EEK dataset. The emotion Disgust was least likely
to be accurately annotated in both datasets. Furthermore, Disgust was most
often mistaken for the emotion Sadness in both datasets, and overall there were
far fewer accurately predicted labels by the human annotators compared to the
proposed learning model.</p>
        <p>In Figure 8 we show an example of a tweet with its true label and the labels
predicted by human annotators. It can be seen that for all three people
annotating this tweet there was no agreement on the emotion label and no annotator
picked the correct label. This illustrates how hard this task may be for humans,
as the keyword could have been replaced with a number of different emotion
keywords and still made sense.</p>
        <p>Probabilities of labels Furthermore, we have looked at 100 random test samples
to see the probability distribution of the output labels (see Figures 9 and 10).
It could be argued that there is some larger pattern in how humans write about
emotion that is detected by learning models but may not be detectable by
humans on a qualitative basis.</p>
        <p>
          This might be due to the difficulty of the task, where many emotions are
closely related or overlapping, such as Disgust and Anger, which humans were
not able to interpret correctly [
          <xref ref-type="bibr" rid="ref54">54</xref>
          ]. Other studies have previously found
that humans struggle to identify emotions in textual data due to the lack of extra
information (e.g. tone of voice or facial expression), and therefore often
project their own emotional state onto it [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ]. However, this is not
possible for any learning model, and this might be the reason why such models are
better at detecting underlying patterns in this type of data.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper we have found that our learning model, the bidirectional dilated
LSTM with attention, performs above the baseline of 58.4% by over 14.43% on
the WASSA shared task dataset. Furthermore, our model also performs best on
our own dataset, achieving an accuracy of 80.97%. We have also found that when
using longer sequences we achieve better results with more specialised models
than with vanilla neural networks. Additionally, we have shown
that when pruning our model to use a shorter input sequence it still outperforms
state-of-the-art results. It could also be argued that by treating tweets as longer
sequences we can utilise more of the information in a tweet. Furthermore, we have
evaluated which labels are most likely to be predicted correctly by both humans and
the BiDLSTM with attention. We have demonstrated that the task of accurately
identifying the six emotion categories in tweets is considerably harder for humans
than for the learning model. This could largely be due to the amount of
emotion projected by humans onto an individual tweet, which prevents them
from identifying overall patterns on a qualitative basis. Finally, we have outlined the
collection of a new resource, a dataset of 240,000 tweets that have been labelled
for six emotion categories.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irving</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>TensorFlow: a system for large-scale machine learning</article-title>
          .
          <source>In: OSDI</source>
          . vol.
          <volume>16</volume>
          , pp.
          <fpage>265</fpage>
          –
          <lpage>283</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Abdul-Mageed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Emonet: Fine-grained emotion detection with gated recurrent neural networks</article-title>
          .
          <source>In: Proceedings of the 55th Annual</source>
          <article-title>Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</article-title>
          .
          <source>vol. 1</source>
          , pp.
          <fpage>718</fpage>
          –
          <lpage>728</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Amplayo</surname>
            ,
            <given-names>R.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hwang</surname>
            ,
            <given-names>S.W.</given-names>
          </string-name>
          :
          <article-title>Cold-start aware user and product attention for sentiment classification</article-title>
          .
          <source>arXiv preprint arXiv:1806.05507</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.:</given-names>
          </string-name>
          <article-title>Neural Machine Translation by Jointly Learning to Align and Translate</article-title>
          .
          <source>In: Proc. of the International Conference on Learning Representations (ICLR)</source>
          . San Diego, CA, USA (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.:</given-names>
          </string-name>
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Baziotis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelekis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doulkeridis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          . pp.
          <fpage>747</fpage>
          –
          <lpage>754</lpage>
          . Association for Computational Linguistics, Vancouver, Canada (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Baziotis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelekis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doulkeridis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          . pp.
          <fpage>747</fpage>
          –
          <lpage>754</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Brooks</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuksenok</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torkildson</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perry</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scott</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anicello</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zukowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          :
          <article-title>Statistical affect detection in collaborative chat</article-title>
          .
          <source>In: Proceedings of the 2013 conference on Computer supported cooperative work</source>
          . pp.
          <fpage>317</fpage>
          –
          <lpage>328</lpage>
          . ACM (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Affective computing and sentiment analysis</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>31</volume>
          (
          <issue>2</issue>
          ),
          <fpage>102</fpage>
          –
          <lpage>107</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cui</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witbrock</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasegawa-Johnson</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>T.S.:</given-names>
          </string-name>
          <article-title>Dilated recurrent neural networks</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>77</fpage>
          –
          <lpage>87</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>72</volume>
          ,
          <fpage>221</fpage>
          –
          <lpage>230</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Dahou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elaziz</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Arabic sentiment classification using convolutional neural network and differential evolution algorithm</article-title>
          .
          <source>Computational Intelligence and Neuroscience</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Derczynski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liakata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Procter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoi</surname>
            ,
            <given-names>G.W.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zubiaga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semeval-2017 task 8: Rumoureval: Determining rumour veracity and support for rumours</article-title>
          .
          <source>arXiv preprint arXiv:1704.05972</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Dos Santos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Deep convolutional neural networks for sentiment analysis of short texts</article-title>
          .
          <source>In: Proceedings of COLING</source>
          <year>2014</year>
          ,
          <source>the 25th International Conference on Computational Linguistics: Technical Papers</source>
          . pp.
          <fpage>69</fpage>
          –
          <lpage>78</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ekman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levenson</surname>
            ,
            <given-names>R.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friesen</surname>
            ,
            <given-names>W.V.</given-names>
          </string-name>
          :
          <article-title>Autonomic nervous system activity distinguishes among emotions</article-title>
          .
          <source>Science</source>
          <volume>221</volume>
          (
          <issue>4616</issue>
          ),
          <fpage>1208</fpage>
          –
          <lpage>1210</lpage>
          (
          <year>1983</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Felbo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mislove</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Søgaard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahwan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm</article-title>
          .
          <source>arXiv preprint arXiv:1708.00524</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Go</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhayani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          .
          <source>CS224N Project Report, Stanford</source>
          <volume>1</volume>
          (
          <issue>12</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Generating Sequences With Recurrent Neural Networks</article-title>
          .
          <source>CoRR abs/1308.0850</source>
          (
          <year>2013</year>
          ), http://arxiv.org/abs/1308.0850
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frasconi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>Gradient flow in recurrent nets: the difficulty of learning long-term dependencies</article-title>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Lstm can solve hard long time lag problems</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>473</fpage>
          –
          <lpage>479</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Klinger</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Clercq</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balahur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Iest: Wassa-2018 implicit emotions shared task</article-title>
          .
          <source>arXiv preprint arXiv:1809.01083</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Kouloumpis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Twitter sentiment analysis: The good the bad and the omg!</article-title>
          <source>ICWSM</source>
          <volume>11</volume>
          ,
          <fpage>164</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sangwan</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nayyar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdel-Basset</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Sarcasm detection using soft attention-based bidirectional long short-term memory model with convolution network</article-title>
          .
          <source>IEEE Access</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Effective approaches to attention-based neural machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1508.04025</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM</article-title>
          .
          <source>In: Thirty-Second AAAI Conference on Artificial Intelligence</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Mei</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wondra</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Topic sentiment mixture: modeling facets and opinions in weblogs</article-title>
          .
          <source>In: Proceedings of the 16th international conference on World Wide Web</source>
          . pp.
          <fpage>171</fpage>
          –
          <lpage>180</lpage>
          . ACM (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bravo-Marquez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salameh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiritchenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SemEval-2018 task 1: Affect in tweets</article-title>
          .
          <source>In: Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>17</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title># emotional tweets</article-title>
          .
          <source>In: Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>246</fpage>
          –
          <lpage>255</lpage>
          . Association for Computational Linguistics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bravo-Marquez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Wassa-2017 shared task on emotion intensity</article-title>
          .
          <source>arXiv preprint arXiv:1708.03700</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritter</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Semeval-2016 task 4: Sentiment analysis in twitter</article-title>
          .
          <source>In: Proceedings of the 10th international workshop on semantic evaluation (semeval-2016)</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>18</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Pak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paroubek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Twitter as a corpus for sentiment analysis and opinion mining</article-title>
          .
          <source>In: LREC</source>
          . vol.
          <volume>10</volume>
          , pp.
          <fpage>1320</fpage>
          –
          <lpage>1326</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , et al.:
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>2</volume>
          (
          <issue>1–2</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>135</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Pascanu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Understanding the exploding gradient problem</article-title>
          .
          <source>CoRR abs/1211.5063</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Plutchik</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice</article-title>
          .
          <source>American Scientist</source>
          <volume>89</volume>
          (
          <issue>4</issue>
          ),
          <fpage>344</fpage>
          –
          <lpage>350</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Poria</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 2015 conference on empirical methods in natural language processing</source>
          . pp.
          <fpage>2539</fpage>
          –
          <lpage>2544</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Ravi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>A survey on opinion mining and sentiment analysis: tasks, approaches and applications</article-title>
          .
          <source>Knowledge-Based Systems 89</source>
          ,
          <fpage>14</fpage>
          –
          <lpage>46</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Riordan</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trichtinger</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          :
          <article-title>Overconfidence at the keyboard: Confidence and accuracy in interpreting affect in e-mail exchanges</article-title>
          .
          <source>Human Communication Research</source>
          <volume>43</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>24</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farra</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>SemEval-2017 task 4: Sentiment analysis in Twitter</article-title>
          .
          <source>In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          . pp.
          <fpage>502</fpage>
          –
          <lpage>518</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Rothkrantz</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Online emotional facial expression dictionary</article-title>
          .
          <source>In: Proceedings of the 15th International Conference on Computer Systems and Technologies</source>
          . pp.
          <fpage>116</fpage>
          –
          <lpage>123</lpage>
          . ACM
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Rozental</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fleischer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Amobee at SemEval-2018 task 1: GRU neural network with a CNN attention mechanism for sentiment classification</article-title>
          .
          <source>arXiv preprint arXiv:1804.04380</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Sachan</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaheer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Revisiting LSTM networks for semi-supervised text classification via mixed objective function</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name>
            <surname>Schoene</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dethlefs</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Unsupervised suicide note classification</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44.
          <string-name>
            <surname>Schoene</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dethlefs</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Automatic identification of suicide notes from linguistic and sentiment features</article-title>
          .
          <source>In: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage</source>
          ,
          <source>Social Sciences, and Humanities</source>
          . pp.
          <fpage>128</fpage>
          –
          <lpage>133</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          45.
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Action recognition using visual attention</article-title>
          .
          <source>arXiv preprint arXiv:1511.04119</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          46.
          <string-name>
            <surname>Suttles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ide</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Distant supervision for emotion classification with discrete binary values</article-title>
          .
          <source>In: International Conference on Intelligent Text Processing and Computational Linguistics</source>
          . pp.
          <fpage>121</fpage>
          –
          <lpage>136</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          47.
          <string-name>
            <surname>Tay</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuan</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hui</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          :
          <article-title>Learning to attend via word-aspect associative fusion for aspect-based sentiment analysis</article-title>
          .
          <source>In: Thirty-Second AAAI Conference on Artificial Intelligence</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          48.
          <string-name>
            <surname>Turk</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          :
          <article-title>Amazon Mechanical Turk</article-title>
          .
          <source>Retrieved August 17, 2012</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          49. Twitter:
          <article-title>Counting characters</article-title>
          . https://developer.twitter.com/en/docs/basics/countingcharacters.html (
          <year>Dec 2018</year>
          ),
          <source>accessed on 2018-11-11</source>
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          50. Twitter:
          <article-title>Developer policy</article-title>
          . https://developer.twitter.com/en.html (
          <year>Dec 2018</year>
          ),
          <source>accessed on 2018-11-11</source>
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          51.
          <string-name>
            <surname>Van Den Oord</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dieleman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalchbrenner</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senior</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Wavenet: A generative model for raw audio</article-title>
          .
          <source>In: SSW</source>
          . p.
          <fpage>125</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          52.
          <string-name>
            <surname>Vezhnevets</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osindero</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaul</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heess</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaderberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silver</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Feudal networks for hierarchical reinforcement learning</article-title>
          .
          <source>arXiv preprint arXiv:1703.01161</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          53.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>K.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Dimensional sentiment analysis using a regional cnn-lstm model</article-title>
          .
          <source>In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          . vol.
          <volume>2</volume>
          , pp.
          <fpage>225</fpage>
          –
          <lpage>230</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          54.
          <string-name>
            <surname>Widen</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brooks</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Anger and disgust: Discrete or overlapping categories?</article-title>
          .
          <source>In: 2004 APS Annual Convention</source>
          , Boston College, Chicago, IL (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          55.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellmore</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Fast learning for sentiment analysis on bullying</article-title>
          .
          <source>In: Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining</source>
          . p.
          <fpage>10</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          56.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhudinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Show, attend and tell: Neural image caption generation with visual attention</article-title>
          .
          <source>In: International conference on machine learning</source>
          . pp.
          <fpage>2048</fpage>
          –
          <lpage>2057</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          57.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Hierarchical attention networks for document classification</article-title>
          .
          <source>In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>1480</fpage>
          –
          <lpage>1489</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          58.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , Schutze, H.,
          <string-name>
            <surname>Xiang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>ABCNN: Attention-based convolutional neural network for modeling sentence pairs</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>4</volume>
          ,
          <fpage>259</fpage>
          –
          <lpage>272</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          59.
          <string-name>
            <surname>Young</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hazarika</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poria</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Recent trends in deep learning based natural language processing</article-title>
          .
          <source>IEEE Computational Intelligence Magazine</source>
          <volume>13</volume>
          (
          <issue>3</issue>
          ),
          <fpage>55</fpage>
          –
          <lpage>75</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>