<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SCI2S at TASS 2018: Emotion Classification with Recurrent Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S en TASS</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>117</fpage>
      <lpage>123</lpage>
      <abstract>
        <p>In this paper, we describe the participation of the SCI2S team in all the Subtasks of Task 4 of TASS 2018. We claim that the use of external emotional knowledge is not required for the development of an emotion classification system. Accordingly, we propose three Deep Learning models that are based on a sequence encoding layer built on the Long Short-Term Memory gated architecture of Recurrent Neural Network. The results reached by the systems are above the average in the two Subtasks, which shows that our claim holds.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>People usually have a look at advertisements when they read traditional newspapers. These advertisements generally fit the news items on the same, previous or next page, because the match between the news and the ads is carefully decided at edition time, which is before the printing of the newspaper. Nowadays, online newspapers are as widely read as traditional ones, hence companies also want to show their brands in online newspapers, and they invest money to buy ads in them. However, one of the differences between traditional and online newspapers is the moment when the correspondence between the news and the advertisements is made, which is at reading time. Thus, the news and the ads likely do not match.</p>
      <p>The lack of correspondence between a news item and an advertisement means that the topic of the news is not suitable for the advertisement, or that the emotion it may elicit in the reader is not positive. If the readers are disgusted by the news, they may be revolted by the advertisement too, which is highly detrimental to the advertised brand. The advertising spots in online newspapers are fixed beforehand, and the advertisement that appears in each spot does not depend on the decision of the editor or the journalist, but on an automatic ad broadcasting system of an online marketing company. Consequently, companies are not able to control whether the reputation of their brands may be damaged, which is known by marketing experts as the brand safety issue.<xref ref-type="fn" rid="fn1">1</xref></p>
      <p>
        Task 4 of TASS 2018
        <xref ref-type="bibr" rid="ref10">(Martínez-Camara et al., 2018)</xref>
        is focused on the aforementioned issue of brand safety, and it proposes the classification of whether a news item is safe for a brand according to the emotion elicited in the readers when they read the headline of the news. The organization provided an annotated corpus of headlines of news from Spanish-written newspapers from around the world, so the SANSE corpus is a global representation of the written Spanish language. In this paper, we present the systems submitted by the SCI2S team to the two Subtasks of Task 4 of TASS 2018.<xref ref-type="fn" rid="fn2">2</xref>
      </p>
      <p>We claim that emotion classification can be tackled without the use of emotional features or any other kind of handcrafted linguistic feature. We thus propose the generation of dense, high-quality features following a sentence encoding approach, and then the use of a non-linear classifier. We submitted three systems based on the encoding of the input headline with a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM).</p>
      <p>Our submitted systems are above the average in the competition, hence this fact shows that our claim holds.</p>
      <sec id="sec-2-1">
        <title>Architecture of the models</title>
      <p>The organization proposed two Subtasks: the first one is defined in a monolingual context, and the second in a multilingual one. The first Subtask has two levels of evaluation, which differ in the size of the evaluation set. We designed the neural architecture without taking into account the specific characteristics of the Subtasks, because our aim was the evaluation of our claim on the SANSE corpus.</p>
      <p>The architecture of the three submitted systems is composed of three modules: (1) the language representation module, for the sake of simplicity the embeddings lookup module; (2) the sequence encoding module, in which the three architectures differ; and (3) the non-linear classification module. The details of each module are explained in the following subsections.</p>
        <sec id="sec-2-1-1">
          <title>Embeddings lookup layer</title>
          <p>Regarding our claim, we defined a feature vector space for training and evaluation that is composed of unsupervised word embedding vectors. A set of word embedding vectors is a representation of the ideal semantic space of words in a real-valued continuous vector space, hence the relationships between the vectors mirror the linguistic relationships of the words. Word embedding vectors are a dense representation of the meaning of a word, thus each word is linked to a real-valued continuous vector of dimension d<sub>emb</sub>.</p>
          <p>
            There are different algorithms in the literature to build word embedding vectors, among which C&amp;W
            <xref ref-type="bibr" rid="ref3">(Collobert et al., 2011)</xref>
            , word2vec
            <xref ref-type="bibr" rid="ref12">(Mikolov et al., 2013)</xref>
            and GloVe
            <xref ref-type="bibr" rid="ref13">(Pennington, Socher, and Manning, 2014)</xref>
            stand out. Likewise, several sets of pre-trained word embedding vectors built with those algorithms are freely available. However, those pre-trained sets were generated using documents written in English, thus they cannot be used for representing Spanish words.
          </p>
          <p>
            We used the pre-trained set of word embeddings SBW<xref ref-type="fn" rid="fn3">3</xref>
            <xref ref-type="bibr" rid="ref2">(Cardellino, 2016)</xref>
            . SBW was built upon several Spanish corpora, and the most relevant characteristics of its development were: (1) the capitalization of the words was kept unchanged; (2) the word2vec algorithm used was skip-gram; (3) the minimum allowed word frequency was 5; and (4) the dimension of the word vectors is 300 (d<sub>emb</sub> = 300).
          </p>
          <p>We tokenized the input headlines with the default tokenizer of NLTK<xref ref-type="fn" rid="fn4">4</xref> in order to project them onto the feature vector space defined by the word embedding vectors. Consequently, each headline (h) is transformed into a sequence of n words (<inline-formula><tex-math>$w_{1:n} = \{w_1, \dots, w_n\}$</tex-math></inline-formula>). The size of the input sequence (n) was defined by the maximum length of the inputs in the training data, hence shorter sequences were padded and sequences longer than n were truncated. After the tokenization, the first layer of our architecture is an embeddings lookup layer, which projects the sequence of tokens onto the feature vector space. Therefore, the output of the embeddings lookup layer is the matrix <inline-formula><tex-math>$WE \in \mathbb{R}^{d \times n}$</tex-math></inline-formula>, with <inline-formula><tex-math>$WE_{1:n}^{T} = (we_1, \dots, we_n)$</tex-math></inline-formula> and <inline-formula><tex-math>$we_i \in \mathbb{R}^{d}$</tex-math></inline-formula>. The parameters of the embeddings lookup layer are not updated during the training.</p>
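          <p>For illustration, a minimal Python sketch of this lookup module follows (it is not the submitted code): the variable sbw, assumed to be a mapping from words to their SBW vectors (e.g., a gensim KeyedVectors object), and the helper name encode_headline are hypothetical.</p>
          <preformat>
import numpy as np
from nltk.tokenize import word_tokenize  # NLTK default tokenizer

DEMB = 300  # dimension of the SBW vectors (demb = 300)

def encode_headline(headline, sbw, n):
    """Project a headline onto the SBW space as an n x DEMB matrix."""
    tokens = word_tokenize(headline, language='spanish')[:n]  # truncate to n
    we = np.zeros((n, DEMB))  # zero rows act as padding
    for i, token in enumerate(tokens):
        if token in sbw:        # capitalization is kept unchanged
            we[i] = sbw[token]  # frozen lookup: vectors are never updated
    return we
          </preformat>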
        </sec>
        <sec id="sec-2-1-2">
          <title>Sequence encoding layer</title>
          <p>The aim of the sequence encoding layer is the generation of high-level features, which condense the semantics of the entire sentence. We used an RNN layer because RNNs can represent sequential input in a fixed-size vector while paying attention to the structural properties of the input <xref ref-type="bibr" rid="ref15">(Goldberg, 2017)</xref>. An RNN is defined as a recursive function R applied to an input sequence. The input of the function R is a state vector s<sub>i-1</sub> and an element of the input sequence, in our case a word vector (we<sub>i</sub>). The output of R is a new state vector (s<sub>i</sub>), which is transformed into the output vector y<sub>i</sub> by a deterministic function O. Equation 1<xref ref-type="fn" rid="fn5">5</xref> summarizes this definition.</p>
          <p>
            <disp-formula id="eq1"><tex-math>$$\mathrm{RNN}(we_{1:n}, s_0) = y_{1:n}; \qquad y_i = O(s_i); \qquad s_i = R(we_i, s_{i-1}) \qquad (1)$$ $$we_i \in \mathbb{R}^{d_{in}}, \quad s_i \in \mathbb{R}^{f(d_{out})}, \quad y_i \in \mathbb{R}^{d_{out}}$$</tex-math></disp-formula>
          </p>
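          <p>As a didactic sketch of Equation 1 (not the submitted code), the recursion can be written directly in Python; the Elman-style choices of R and O below are illustrative assumptions, since our systems use LSTM gates instead.</p>
          <preformat>
import numpy as np

def rnn(we_seq, s0, R, O):
    """Equation 1: s_i = R(we_i, s_{i-1}) and y_i = O(s_i)."""
    s, ys = s0, []
    for we_i in we_seq:
        s = R(we_i, s)   # new state from the input and the previous state
        ys.append(O(s))  # deterministic output function
    return ys            # the output sequence y_1:n

# Illustrative (non-LSTM) instantiation: an Elman RNN cell.
din, dout = 300, 128
W = np.random.randn(dout, din) * 0.01
U = np.random.randn(dout, dout) * 0.01
b = np.zeros(dout)
R = lambda x, s: np.tanh(W @ x + U @ s + b)
O = lambda s: s  # identity output, so y_i = s_i
          </preformat>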
          <p>
            From a linguistic point of view, each vector (y<sub>i</sub>) of the output sequence of an RNN condenses the semantic information of the word w<sub>i</sub> and the previous words (w<sub>1</sub>, ..., w<sub>i-1</sub>). However, according to the distributional hypothesis of language
            <xref ref-type="bibr" rid="ref6">(Harris, 1954)</xref>
            , semantically similar words tend to have similar contextual distributions, or in other words, the meaning of a word is defined by its contexts. An RNN can only encode the previous context of a word when its input is the sequence we<sub>1:n</sub>. However, the input of the RNN can also be the reverse of that sequence (we<sub>n:1</sub>). Consequently, we can build a composition of two RNNs: the first one encodes the sequence from the beginning to the end (forward, f), and the second one from the end to the beginning (backward, b), therefore both the previous and the following context of a word are encoded. This composition is known as a bidirectional RNN (biRNN), whose definition is given in Equation 2.
          </p>
          <p>
            <disp-formula id="eq2"><tex-math>$$\mathrm{biRNN}(we_{1:n}) = [\mathrm{RNN}_f(we_{1:n}, s_0^f); \mathrm{RNN}_b(we_{n:1}, s_0^b)] \qquad (2)$$</tex-math></disp-formula>
          </p>
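          <p>Building on the rnn function sketched above, Equation 2 amounts to running one RNN over the sequence and another over its reverse, and concatenating the re-aligned outputs (again an illustrative sketch rather than the submitted code):</p>
          <preformat>
import numpy as np

def birnn(we_seq, s0_f, s0_b, Rf, Of, Rb, Ob):
    """Equation 2: concatenation of a forward and a backward RNN."""
    y_f = rnn(we_seq, s0_f, Rf, Of)        # encodes the previous context
    y_b = rnn(we_seq[::-1], s0_b, Rb, Ob)  # encodes the following context
    # Re-align the backward outputs so position i pairs y_i^f with y_i^b.
    return [np.concatenate([f, b]) for f, b in zip(y_f, y_b[::-1])]
          </preformat>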
          <p>
            The three submitted systems are based on the use of a specific gated architecture of RNN, namely the LSTM
            <xref ref-type="bibr" rid="ref7">(Hochreiter and Schmidhuber, 1997)</xref>
            , which has reached strong results in several Natural Language Processing tasks
            <xref ref-type="bibr" rid="ref14 ref9 ref11">(Tang, Qin, and Liu, 2015; Kiperwasser and Goldberg, 2016; Martínez-Camara et al., 2017)</xref>
            . The specific details of the sequence encoding layer of each submitted system are described in what follows.
          </p>
          <p><bold>Single LSTM (SLSTM).</bold> The layer is composed of one LSTM, whose input is the sequence we<sub>1:n</sub>, and its output is a single vector, namely the last output vector (<inline-formula><tex-math>$y_n \in \mathbb{R}^{d_{out}}$</tex-math></inline-formula>). In this case, the semantic information of the entire headline is condensed in the last output vector of the LSTM, which corresponds to the last word.</p>
          <p><bold>Single biLSTM (SbLSTM).</bold> In order to encode the previous and the following context of the words of the input sequence, the sequence encoding layer of this system is a biLSTM. The output is the concatenation of the last output vectors of the two LSTMs of the biLSTM (<inline-formula><tex-math>$y_n = [y_n^f; y_n^b] \in \mathbb{R}^{2 \cdot d_{out}}$</tex-math></inline-formula>).</p>
          <p><bold>Sequence LSTM (SeLSTM).</bold> The encoding is carried out by an LSTM, but the output is composed of the output vectors of all the words of the sequence, hence the output is not a single vector, but the sequence y<sub>1:n</sub>, with <inline-formula><tex-math>$y_i \in \mathbb{R}^{d_{out}}$</tex-math></inline-formula>.</p>
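          <p>In a Keras-style implementation (a sketch under assumptions, not the submitted code: the vocabulary size, the input length n and the output dimension d<sub>out</sub> are hypothetical placeholders, and a random initializer stands in for the frozen SBW matrix), the three encoders differ only in which output vectors they keep:</p>
          <preformat>
import tensorflow as tf
from tensorflow.keras import layers

n, dout = 30, 128    # hypothetical input length and output dimension
vocab_size = 100000  # hypothetical vocabulary size
emb_init = tf.keras.initializers.RandomUniform()  # stand-in for the SBW matrix

inputs = layers.Input(shape=(n,), dtype='int32')
we = layers.Embedding(vocab_size, 300, embeddings_initializer=emb_init,
                      trainable=False)(inputs)  # frozen embeddings lookup

slstm = layers.LSTM(dout)(we)                          # SLSTM: y_n only
sblstm = layers.Bidirectional(layers.LSTM(dout))(we)   # SbLSTM: [y_n^f; y_n^b]
selstm = layers.LSTM(dout, return_sequences=True)(we)  # SeLSTM: y_1:n
          </preformat>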
          <p>The semantic information returned by SeLSTM is greater than that of the other two layers, because it returns the output vector of each word, therefore the subsequent layers receive more semantic information from the sequence encoding layer.</p>
      </sec>
      <sec id="sec-4-1">
        <title>Non lineal classi cation layer</title>
        <p>Since RNNs, and specifically LSTMs, have the ability to encode the semantic information of the input sequence, the output of the sequence encoding layer is a high-level representation of the semantic information of the input headline.</p>
        <p>The sequence representation of the headline is then classified by three fully connected layers with ReLU as the activation function, and an additional layer activated by the softmax function. The layers activated by ReLU have different numbers of hidden units or output neurons (see Table 1). The SeLSTM layer does not return an output vector, but an output sequence <inline-formula><tex-math>$y_{1:n} \in \mathbb{R}^{n \times d_{out}}$</tex-math></inline-formula>. Thus, after the second fully connected layer, the sequence is flattened to a single vector <inline-formula><tex-math>$y \in \mathbb{R}^{n \cdot d_{out}}$</tex-math></inline-formula>. Since the task is a binary classification task, the number of hidden units of the softmax layer is 2.</p>
        <p>In order to avoid overfitting, we add a dropout layer after each fully connected layer with a dropout rate value (dr). Besides, we applied an L2 regularization function to the output of each fully connected layer with a regularization value (r). Moreover, the training is stopped if the loss value does not improve in 5 epochs.</p>
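        <p>A sketch of this classification module under the same Keras-style assumptions follows; the hidden sizes and the dr and r values are hypothetical stand-ins for the per-model values of Table 1.</p>
        <preformat>
from tensorflow.keras import layers, regularizers

def classification_module(encoded, hidden=(256, 128, 64), dr=0.5, r=1e-4,
                          flatten_sequence=False):
    """Three ReLU fully connected layers plus a 2-unit softmax layer."""
    y = encoded
    for i, h in enumerate(hidden):
        y = layers.Dense(h, activation='relu',
                         activity_regularizer=regularizers.l2(r))(y)
        y = layers.Dropout(dr)(y)        # dropout after each dense layer
        if flatten_sequence and i == 1:  # SeLSTM only: flatten the sequence
            y = layers.Flatten()(y)      # after the second dense layer
    return layers.Dense(2, activation='softmax')(y)  # binary classification
        </preformat>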
        <p>
          The training of the network was performed by the minimization of the cross entropy function, and the learning process was optimized with the Adam algorithm
          <xref ref-type="bibr" rid="ref8">(Kingma and Ba, 2015)</xref>
          with its default learning rate. The training followed the mini-batches approach with a batch size of 25, and the number of epochs was 40.
        </p>
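        <p>Under the same assumptions, this training setup maps onto the following sketch, where X_train, y_train, X_dev and y_dev are hypothetical integer-encoded headlines and binary labels:</p>
        <preformat>
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

# `inputs` and `slstm` come from the encoder sketch above.
model = tf.keras.Model(inputs, classification_module(slstm))
model.compile(optimizer='adam',                        # default learning rate
              loss='sparse_categorical_crossentropy')  # cross entropy
model.fit(X_train, y_train, validation_data=(X_dev, y_dev),
          batch_size=25, epochs=40,                    # mini-batches of 25
          callbacks=[EarlyStopping(monitor='loss', patience=5)])
        </preformat>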
        <p>For the sake of the replicability of the experiments, Table 1 shows the values of the hyperparameters of the network, and the source code of our experiments is publicly available.<xref ref-type="fn" rid="fn6">6</xref></p>
        <p>[Table 1: values of the hyperparameters n, d<sub>emb</sub>, d<sub>out</sub>, the dropout rates dr<sub>1</sub>-dr<sub>3</sub>, and the L2 regularization values r<sub>1</sub>-r<sub>3</sub> for the SLSTM, SbLSTM and SeLSTM models.]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Analysis of the results</title>
      <p>The organization provided a development set of the SANSE corpus with the aim that the teams would use the same data to tune the classification models. We participated in the two levels of Subtask 1 and in Subtask 2, and we present in Tables 2, 3 and 4 the results reached on the development set (development time) and the official results on the test set of SANSE (evaluation time).</p>
      <p>The main differences among the submitted systems are: (1) the semantic information encoded; and (2) the number of parameters. SLSTM is the model with the least semantic information encoded, because the LSTM is only run in one direction, and only the last output vector of the LSTM is processed by the subsequent layers. Although SbLSTM encodes more semantic information than SLSTM, they have the same number of parameters, because SbLSTM, like SLSTM, only processes the last output vector of the sequence encoding layer. In contrast, SeLSTM is the model that uses the most parameters, because it processes the output vectors of the sequence encoding layer for each input word.</p>
      <p>We expected that the models with a higher number of parameters and a higher capacity of encoding semantic information would reach higher results in the competition, or in other words, that they would have a higher generalization capacity. However, the comparison of the results reached on the development and test sets shows an unexpected performance. Regarding the two main differences among the models, we highlight the following two facts:</p>
      <p><bold>Generalization capacity.</bold> The model that reached the highest results in the two levels of Subtask 1 is SLSTM. The performance of SLSTM stands out in the second level of Subtask 1, where it is the second highest ranked system. Since the test set of the second level is larger than that of the first level, it demands a higher generalization capacity from the systems, thus the good performance of SLSTM is more relevant. In contrast, SbLSTM and SeLSTM are in the fifth and sixth positions respectively in the second level, and in the sixth and seventh positions in the first level of Subtask 1, which was not expected because they have more parameters and condense more semantic information. Concerning Subtask 2, the results reached were the expected ones, because SeLSTM, which has more parameters and condenses more semantic information, reached the best results among our three systems. The generalization demand in this task is high too, because the language variety of the training and the test sets are different: the training set is composed of headlines written in the Spanish language used in America, and the test set is written in the Spanish language used in Spain.</p>
      <p>
        Although the generalization capacity of our systems is high, the different performance in Subtask 1 and Subtask 2 allows us to conclude that, in order to reach a good generalization capacity, a balance between the number of parameters and the complexity or depth of the neural network is required, as is also asserted in
        <xref ref-type="bibr" rid="ref4">(Conneau et al., 2017)</xref>
        .
      </p>
      <p>[Tables 2-4: Macro-Precision (M. Prec.), Macro-Recall (M. Recall) and Macro-F1 (M. F1) of the three systems on the development set and the official test set.]</p>
      <p><bold>Differences among datasets.</bold> SLSTM and SbLSTM reached a value of Macro-Recall higher than the value of Macro-Precision on the development set of Subtask 1 in the two levels of evaluation. However, they reached the inverse relation on the test set of both levels of Subtask 1. In contrast, SeLSTM had the same trend in both datasets, thus the performance of SeLSTM shows a higher stability. On the other hand, the three systems had the same performance on the development and test sets in Subtask 2, that is to say, the value of Macro-Precision was higher than the value of Macro-Recall at development and evaluation time.</p>
      <p>Regarding the differences between the datasets, the performance of the models with more parameters and more semantic information is more stable, which means that the results at development time follow a trend similar to the results at evaluation time, which is a desirable characteristic of a classification system.</p>
      <p>Regarding the competition, the rank positions of our systems are shown in Table 5. In Subtask 1, the systems reached a rank position above the average, and SLSTM stands out in Level 2 of Subtask 1. In Subtask 2, the systems are on the average, and their performance is close to that of their competitors. Regarding our claim and the high results reached by the three systems, we conclude that our claim holds, hence we can obtain strong results in the task of emotion classification without the use of emotional features.</p>
    </sec>
      <sec id="sec-5-1">
        <title>Conclusions</title>
        <p>We described the three systems submitted to all the Subtasks of Task 4 of TASS 2018 by the SCI2S team. Our proposal is based on the claim that emotion classification can be performed without the use of external emotional knowledge or handcrafted features. The three systems are neural networks grounded in a sentence classification approach, namely the use of an LSTM and a biLSTM. The three systems reached a rank position above the average in the two Subtasks of Task 4, thus we conclude that our claim holds.</p>
        <p>
          Our future work will follow the direction defined by the analysis of the results (see Section 3), hence we are going to study the balance between the depth and the generalization capacity of our emotion classification model. Likewise, we will work on the addition of an attention layer
          <xref ref-type="bibr" rid="ref1">(Bahdanau, Cho, and Bengio, 2015)</xref>
          to the model, with the aim of automatically selecting the most relevant features.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Acknowledgements</title>
        <p>This work was partially supported by the Spanish Ministry of Science and Technology under the project TIN2017-89517-P, and by a grant from the Fondo Europeo de Desarrollo Regional (FEDER). Eugenio Martínez-Camara was supported by the Juan de la Cierva Formación Programme (FJCI-2016-28353) from the Spanish Government.</p>
    </sec>
  </body>
  <back>
    <fn-group>
      <fn id="fn1"><label>1</label><p>https://www.thedrum.com/opinion/2018/07/09/brand-safety-the-importance-qualitymedia-fake-news-and-staying-vigilant</p></fn>
      <fn id="fn2"><label>2</label><p>The details about Task 4 of TASS 2018 are in <xref ref-type="bibr" rid="ref10">(Martínez-Camara et al., 2018)</xref>.</p></fn>
      <fn id="fn3"><label>3</label><p>https://crscardellino.github.io/SBWCE/</p></fn>
      <fn id="fn4"><label>4</label><p>https://www.nltk.org/api/nltk.tokenize.html</p></fn>
      <fn id="fn5"><label>5</label><p>The definition of an RNN states that the dimension of s<sub>i</sub> is a function of the output dimension, but some architectures such as the LSTM do not allow that flexibility.</p></fn>
      <fn id="fn6"><label>6</label><p>https://github.com/rbnuria/TASS-2018</p></fn>
    </fn-group>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>In 3rd International Conference on Learning Representations</source>
          , San Diego,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Cardellino</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Spanish Billion Words Corpus and Embeddings</article-title>
          , March.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Collobert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karlen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kuksa</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2493</fpage>
          -
          <lpage>2537</lpage>
          , November.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrault</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lecun</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Very deep convolutional networks for text classi cation</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume</source>
          <volume>1</volume>
          ,
          <string-name>
            <surname>Long</surname>
            <given-names>Papers</given-names>
          </string-name>
          , pages
          <volume>1107</volume>
          {
          <fpage>1116</fpage>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          .
          <year>2017</year>
          .
          <source>Neural Network Methods for Natural Language Processing</source>
          . Morgan &amp; Claypool Publishers.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>Z. S.</given-names>
          </string-name>
          <year>1954</year>
          .
          <article-title>Distributional structure</article-title>
          .
          <source>WORD</source>
          ,
          <volume>10</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          , November.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>In 3rd International Conference on Learning Representations</source>
          , San Diego,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Kiperwasser</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Simple and accurate dependency parsing using bidirectional LSTM feature representations</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>4</volume>
          :
          <fpage>313</fpage>
          -
          <lpage>327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Mart</surname>
            nez-Camara, E.,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Almeida-Cruz</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          <article-title>D az-</article-title>
          <string-name>
            <surname>Galiano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Estevez-Velarde</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Garc</surname>
            a-Cumbreras, M. Garc aVega,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>A. Montejo</given-names>
          </string-name>
          <string-name>
            <surname>Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Montoyo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Mun</surname>
          </string-name>
          <article-title>~oz, A. PiadMor s, and</article-title>
          <string-name>
            <given-names>J.</given-names>
            <surname>Villena-Roman</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of TASS 2018: Opinions, health and emotions</article-title>
          . In E. Mart nezCamara,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Almeida-Cruz</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. C.</surname>
          </string-name>
          <article-title>D az-</article-title>
          <string-name>
            <surname>Galiano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Estevez-Velarde</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Garc</surname>
            a-Cumbreras,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Garc</surname>
            a-Vega,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>A. Montejo</given-names>
          </string-name>
          <string-name>
            <surname>Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Montoyo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Mun</surname>
          </string-name>
          <article-title>~oz, A. Piad-Mor s, and</article-title>
          J. Villena-Roman, editors,
          <source>Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS</source>
          <year>2018</year>
          ), volume
          <volume>2172</volume>
          <source>of CEUR Workshop Proceedings</source>
          , Sevilla, Spain, September. CEUR-WS.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Mart</surname>
            nez-Camara,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shwartz</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and I.</given-names>
            <surname>Dagan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Neural disambiguation of causal lexical markers based on context</article-title>
          .
          <source>In IWCS 2017 { 12th International Conference on Computational Semantics { Short papers.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          . In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          . Curran Associates, Inc., pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Glove: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Document modeling with gated recurrent neural network for sentiment classi cation</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1422</fpage>
          -
          <lpage>1432</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>