<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From Recurrency to Attention in Opinion Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rosa María Montañés-Salas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael del-Hoyo-Alonso</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rocío Aznar-Gimeno</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>8</institution>
          ,
          <addr-line>Zaragoza</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technological Institute of Aragon (ITAINNOVA), Mar a de Luna</institution>
          ,
          <addr-line>7</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>589</fpage>
      <lpage>597</lpage>
      <abstract>
        <p>This paper describes the participation of ITAINNOVA in the Sentiment Analysis in Twitter task (TASS), framed within the new evaluation forum IberLEF (Iberian Languages Evaluation Forum). This work explores two different Deep Learning approaches, validating their performance on both subtasks (Monolingual and Cross-lingual Sentiment Analysis). The first one is an embedding-based strategy combined with bidirectional recurrent neural networks, which receives the name Char Bi-LSTM network, and the second one is a recent language representation model called BERT (Bidirectional Encoder Representations from Transformers). Although the performance of the second approach is not recognized in the official results of the task, we also present it, since its performance has been remarkably good and greater than that of the first approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment analysis</kwd>
        <kwd>Twitter</kwd>
        <kwd>Deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Workshop on Sentiment Analysis, framed within the new evaluation
forum IberLEF (Iberian Languages Evaluation Forum) and celebrated under the
umbrella of the International Conference of the Spanish Society for Natural
Language Processing (SEPLN), known as TASS, has become one of the most
important events in the field of semantic analysis over Spanish written texts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
workshop is an ideal meeting point for the exchange of ideas between
professionals and researchers in the field of Natural Language Processing (NLP) in general
and sentiment analysis in particular. The aim of the proposed task is to
promote the current state of development of polarity classification systems at tweet
level in Spanish. As of the TASS 2018 edition [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] the challenges of multilinguality
and generalization capacity of the systems arose in the form of new subtasks.
Subtask 1 (Monolingual Sentiment Analysis) is focused on single-language
analysis using the same variant for training, validation and testing, while subtask 2
(Cross-lingual Sentiment Analysis) aims to evaluate the dependency of systems
on a language.
      </p>
      <p>
        In this context we have explored two different approaches based on advanced
deep learning techniques. The first, and the official one, applies a traditional
feature-based strategy, which commonly uses pre-trained data representations as
feature inputs of the task-oriented model architecture. The second approach is
based on the transfer learning method, where a model is trained with one specific
learning objective and then reused as the starting point for learning to solve
a different problem. One of the most recent, effective and adaptable works based
on deep bidirectional transformers is the BERT (Bidirectional Encoder
Representations from Transformers) implementation, which has shown considerable
improvements over a wide range of NLP tasks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We have participated in both
subtasks in order to validate our proposed solutions.
      </p>
      <p>The paper is organised as follows: after this introduction, we briefly
describe the set of works which have inspired our approaches. In section 3 the detailed
architectures of the solutions are presented, followed by the details of the
experiments carried out in section 4, together with their results.
Finally, in section 5 we summarize the main conclusions drawn during
the experimentation and future working directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        Language modeling is one of the most difficult problems yet to be solved in the NLP
field. In recent years this problem has been tackled by the generation of dense
semantic representations obtained by training unsupervised algorithms on
large text corpora. In the case of the Spanish language, the most commonly
used resources are Wikipedia and the Spanish Billion Word Corpus compiled by
Cardellino [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Multiple approaches have been developed in order to obtain
pretrained word embeddings, such as word2vec [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], GloVe [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and FastText
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. From a character-level perspective, some authors have reported performance
improvements by using char embeddings on language modeling [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and
fine-grained sentiment analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Moreover, the effectiveness of recurrent neural networks has been widely
demonstrated over several NLP tasks, in particular the use of Long Short-Term
Memory networks (LSTMs) and its bidirectional version (Bi-LSTMs) (for
example in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). Unlike traditional recurrent networks, these networks have
the characteristic of learning long-term dependencies, allowing a greater window
of context information, thus improving performance on language-related tasks,
where the context significantly influences semantic analysis.
      </p>
      <p>
        The combination of these techniques has led to the development of complex but
increasingly powerful architectures, such as the Char Bi-LSTM networks, which
are used mainly for sequence tagging tasks such as the Named Entity Recognition and
Classification (NERC) problem, as shown in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        The previous models use word embeddings in order to introduce the
word concept, together with RNN (LSTM) models that model word order directly and
explicitly track states across the sentence. Our second approach uses BERT,
a transformer-based architecture. In contrast to LSTMs, where order is
central, BERT does not have an explicit notion of word order beyond marking
each word with its absolute-position embedding; its language modelling relies
mainly on attention mechanisms [
        <xref ref-type="bibr" rid="ref19 ref5 ref8">19,5,8</xref>
        ]. BERT is a recent natural language
processing model that has shown groundbreaking results in many tasks such as
question answering, natural language inference and paraphrase detection [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], but
has been scarcely tested on the Spanish language until recently ([
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed approaches</title>
      <p>
        Inspired by the general conclusions derived from the workshop on Semantic
Analysis in 2018 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and by our previous contributions on the Sentiment Analysis
tasks [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], we have followed the line of deep learning based solutions. Firstly, we
have explored an embedding-based strategy combined with bidirectional
recurrent neural networks, which receives the name Char Bi-LSTM network. Secondly,
motivated by the reported improvements of BERT in English language modeling,
we decided to study its efficiency on this challenging Spanish tweet classification
task.
      </p>
      <sec id="sec-3-1">
        <title>Char Bi-LSTM network</title>
        <p>As stated before, joining the strengths of embedding-based language models
with neural architectures focused on temporal sequence learning has shown
promising results on some NLP tasks. Consequently, our first approach relies
on the architecture shown in Fig. 1 in order to solve the
polarity classification problem over Spanish written texts.</p>
        <p>The proposed architecture learns a representation of the input documents as a
concatenation of self-learned char embeddings with sequence word embeddings (loaded
from Spanish pretrained word embedding models). This representation feeds the
bidirectional LSTM module, which may be composed of several layers. The
output class is obtained through a softmax cross-entropy layer which returns the
probability of each label.</p>
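        <p>For illustration, the following minimal Keras sketch assembles the described pipeline: self-learned char embeddings, frozen pretrained word embeddings, their concatenation, a bidirectional LSTM module and a softmax output. The layer sizes and sequence lengths are assumptions for the example, not the exact configuration used in our experiments.</p>
        <preformat>
# Minimal sketch of the Char Bi-LSTM architecture; sizes are illustrative.
from tensorflow.keras import layers, models

MAX_WORDS, MAX_CHARS = 70, 15      # tokens per tweet, chars per token (assumed)
VOCAB, CHAR_VOCAB, N_CLASSES = 30000, 100, 4

# Word branch: pretrained Spanish word embeddings (weights loaded elsewhere).
word_in = layers.Input(shape=(MAX_WORDS,), name="words")
word_emb = layers.Embedding(VOCAB, 300, trainable=False)(word_in)

# Char branch: self-learned char embeddings, one small Bi-LSTM per word.
char_in = layers.Input(shape=(MAX_WORDS, MAX_CHARS), name="chars")
char_emb = layers.Embedding(CHAR_VOCAB, 32)(char_in)
char_enc = layers.TimeDistributed(layers.Bidirectional(layers.LSTM(32)))(char_emb)

# Concatenate both representations and feed the bidirectional LSTM module.
x = layers.Concatenate()([word_emb, char_enc])
x = layers.Bidirectional(layers.LSTM(128))(x)
out = layers.Dense(N_CLASSES, activation="softmax")(x)  # P, NEU, NONE, N

model = models.Model(inputs=[word_in, char_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        </preformat>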
      </sec>
      <sec id="sec-3-2">
        <title>BERT classifier</title>
        <p>
          Currently, the research community is studying a new typology of language
architectures that goes beyond traditional vector representations, such as ELMo
(Embeddings from Language Models), GPT (Generative Pre-trained Transformer),
GPT-2 and BERT. The goal of these architectures is to develop increasingly
complex models for language understanding. Both OpenAI GPT and BERT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
use the transformer architecture to learn text representations. The main difference
between them is that BERT uses a bidirectional transformer (from left to
right and from right to left) instead of a unidirectional transformer (from left to
right). Regarding ELMo, it uses a shallow concatenation layer, while BERT uses
a deep neural network.
        </p>
        <p>In its basic form, BERT includes three separate mechanisms: an encoder
that reads the text input, a pooler layer, and a decoder that produces a
prediction or a classification layer for the task. When learning language models,
it is difficult to define a prediction objective. Many models predict the next word
in a sequence (for example, "The man traveled to his work by ___"), a directional
approach that inherently limits contextual learning. To overcome this challenge,
BERT uses two training strategies. In the first method, named "masked LM" due
to the masking procedure applied to train the language model, before entering
sequences of words into BERT, 15% of the words in each sequence are replaced by
a [MASK] token. Next, the model attempts to predict the original value of the
masked words, based on the context provided by the other unmasked words in the
sequence. This method tries to capture relationships between the words in
a sentence. The second method is the prediction of the following sentence, which
tries to capture continuity in the discourse. In this training process, the model receives
pairs of sentences as input and learns to predict whether the second sentence of
the pair is the next sentence of the original document. During training, 50% of
the inputs are pairs in which the second sentence is the next sentence of the
original document, while in the other 50% a random sentence of the corpus is
chosen as the second sentence.</p>
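        <p>As a simplified illustration of the masked LM corruption step (a sketch only: the original implementation also sometimes keeps the selected token or swaps it for a random word, which we omit here):</p>
        <preformat>
# Simplified masked-LM input corruption: mask roughly 15% of the tokens and
# keep the original tokens as the prediction targets.
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Return the corrupted sequence and the per-position targets."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() >= mask_prob:
            masked.append(tok)
            targets.append(None)      # nothing to predict at this position
        else:
            masked.append(mask_token)
            targets.append(tok)       # the model must recover this token
    return masked, targets

print(mask_tokens("The man traveled to his work by car".split()))
        </preformat>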
        <p>Released pretrained language models, built by these methodologies, include
a variety of options: English and multilingual models (including Spanish), cased
and uncased models, and the possibility to choose between a base or large version.
A detailed list of released models can be found on the Google Research GitHub
repository (https://github.com/google-research/bert). On
top of the pretrained language models, it also provides the functionality for fine-tuning
them to update the learnt weights, by re-training the language model using
our own text corpus.</p>
        <p>Fig. 2: BERT fine-tuned architecture for single sentence classification ([
          <xref ref-type="bibr" rid="ref5">5</xref>
          ])</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>With the goal of providing an experimental setup for reproducing the results
presented in this section, we include a description of the datasets, model
configuration parameters and execution environment used in the course of our trials.</p>
      <p>
        Datasets for training, evaluation and test have been provided by the TASS
organization [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The InterTASS dataset collects five Spanish variants in this edition:
ES (Spain), PE (Peru), CR (Costa Rica), UR (Uruguay) and MX (Mexico). All
of them are annotated with 4 different levels of opinion intensity: P (positive),
NEU (neutral), NONE (no opinion), N (negative).
      </p>
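      <p>For clarity, the polarity labels translate to class indices through a trivial mapping such as the sketch below (the numeric ids are an assumption for the example):</p>
      <preformat>
# InterTASS opinion intensity levels and an illustrative index assignment.
LABELS = ["P", "NEU", "NONE", "N"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}
      </preformat>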
      <p>Model configuration parameters have been established through an exhaustive
search process mainly focused on the Spanish model. It was decided to
use this parameterization for the training and evaluation of the rest of the
languages considered. The following subsections indicate the parameters and values
used for each of the approaches studied, in order to allow the reproducibility
of the results obtained. All the experiments have been configured to use Nvidia
GPUs (Tesla V100 and TITAN Xp).</p>
      <sec id="sec-4-1">
        <title>Char Bi-LSTM network</title>
        <p>Word2Vec and FastText pretrained models have been trained over our own
corpus, built from the SBWC corpus merged with a set of approximately 10,000 tweets
retrieved from the Twitter public API over the past year.</p>
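        <p>A minimal gensim sketch of this embedding pre-training step follows; the corpus file name and hyperparameters are assumptions for the example, not our exact settings.</p>
        <preformat>
# Hedged sketch: training Word2Vec and FastText over the merged corpus
# (SBWC plus the retrieved tweets), one tokenized sentence per line.
from gensim.models import FastText, Word2Vec

class Corpus:
    """Stream tokenized sentences from a text file, one sentence per line."""
    def __init__(self, path):
        self.path = path
    def __iter__(self):
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                yield line.split()

sentences = Corpus("sbwc_plus_tweets.txt")   # hypothetical file name
w2v = Word2Vec(sentences, vector_size=300, window=5, min_count=5)
ft = FastText(sentences, vector_size=300, window=5, min_count=5)
w2v.wv.save_word2vec_format("es_w2v.vec")    # later loaded as pretrained weights
        </preformat>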
        <p>During the first trials, we observed fast overfitting of the network, which
caused a slight accuracy improvement due to the fact that the model was
discarding the NEU and NONE classes while getting the extreme opinions (positive
and negative) right.</p>
      </sec>
      <sec id="sec-4-2">
        <title>BERT classifier</title>
        <p>The experimental settings of the BERT model are listed below (parameters not
mentioned here, available in the original implementation, have been left at their default
values):</p>
        <list list-type="simple">
          <list-item><p>bert_model: bert-base-multilingual-uncased</p></list-item>
          <list-item><p>train_batch_size: 32</p></list-item>
          <list-item><p>gradient_accumulation_steps: 1</p></list-item>
          <list-item><p>num_train_epochs: 5</p></list-item>
          <list-item><p>learning_rate: 1e-5</p></list-item>
          <list-item><p>warmup_proportion: 0.1</p></list-item>
          <list-item><p>max_seq_length: 70</p></list-item>
        </list>
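        <p>The warmup_proportion parameter follows the warmup-then-linear-decay schedule of the reference implementation; a minimal sketch of the resulting learning-rate curve, under that assumption, is shown below. Note also that the effective batch size is train_batch_size multiplied by gradient_accumulation_steps, i.e. 32 examples per optimizer update in our configuration.</p>
        <preformat>
# Illustrative learning-rate curve for learning_rate = 1e-5 and
# warmup_proportion = 0.1: linear ramp over the first 10% of the steps,
# then linear decay towards zero.
def lr_at_step(step, total_steps, base_lr=1e-5, warmup_proportion=0.1):
    progress = step / total_steps
    if progress >= warmup_proportion:
        return base_lr * (1.0 - progress)          # linear decay
    return base_lr * progress / warmup_proportion  # linear warmup
        </preformat>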
      </sec>
      <sec id="sec-4-3">
        <title>Results</title>
        <p>
          This section collects the set of official and unofficial results retrieved from the
experiments previously described. It also includes results from our contribution
to the previous TASS edition [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] (stated as Model2018) for comparison purposes.
        </p>
        <p>Only the Char Bi-LSTM models' results are considered official. Non-official results,
such as the BERT monolingual and crosslingual metrics, have been calculated using
the evaluation scripts provided by the organization at the beginning of the
competition, so that the metrics are obtained under the same conditions as the
official ones.</p>
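        <p>For reference, an equivalent offline computation can be sketched with scikit-learn, assuming macro-averaged F1 as the headline measure; the label sequences below are dummy examples.</p>
        <preformat>
# Offline metric computation over gold and predicted polarity labels.
from sklearn.metrics import accuracy_score, f1_score

y_true = ["P", "N", "NEU", "NONE", "P"]   # dummy gold labels
y_pred = ["P", "N", "P", "NONE", "N"]     # dummy system output
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))
print("accuracy:", accuracy_score(y_true, y_pred))
        </preformat>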
        <p>As described in the introduction, subtask 1 evaluates single-language analysis,
while subtask 2 evaluates the dependency of systems on a language.</p>
        <p>Monolingual: training, validation and test using each InterTASS dataset
independently.</p>
        <p>(Results table comparing the Char1, Char2, Bert and Model2018 systems.)</p>
        <p>Crosslingual: training on a selection of datasets and testing on a different one.
In our case we have trained independently on the ES and MX datasets, finally choosing
the MX model to be tested on the rest of the languages, given its superior
results.</p>
        <p>In the case of Model2018, the Spanish (ES) variant was selected to be tested
against the variants available in that edition, obtaining an F1 score of 0.4090 for CR
and 0.3670 for PE.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and future work</title>
      <p>
        Within the previous edition of TASS [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] we obtained some rather discouraging
official results, so we decided to explore a more complete deep learning
approach based on recurrent neural networks and embedding representations.
Unfortunately, the results obtained in this edition are very close to the previous ones,
probably due to the excessive computational complexity of the joint embedding
model for a sentence-level classification task on short texts. This circumstance
led us to consider a less traditional approach such as BERT, based on attention
mechanisms, which has shown a good generalization capability in a great variety
of English NLP tasks. We have been able to confirm that its Spanish language
model works surprisingly well on the sentiment analysis task, and furthermore
it adapts seamlessly to different variants of the same language. Therefore,
we can conclude that models based on deep learning continue to be one of the
most successful approaches from a computational point of view.
      </p>
      <p>Nevertheless, the studied approaches have certain limitations, such as the ability
to distinguish between the NEU and NONE labels. We have systematically observed
the difficulty the algorithms have in learning this classification, due to the labels'
semantic proximity. Furthermore, the multilingual challenge in the analysis of Twitter
publications remains open and gives much room for improvement.</p>
      <p>As future lines of work we expect to explore further the re-training of the
Spanish language model with a larger corpus and the search for optimal parameters,
pursuing a significant improvement in this model's performance, as well as to
research and gain deeper insights into the use of attention models in natural language
analysis.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work has been partially funded by the Department of Big Data and
Cognitive Systems at the Technological Institute of Aragon. We also thank the
support of the FSE Operative Programme for Aragon 2014-2020 (IODIDE research
group).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Benballa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collet</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Picot-Clemente</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          : Saagie at SemEval
          <article-title>-2019 task 5: From universal text embeddings and classical features to domain-specific text classification</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>469</fpage>
          –
          <lpage>475</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          ,
          <fpage>135</fpage>
          –
          <lpage>146</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cardellino</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <source>Spanish Billion Words Corpus and Embeddings (March</source>
          <year>2016</year>
          ), https://crscardellino.github.io/SBWCE/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chiu</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nichols</surname>
          </string-name>
          , E.:
          <article-title>Named entity recognition with bidirectional lstm-cnns</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>4</volume>
          ,
          <fpage>357</fpage>
          –
          <lpage>370</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          . CoRR abs/1810.04805 (
          <year>2018</year>
          ), http://arxiv.org/abs/1810.04805
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Díaz-Galiano</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          , et al.:
          <article-title>Overview of TASS 2019</article-title>
          .
          <source>CEUR-WS, Bilbao</source>
          , Spain (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>García Cumbreras</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez Cámara</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villena Román</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García Morera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>TASS 2015: the evolution of the Spanish opinion mining systems</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jawahar</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seddah</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>What does BERT learn about the structure of language?</article-title>
          .
          <source>In: 57th Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          (
          <year>July 2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jebbara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Improving opinion-target extraction with character-level word embeddings</article-title>
          .
          <source>arXiv preprint arXiv:1709.06317</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jernite</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sontag</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rush</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          :
          <article-title>Character-aware neural language models</article-title>
          .
          <source>In: Thirtieth AAAI Conference on Artificial Intelligence</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Limsopatham</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collier</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          :
          <article-title>Bidirectional lstm for named entity recognition in twitter messages</article-title>
          .
          <source>Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>B.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>: Multi-channel bilstm-crf model for emerging named entity recognition in social media</article-title>
          .
          <source>In: Proceedings of the 3rd Workshop on Noisy User-generated Text</source>
          . pp.
          <fpage>160</fpage>
          –
          <lpage>165</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joty</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
          </string-name>
          , H.:
          <article-title>Fine-grained opinion mining with recurrent neural networks and word embeddings</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>1433</fpage>
          –
          <lpage>1443</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. Martínez Cámara, E., Almeida Cruz, Y., Díaz Galiano, M.C., Estévez-Velarde, S., García Cumbreras, M.A., García Vega, M., Gutiérrez, Y., Montejo Ráez, A., Montoyo, A., Muñoz, R., et al.:
          <article-title>Overview of TASS 2018: Opinions, health and emotions</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <volume>2172</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Montañés, R.,
          <string-name>
            <surname>Aznar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoyo</surname>
          </string-name>
          , R.D.:
          <article-title>Aplicación de un modelo híbrido de aprendizaje profundo para el análisis de sentimiento en Twitter (Application of a hybrid deep learning model for sentiment analysis in Twitter)</article-title>
          .
          <source>In: TASS@SEPLN (2018-09)</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Siddiqua</surname>
            ,
            <given-names>U.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chy</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aono</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : KDEHatEval at SemEval
          <article-title>-2019 task 5: A neural network model for detecting hate speech in Twitter</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>365</fpage>
          –
          <lpage>370</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          . In: Guyon,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.V.</given-names>
            ,
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Vishwanathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Garnett</surname>
          </string-name>
          ,
          <string-name>
            <surname>R</surname>
          </string-name>
          . (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pp.
          <fpage>5998</fpage>
          –
          <lpage>6008</lpage>
          . Curran Associates, Inc. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>