<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Franco Martín Luque</string-name>
          <email>francolq@famaf.unc.edu.ar</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Nacional de Córdoba &amp; CONICET</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>561</fpage>
      <lpage>570</lpage>
      <abstract>
        <p>In this article we describe our participation in TASS 2019, a shared task aimed at the detection of sentiment polarity of Spanish tweets. We combined different representations such as bag-of-words, bag-of-characters, and tweet embeddings. In particular, we trained robust subword-aware word embeddings and computed tweet representations using a weighted-averaging strategy. We also used two data augmentation techniques to deal with data scarcity: two-way translation augmentation, and instance crossover augmentation, a novel technique that generates new instances by combining halves of tweets. In experiments, we trained linear classifiers and ensemble models, obtaining highly competitive results despite the simplicity of our approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Polarity Classification</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>Linear Models</kwd>
        <kwd>Embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        TASS is a shared task organized every year since 2012, with challenges related
to Sentiment Analysis in Spanish. In TASS 2019 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the proposed task is to
label tweets according to the general sentiment polarity they express, classifying
them into four classes: P (positive), N (negative), NEU (neutral, undecided) and
NONE (no sentiment).
      </p>
      <p>Five datasets are offered for the task, each one from a different
Spanish-speaking country: CR (Costa Rica), ES (Spain), MX (Mexico), PE (Peru) and
UY (Uruguay). Each corpus is divided into train, development and test sections.
No other supervised datasets can be used, but external linguistic resources such
as embeddings and lexicons are allowed.</p>
      <p>The challenge is divided into two subtasks. In monolingual subtask 1, systems
must be trained and tested on the same dataset. In cross-lingual subtask 2, systems
must be trained using datasets from countries other than the one used
for testing.</p>
      <p>[Fig. 1: System architecture. Training data is expanded with translation and
crossover augmentation, preprocessed, and represented with BoC, BoW and embedding
features that feed the classifier.]</p>
      <p>
        In this article, we describe our participation in TASS 2019 as team Atalaya.
We based our systems on our previous work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for TASS 2018 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For this
edition, we focused our work on data augmentation and robust representations.
      </p>
      <p>
        To represent tweets, we used a combined approach of bag-of-words,
bag-of-characters and tweet embeddings. Tweet embeddings were computed from word
embeddings using a weighted averaging scheme. For word embeddings, we used
fastText subword-aware vectors [
        <xref ref-type="bibr" rid="ref3">3</xref>
          ] specifically trained for sentiment analysis
over Spanish tweets.
      </p>
      <p>Our fastText embeddings are robust to noise since they can compute
embeddings for unseen words by using subword embeddings. Moreover, we trained
them using a database of 90M tweets from various Spanish-speaking countries,
giving wide domain-specific vocabulary coverage. We achieved additional
robustness by preprocessing with several text normalization and noise reduction
techniques.</p>
      <p>To cope with training data scarcity, we experimented with data
augmentation techniques. As in our previous work, we did augmentation using machine
translation to and from several other languages.</p>
      <p>We also tried a novel augmentation technique we called instance crossover,
loosely inspired by the crossover operation from genetic algorithms. This
technique combines halves of tweets to generate new instances. Despite its simplicity,
this idea proved to be useful in our experiments.</p>
      <p>For the classifying models, we used logistic regressions and also bagging
ensembles of logistic regressions.</p>
      <p>The rest of the paper is organized as follows. The next section presents the components
of our systems and the ideas we used to build them. Section 3 presents the
experiments and results for both subtasks. Section 4 concludes the work with
some observations about our experience.
</p>
    </sec>
    <sec id="sec-2">
      <title>Techniques and Resources</title>
      <p>The components and general architecture of our systems are shown in Fig. 1. In
this section, we describe the techniques and resources we used to build them.
</p>
      <sec id="sec-2-1">
        <title>Preprocessing</title>
        <p>Preprocessing is important to reduce noise from tweets. We follow our previous
work, applying two levels of preprocessing. Basic tweet preprocessing includes
tokenization, replacement of handles, URLs, and e-mails, and shortening of
repeated letters. Further preprocessing is done, aimed at semantic tasks. It includes
removal of punctuation, stopwords and numbers, lowercasing, lemmatization, and
negation handling.</p>
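        <p>As an illustration, the basic level of preprocessing can be sketched as follows. This is a minimal sketch: the placeholder tokens and regular expressions are illustrative choices, not necessarily the exact ones used in our systems.</p>

```python
import re

def basic_preprocess(text):
    """Basic tweet preprocessing: replace URLs, e-mails and handles with
    placeholder tokens, and shorten letters repeated three or more times."""
    text = re.sub(r"https?://\S+", "URL", text)      # URLs
    text = re.sub(r"\S+@\S+\.\S+", "EMAIL", text)    # e-mail addresses
    text = re.sub(r"@\w+", "@USER", text)            # Twitter handles
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)      # "holaaaa" -> "holaa"
    return text
```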
        <p>
          For negation handling, we followed a simple approach [
          <xref ref-type="bibr" rid="ref4 ref8">4, 8</xref>
          ]: We find negation
words and add the prefix NOT to the following tokens. Up to three tokens are
negated, or fewer if a non-word token is found.
        </p>
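        <p>A minimal sketch of this negation-handling step, assuming tokenized input. The negation word list here is an illustrative subset, and the "NOT_" spelling is our rendering of the NOT prefix.</p>

```python
def handle_negation(tokens, max_scope=3):
    """Prefix up to `max_scope` tokens following a negation word with
    "NOT_", stopping early when a non-word token is found."""
    negations = {"no", "ni", "nunca", "jamas", "tampoco"}  # illustrative subset
    out, to_negate = [], 0
    for tok in tokens:
        if tok.lower() in negations:
            to_negate = max_scope       # start a new negation scope
            out.append(tok)
        elif to_negate > 0 and tok.isalpha():
            out.append("NOT_" + tok)    # token inside the negation scope
            to_negate -= 1
        else:
            if not tok.isalpha():
                to_negate = 0           # non-word token ends the scope
            out.append(tok)
    return out
```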
        <p>No special treatment was applied to hashtags, emojis, interjections and
onomatopoeias. Moreover, no spelling correction nor any other additional
normalization was applied.
</p>
      </sec>
      <sec id="sec-2-2">
        <title>Bags of Words and Characters</title>
        <p>A simple way to represent textual data as feature vectors is to use bag-of-words
(BoWs). A bag-of-words represents a tweet as a vector with the counts of words
occurring in it. Resulting vectors are high-dimensional and sparse. The BoW
representation can be extended to also count word n-grams. In this work, we
used BoWs, together with count binarization and TF-IDF re-weighting, both
useful for semantic tasks such as sentiment analysis.</p>
        <p>For more robustness, we also used a bag-of-characters (BoC) representation.
BoCs have exactly the same properties and variants as BoWs but are applied
to characters instead of word tokens. Character usage in tweets holds useful
information for sentiment analysis. In our work, the BoC representation is computed
over the original raw text of tweets, with no preprocessing at all.
</p>
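        <p>Both representations can be produced with standard scikit-learn vectorizers. A sketch, using the n-gram ranges reported in section 3.1 and leaving the remaining parameters at their defaults:</p>

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Word-level BoW (n-grams up to 5) and character-level BoC (n-grams up
# to 6), both with TF-IDF re-weighting; the BoC sees the raw text.
bow = TfidfVectorizer(analyzer="word", ngram_range=(1, 5))
boc = TfidfVectorizer(analyzer="char", ngram_range=(1, 6))

tweets = ["me encanta este lugar", "no me gusta nada"]
X = hstack([bow.fit_transform(tweets), boc.fit_transform(tweets)])
```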
      </sec>
      <sec id="sec-2-3">
        <title>Word Embeddings</title>
        <p>A more interesting way to represent text is using embeddings. Word embeddings
are low-dimensional dense vector representations of words. These representations
are learned in an unsupervised fashion using large quantities of plain text,
providing high vocabulary coverage.</p>
        <p>
          For our systems, we used fastText embeddings [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which introduce additional
robustness by also learning subword-level embeddings and using them to
compute vectors for unseen words. With subword-aware embeddings, the need for
normalization of highly noisy text in preprocessing is greatly alleviated.
        </p>
        <p>We did not use a pretrained fastText model but trained our own using a
large preprocessed dataset of 90 million tweets from various Spanish-speaking
countries. This dataset is mostly composed of tweets we collected for previous
work, and also includes the tweets from all sections of all TASS 2019 datasets.
</p>
      </sec>
      <sec id="sec-2-4">
        <title>Tweet Embeddings</title>
        <p>To use word embeddings in sentiment analysis, the embeddings of the individual
tokens must be aggregated in some way to obtain a complete tweet
representation.</p>
        <p>A simple approach is averaging to obtain a single vector. A more
interesting option is to add weights to the averaging scheme. This way, some words may
be considered more relevant than others for the classification task.</p>
        <p>
          In this work, we used Smooth Inverse Frequency (SIF), a simple weighted
averaging scheme from [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] inspired by TF-IDF re-weighting. In SIF, each word w is
weighted with a / (a + p(w)), where p(w) is the word unigram probability, and a &gt; 0
is a smoothing hyperparameter. Larger values of a mean more smoothing towards
plain averaging. We model the unigram probability using unigram counts from
our preprocessed 90-million-tweet dataset.
        </p>
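        <p>The SIF aggregation can be sketched as follows, assuming precomputed word vectors and unigram probabilities given as plain dictionaries (in our systems they come from the fastText model and the 90-million-tweet counts):</p>

```python
import numpy as np

def sif_embedding(tokens, word_vecs, probs, a=0.1):
    """Weighted average of word vectors with SIF weights a / (a + p(w)).
    Words missing from `word_vecs` are skipped here; the actual fastText
    model would instead back off to subword embeddings."""
    vecs, weights = [], []
    for w in tokens:
        if w in word_vecs:
            vecs.append(word_vecs[w])
            weights.append(a / (a + probs.get(w, 0.0)))
    if not vecs:  # no known words: return a zero vector
        return np.zeros_like(next(iter(word_vecs.values())))
    return np.average(np.array(vecs), axis=0, weights=weights)
```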
        <p>
          In [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] a final transformation is applied to tweet embeddings by subtracting
from them a common component shared by all the vectors. Preliminary
experiments with this idea, however, showed it to be harmful to our systems, so we
did not use it in our final experiments.
        </p>
        <p>An important limitation of this tweet embedding scheme is that word order
is completely ignored. Only preprocessing may allow the influence of ordering in
the result. In particular, the negation handling trick from section 2.1 is a useful,
although naive, way to let words be affected by previous negations.
</p>
      </sec>
      <sec id="sec-2-5">
        <title>Data Augmentation with Two-Way Translation</title>
        <p>One of the main successful approaches from our previous work on TASS was
the use of data augmentation techniques. Data augmentation helps to cope with
training data scarcity. Augmentation aims at the introduction of data variability
using label-preserving transformations on real data. When correctly used, it
contributes to data robustness and acts as a regularizer for the models.</p>
        <p>Our approach for TASS 2018 (also as team Atalaya) was to use two-way
translation augmentation. In two-way translation, an external machine
translation service is used to translate tweets to other \pivot" languages and then
back to Spanish. This augmentation technique helps to introduce lexical and
syntactic variations to tweets, most of the time preserving their meaning.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] we used two-way translation to augment the training data using four
pivot languages (English, French, Portuguese and Arabic). This augmentation
was found to be useful for the ES and CR datasets, but not for PE.
        </p>
        <p>In this work, we explored two-way translation further, applying it to all the
datasets using 20 different pivot languages. To get translations, we used Google's
Cloud Translation API service.</p>
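        <p>The augmentation step itself is independent of the translation backend. A sketch, where translate(text, source, target) stands in for the external service; the toy translator in the usage example is purely illustrative:</p>

```python
def two_way_translate(tweets, pivots, translate):
    """Two-way translation augmentation: translate each tweet from
    Spanish to every pivot language and then back to Spanish."""
    augmented = []
    for pivot in pivots:
        for tweet in tweets:
            there = translate(tweet, "es", pivot)
            augmented.append(translate(there, pivot, "es"))
    return augmented

# Toy deterministic "translator" used only to show the call pattern.
toy = lambda text, source, target: f"[{source}->{target}] {text}"
```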
        <p>Pivot languages were selected by hand from the list of available languages
that the API can translate from/to Spanish. The selection was done trying to
pick representative languages from different language families.
</p>
      </sec>
      <sec id="sec-2-6">
        <title>Data Augmentation with Instance Crossover</title>
        <p>We also tried a new augmentation idea that aims at the generation of new data by
combining pairs of instances with the same label. We call it the instance crossover
augmentation technique, inspired by the chromosome crossover operation from
genetic algorithms.</p>
        <p>Our approach is simply to split tokenized tweets into two halves, and then
randomly sample and combine first halves with second halves. Resulting
instances will probably be ungrammatical and semantically unsound, but our
hypothesis is that what is left of the semantics, for instance at the lexical level, will
preserve sentiment polarity most of the time.1</p>
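        <p>A sketch of the crossover operation over a set of tokenized tweets sharing the same label; the split point and sampling strategy are the straightforward choices described above.</p>

```python
import random

def crossover_augment(tweets, n_new, seed=0):
    """Instance crossover: combine randomly sampled first halves of
    tokenized tweets with second halves, all from the same label."""
    rng = random.Random(seed)
    firsts = [t[: len(t) // 2] for t in tweets]
    seconds = [t[len(t) // 2:] for t in tweets]
    return [rng.choice(firsts) + rng.choice(seconds) for _ in range(n_new)]
```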
        <p>Fig. 2 shows an example of instance crossover using two tweets with positive
sentiment. In this example, crossover is successful in the sense that the resulting
instances can be clearly judged as having a positive sentiment. In other cases,
crossover may fail to preserve polarity, for instance, because of an unfortunate
combination involving a negation. Resulting instances may even be completely
nonsensical, introducing noise to the data.</p>
        <p>For this work, we chose to directly validate in experiments this augmentation
idea. In our experiments, we applied augmentation over the training tweets after
basic preprocessing and before semantic preprocessing (as defined in section 2.1).
We tried different levels of augmentation, multiplying the size of original training
datasets by factors of 4, 8, 12, 16 and 20. We preserved the original distribution
over labels and therefore the class imbalance.</p>
        <p>Instance crossover is a very rough and naive augmentation technique.
However, it may be useful to introduce more data variability than two-way
translation. With translation, new data points may fall very close to the original ones,
while crossover introduces new points in the "spaces" between the original ones.
Moreover, this is done in a representation-agnostic fashion. It can be used with
bag-of-words, embeddings, or even neural-based representations.</p>
        <p>Another clear advantage of instance crossover is that it does not rely on any
external resource or system. In contrast, translation requires an external service,
at a cost, and other techniques such as synonym replacement require thesauruses
or word similarity databases.
</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>
        In this section, we describe our experiments. We implemented all our systems
using scikit-learn [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In the preprocessing stage, we used an NLTK-based tokenizer
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and TreeTagger for lemmatization [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>System Development</title>
        <p>For simplicity, most of our work was centered on subtask 1 and on the ES dataset,
looking for model configurations and hyperparameter values that gave the best
results over the development section of the ES dataset. The optimization process
was done using a mixed approach of grid search and by-hand tuning.</p>
        <p>(Footnote 1: Grammaticality and semantic soundness are already rare in the
original tweets, so it is not something we should worry about very much.)</p>
        <p>[Fig. 2: Instance crossover example. Original tweets: "@USER fue genial debemos
organizar más cosas así sin necesidad de que nadie abandone el país" and "@USER me
alegro mucho !! es importante darnos cuenta del gran valor que podemos aportar y
encontrar nuestra misión". Augmented instances combine the first half of each tweet
with the second half of the other.]</p>
        <p>We targeted the maximization of both macro-F1 and accuracy scores. The
macro-F1 score is the main metric of TASS 2019, but it is very unstable and
sensitive to small changes in predictions for minority classes. On the other hand,
accuracy is more stable and reliable for the development process.</p>
        <p>
          As a starting point, we used our optimal model and configuration from TASS
2018 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This model is a logistic regression over a combination of bag-of-words
(BoW), bag-of-characters (BoC) and tweet embeddings as follows:
- Augmentation: two-way translation with English, French, Portuguese and
Arabic (EN+FR+PT+AR) as pivot languages.
- BoW: all word n-grams for n ≤ 5.
- BoC: all character n-grams for n ≤ 6.
- Tweet embeddings: 50-dimension fastText vectors, with weighted averaging and
a = 0.1.
- Logistic regression: liblinear solver with primal formulation, L2
regularization with inverse strength C = 1.0, and class-balanced reweighting.
        </p>
        <p>Table 1 shows a detailed evaluation of this baseline model.</p>
        <p>The first idea we explored was augmentation using two-way translation with
the new 20 pivot languages. We tried adding all the new data, but also adding some
subsets of it, by grouping pivot languages in packs of four datasets. However, we
could not find any improvement over the original EN+FR+PT+AR
augmentation, sometimes finding important drops in model quality.</p>
        <p>Next, we explored augmentation using instance crossover. We tried 4x, 8x,
12x, 16x and 20x factor augmentations from the original ES training corpus, with
and without additional translation-based augmentation. In every case results
were improved w.r.t. not using crossover augmentation. The best result was
found for 8x augmentation.</p>
        <p>Last, we tuned the hyperparameters of the logistic regression. The best
configuration found was a liblinear solver with primal formulation, L2 regularization
with inverse strength C = 0.2, and no class-balanced reweighting.</p>
        <p>We also tried an ensemble of logistic regressions by using bagging. Bagging
was found to be useful for the ES dataset. The best configuration found
used a bag of 40 logistic regressions.</p>
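        <p>The ensemble can be built directly in scikit-learn. A sketch with the hyperparameters found above; feature extraction is omitted:</p>

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

# Bag of 40 logistic regressions with the tuned configuration:
# liblinear solver, L2 regularization, C = 0.2, no class reweighting.
model = BaggingClassifier(
    LogisticRegression(solver="liblinear", C=0.2),
    n_estimators=40,
)
```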
        <p>Table 2 shows a detailed evaluation for our best model for ES found following
the development process.
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Subtask 1: Monolingual Experiments</title>
        <p>To build a submission for subtask 1, we first ran the final test on the ES dataset
using the best model described in the previous section.</p>
        <p>To build submissions for the other datasets, we followed a similar
development approach, but with most hyperparameters fixed at the optimal values
for ES. We focused the optimization process on the usage of translation and
crossover augmentations, on the logistic regression hyperparameters and on the
usage of bagging. Tuning was done mostly by hand and sometimes using grid
search. Table 3 shows the optimal configurations found for each dataset.</p>
        <p>The final results for the complete submission for subtask 1 are shown in
Table 4.
</p>
      </sec>
      <sec id="sec-3-3">
        <title>Subtask 2: Crosslingual Experiments</title>
        <p>To build submissions for subtask 2 we did a minimal set of experiments. For each
language, we started from the optimal model configuration found for subtask 1
and then trained it using the union of the training datasets of every other
language. We then proceeded to optimize the main hyperparameters of the logistic
regression, mostly doing by-hand tuning.</p>
        <p>We did some preliminary experiments with data augmentation and bagging
for the ES dataset. However, results were not improved, so we did not do further
experimentation with the other datasets.</p>
        <p>[Table 5: Optimal model configurations per dataset for subtask 2, with columns
for augmentation (translation, crossover), logistic regression hyperparameters (C,
class-weight) and bagging. Translation augmentation is EN+FR+PT+AR for ES, PE and
MX, and none for CR and UY.]</p>
        <p>As a complementary post-competition experiment, we performed ablation tests
for each of the components of our systems, to assess the relevance of each of the
techniques used in this work.</p>
        <p>The ablation tests were done using the best system for subtask 1 on the ES
dataset. The results are displayed in Table 6.</p>
        <p>It can be seen that all the techniques have a positive impact. Among the
representations, tweet embeddings are the most important, well above the
BoW and BoC representations. Also, it is interesting to observe that crossover
augmentation has an impact on the F1 but not on the accuracy, indicating that
it is helping mostly on the minority classes NEU and NONE.
</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Robust representations and data augmentation play a strong role in sentiment
analysis with small-sized training datasets. As in our previous experience with
TASS 2018, we are still able to obtain top ranking results without having to
resort to complex models such as deep neural networks.</p>
      <p>We observe that, for this edition of TASS, most of our work was on the
application of general ML techniques, and not on particular task/domain-specific
engineering. In particular, we successfully tried instance crossover augmentation,
a novel technique that, despite its simplicity, showed a positive impact on
results. This idea can be useful to augment small datasets for other short text
classification tasks without the need for external resources.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
          </string-name>
          , T.:
          <article-title>A simple but tough-to-beat baseline for sentence embeddings</article-title>
          .
          <source>In: International Conference on Learning Representations</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
          </string-name>
          , E.:
          <article-title>Nltk: the natural language toolkit</article-title>
          .
          <source>In: Proceedings of the ACL 2004 on Interactive poster and demonstration sessions</source>
          . p.
          <fpage>31</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>arXiv preprint arXiv:1607.04606</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>T.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brooks</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>shee Chan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leinweber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <article-title>Martinez-Jerez,</article-title>
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Raghubir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Rajagopalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Ranade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Rubinstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Tufano</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Yahoo! for amazon: Sentiment extraction from small talk on the web</article-title>
          .
          <source>In: 8th Asia Pacific Finance Association Annual Conference</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Díaz-Galiano</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          , et al.:
          <article-title>Overview of TASS 2019</article-title>
          .
          <article-title>CEUR-WS, Bilbao</article-title>
          , Spain (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Luque</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          : Atalaya at TASS 2018:
          <article-title>Sentiment analysis with tweet embeddings and data augmentation</article-title>
          . In:
          <string-name>
            <surname>Martínez-Cámara</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almeida Cruz</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Díaz-Galiano</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Estevez Velarde</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Cumbreras</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Vega</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Gutiérrez</given-names>
            <surname>Vázquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Montejo</surname>
          </string-name>
          <string-name>
            <surname>Raez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Montoyo</surname>
          </string-name>
          <string-name>
            <surname>Guijarro</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          , Muñoz Guillena,
          <string-name>
            <surname>R.</surname>
          </string-name>
          , Piad Morffis,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Villena-Román</surname>
          </string-name>
          ,
          <string-name>
            <surname>J</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS</source>
          <year>2018</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2172</volume>
          , pp.
          <fpage>29</fpage>
          –
          <lpage>35</lpage>
          .
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          , Sevilla, Spain (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Martínez-Cámara</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almeida Cruz</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Díaz-Galiano</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Estevez Velarde</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Cumbreras</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Vega</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Gutiérrez</given-names>
            <surname>Vázquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Montejo</surname>
          </string-name>
          <string-name>
            <surname>Raez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Montoyo</surname>
          </string-name>
          <string-name>
            <surname>Guijarro</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          , Muñoz Guillena,
          <string-name>
            <surname>R.</surname>
          </string-name>
          , Piad Mor s,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Villena-Roman</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          :
          <source>Overview of TASS</source>
          <year>2018</year>
          :
          <article-title>Opinions, health and emotions</article-title>
          . In:
          <article-title>Mart nez-</article-title>
          <string-name>
            <surname>Camara</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almeida Cruz</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <article-title>D az-</article-title>
          <string-name>
            <surname>Galiano</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Estevez Velarde</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garc</surname>
            a-Cumbreras,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garc</surname>
            a-Vega,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Gutierrez</given-names>
            <surname>Vazquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Montejo</surname>
          </string-name>
          <string-name>
            <surname>Raez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Montoyo</surname>
          </string-name>
          <string-name>
            <surname>Guijarro</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          , Mun~oz Guillena,
          <string-name>
            <surname>R.</surname>
          </string-name>
          , Piad Mor s,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Villena-Roman</surname>
          </string-name>
          ,
          <string-name>
            <surname>J</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS</source>
          <year>2018</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2172</volume>
          .
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          , Sevilla, Spain (
          <year>September 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Pang</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Lee</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Vaithyanathan</surname>, <given-names>S.</given-names></string-name>:
          <article-title>Thumbs up? Sentiment classification using machine learning techniques</article-title>
          . In:
          <source>Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>79</fpage>
          –
          <lpage>86</lpage>
          . Association for Computational Linguistics (
          <year>July 2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Pedregosa</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Varoquaux</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Gramfort</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Michel</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Thirion</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Grisel</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Blondel</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Prettenhofer</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Weiss</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Dubourg</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Vanderplas</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Passos</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Cournapeau</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Brucher</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Perrot</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Duchesnay</surname>, <given-names>E.</given-names></string-name>:
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          –
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Schmid</surname>, <given-names>H.</given-names></string-name>:
          <article-title>Improvements in part-of-speech tagging with an application to German</article-title>
          . In:
          <source>Proceedings of the ACL SIGDAT-Workshop</source>
          . pp.
          <fpage>47</fpage>
          –
          <lpage>50</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>