<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3</article-id>
      <title-group>
        <article-title>INGEOTEC solution for Task 1 in TASS'18 competition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniela Moctezuma</string-name>
          <aff>CONACYT-CentroGEO</aff>
          <email>dmoctezuma@centrogeo.edu.mx</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose Ortiz-Bejar</string-name>
          <aff>UMSNH</aff>
          <email>jortiz@umich.mx</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric S. Tellez</string-name>
          <aff>CONACYT-INFOTEC</aff>
          <email>eric.tellez@infotec.mx</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabino Miranda-Jimenez</string-name>
          <aff>CONACYT-INFOTEC</aff>
          <email>sabino.miranda@infotec.mx</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Graff</string-name>
          <aff>CONACYT-INFOTEC</aff>
          <email>mario.graff@infotec.mx</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2</volume>
      <fpage>45</fpage>
      <lpage>49</lpage>
      <abstract>
        <p>Sentiment analysis over social networks determines the polarity of messages published by users. In this sense, a message can be classified as positive or negative, or under a similar scheme with more fine-grained labels. Each language has characteristics that make it difficult to determine the sentiment correctly, such as the natural ambiguity of pronouns, synonymy, and polysemy. Additionally, given that messages in social networks are quite informal, they tend to be plagued with lexical errors and lexical variations that make it difficult to determine the sentiment using traditional approaches. This paper describes our participating system in TASS'18. Our solution is composed of several subsystems independently collected and trained, combined with our EvoMSA genetic programming system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Sentiment analysis is an active research area
that performs the computational analysis of
people's feelings or beliefs expressed in texts,
such as emotions, opinions, attitudes, and
appraisals, among others (Liu and Zhang, 2012).</p>
      <p>In social media, people share their opinions
and sentiments. In addition to their
inherent polarity, these feelings also have an
intensity. As in previous years, TASS'18
organizes a task on four-level polarity
classification of tweets. This year, the
InterTASS corpus has been expanded with two
more subsets, namely, a dataset containing
tweets from Costa Rica and another one
coming from Peruvian tweeters. Therefore, there
are three varieties of the Spanish language,
namely, Spain (ES), Peru (PE), and Costa
Rica (CR). Moreover, several subtasks are
also introduced:</p>
      <p>Subtask-1: Monolingual ES:
Training and testing using the InterTASS ES
dataset.</p>
      <p>Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes.</p>
      <p>Subtask-2: Monolingual PE:
Training and testing using the InterTASS PE
dataset.</p>
      <p>Subtask-3: Monolingual CR:
Training and testing using the InterTASS CR
dataset.</p>
      <p>Subtask-4: Cross-lingual: Here,
training can be done with one specific dataset
while a different one is used for testing.</p>
      <p>These subtasks are mostly based on
separating language variations between the training
and test datasets. Martínez-Cámara et al.
(Martínez-Cámara et al., 2018) detail TASS'18 Task 1
and its associated datasets.</p>
      <p>This paper details our INGEOTEC team's
solution to Task 1. Our approach
consists of a number of subsystems whose
individual predictions are combined through
a non-linear expression produced by our EvoMSA
genetic programming system. It is worth
mentioning that we tackled both Task 1 (this one)
and Task 4 (good or bad news) with a
similar scheme, that is, the same resources,
the same portfolio of algorithms, and the
same hyper-parameters; of course, we used
each task's training set to learn and
optimize for that task.</p>
      <p>The manuscript is organized as follows.</p>
      <p>Section 2 details the subsystems that compose
our solution, Section 3 presents our results,
and finally, Section 4 summarizes and
concludes this report.</p>
    </sec>
    <sec id="sec-2">
      <title>System Description</title>
      <p>Our participating system is a combination of
several subsystems that tackle the polarity
categorization of tweets independently;
all these independent predictions
are then combined using our EvoMSA genetic
programming system. The rest of this section
details the use of these subsystems and
resources.</p>
      <sec id="sec-2-1">
        <title>EvoMSA</title>
        <p>
          EvoMSA (https://github.com/INGEOTEC/EvoMSA) is a multilingual sentiment
analysis system based on generic text classifiers,
domain-specific resources, and a genetic
programming combiner of the parts. The first
one, namely B4MSA
          <xref ref-type="bibr" rid="ref6">(Tellez et al., 2017)</xref>
          ,
performs a hyper-parameter optimization over a
large search space of possible models. It uses
a meta-heuristic to solve a combinatorial
optimization problem over the configuration
space; the selected model is described in
Table 1. Second, EvoDAG (Graff
et al., 2016; Graff et al., 2017) is a
classifier based on Genetic Programming with
semantic operators that makes the final
prediction through a combination of all the
decision function values. Domain-specific
resources can also be added under the same
scheme. Figure 1 shows the architecture of
EvoMSA. In the first part, a set of
different classifiers is trained with the datasets
provided by the contest and other resources as
additional knowledge, i.e., the idea is to be
able to integrate any other kind of related
knowledge into the model. In this case, we
used tailor-made lexicons for the
aggressiveness task: aggressive words and affective
words (positive and negative); see Section 2.2
for more details. The precise configuration of
our benchmarked system is described in
Section 3.
        </p>
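        <p>This stacking architecture can be sketched as follows: independently trained classifiers expose their decision-function values, which become the features of a final combiner. The sketch below uses scikit-learn models as stand-ins for B4MSA and the EvoDAG combiner; the toy texts and model choices are illustrative, not the actual system.</p>
        <preformat>
```python
# Sketch of the EvoMSA-style architecture: an independently trained
# text classifier produces decision-function values that are stacked
# as input features for a final combiner. scikit-learn stands in for
# B4MSA (first stage) and EvoDAG (second stage); data is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

train = ["muy buen dia", "odio esto", "excelente noticia", "que horror"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# First stage: a text classifier trained on its own representation.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)).fit(train)
base = LinearSVC().fit(vec.transform(train), labels)

# Its decision-function values (not hard labels) become features.
X_stack = base.decision_function(vec.transform(train)).reshape(-1, 1)

# Second stage: the combiner learns the final decision over those values.
combiner = LogisticRegression().fit(X_stack, labels)
print(combiner.predict(X_stack))
```
        </preformat>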
      </sec>
      <sec id="sec-2-2">
        <title>Lexicon-based models</title>
        <p>
          To introduce extra knowledge into our
approach, we used two lexicon-based
models. The rst, Up-Down model produces a
counting of a ective words, that is, it
produces two indexes for a given text: one
for positive words, and another for negative
words. We created the positive-negative
lexicon based on the several Spanish a ective
lexicons
          <xref ref-type="bibr" rid="ref2 ref4 ref5">(de Albornoz, Plaza, y Gervas, 2012;
Sidorov et al., 2013; Perez-Rosas, Banea,
y Mihalcea, 2012)</xref>
          ; we also enriched this
lexicon with Spanish WordNet
          <xref ref-type="bibr" rid="ref3">(FernandezMontraveta, Vazquez, y Fellbaum, 2008)</xref>
          .
The other Bernoulli model was created to
predict aggressiveness using a lexicon with
aggressive words. We created this lexicon
gathering common aggressive words for
Spanish. These indexes and prediction along with
B4MSA's ( TC) outputs are the input for
EvoDAG system.
2.3
        </p>
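        <p>The Up-Down model described above can be sketched in a few lines: for a given text it returns two indexes, one counting positive words and one counting negative words. The tiny lexicons below are illustrative placeholders, not the paper's actual Spanish affective lexicon.</p>
        <preformat>
```python
# Minimal sketch of the Up-Down lexicon model: two indexes per text,
# one for positive words and one for negative words. The lexicons here
# are toy placeholders for the Spanish affective lexicons cited above.
POSITIVE = {"bueno", "excelente", "feliz"}
NEGATIVE = {"malo", "terrible", "triste"}

def up_down(text):
    """Return (positive_count, negative_count) for a whitespace-tokenized text."""
    tokens = text.lower().split()
    up = sum(1 for t in tokens if t in POSITIVE)
    down = sum(1 for t in tokens if t in NEGATIVE)
    return up, down

print(up_down("un dia excelente pero con un final triste"))  # → (1, 1)
```
        </preformat>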
      </sec>
      <sec id="sec-2-3">
        <title>EvoDAG</title>
        <p>EvoDAG (https://github.com/mgraffg/EvoDAG) (Graff et al., 2016; Graff et al.,
2017) is a Genetic Programming system
specifically tailored to tackle classification
problems on very large and high-dimensional
vector spaces. EvoDAG uses the principles
of Darwinian evolution to create models
represented as a directed acyclic graph (DAG).
Due to lack of space, we refer the reader to
(Graff et al., 2016), where EvoDAG is broadly
described. It is important to mention that
EvoDAG has no information regarding
whether input Xi comes from a
particular class decision function; consequently, from
EvoDAG's point of view, all inputs are
equivalent.</p>
      </sec>
      <sec id="sec-2-4">
        <title>FastText</title>
        <p>
          FastText (Joulin et al., 2017) is a tool to
create text classifiers and to learn a semantic
vocabulary from a given collection
of documents; this vocabulary is represented
as a collection of high-dimensional vectors,
one per word. It is worth mentioning that
FastText is robust to lexical errors since
out-of-vocabulary words are represented as the
combination of the vectors of their sub-words, that is, a
kind of character q-grams limited in context
to words. Nonetheless, the main reason for
including FastText in our system is to
overcome the small training set that comes with
Task 4, which is fulfilled using the pre-trained
vectors computed on the Spanish content of
Wikipedia
          <xref ref-type="bibr" rid="ref1">(Bojanowski et al., 2016)</xref>
          . We use
these vectors to create document vectors, one
vector per document. A document vector is,
roughly speaking, a linear combination of the
word vectors that compose the document,
collapsed into a single vector of the same dimension. These
document vectors were used as input to an
SVM with a linear kernel, and we use its
decision function as input to EvoMSA.
        </p>
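        <p>A minimal sketch of this step: average the per-word vectors into a document vector and feed the result to a linear SVM whose decision function would then be passed on to EvoMSA. The 4-dimensional toy embeddings below stand in for the pre-trained Spanish fastText vectors; names and data are illustrative.</p>
        <preformat>
```python
# Sketch: document vectors as an average (a simple linear combination)
# of per-word vectors, then a linear SVM over those vectors. Toy 4-d
# embeddings replace the ~300-d pre-trained Spanish fastText vectors.
import numpy as np
from sklearn.svm import LinearSVC

emb = {
    "bueno": np.array([1.0, 0.2, 0.0, 0.1]),
    "genial": np.array([0.9, 0.1, 0.1, 0.0]),
    "malo": np.array([-1.0, 0.3, 0.0, 0.2]),
    "fatal": np.array([-0.8, 0.2, 0.1, 0.1]),
}

def doc_vector(text, dim=4):
    """Average the vectors of known words; zero vector if none are known."""
    vecs = [emb[t] for t in text.lower().split() if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

docs = ["bueno genial", "malo fatal", "muy bueno", "tan fatal"]
y = [1, 0, 1, 0]
X = np.vstack([doc_vector(d) for d in docs])

svm = LinearSVC().fit(X, y)
# The decision-function values are what feed the EvoMSA combiner.
print(svm.decision_function(X))
```
        </preformat>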
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and results</title>
      <p>The following tables show the performance
of our system on the InterTASS dataset. We
also show the performance of a number of
selected systems to provide context for our
solution. The tables always show
the top-k best results that include our
system, i.e., we always show the best ones, but
sometimes we do not show all the results below
our system.</p>
      <p>Please recall that the InterTASS dataset
is split according to each subtask.
Table 2 shows the performance on the monolingual
datasets. For instance, the results of training
with Spain-InterTASS and testing on tweets
generated by people from Spain are shown in
Table 2a, where we reached the seventh position
out of a total of nine participating teams. For
training and testing on the other
Spanish varieties, Tables 2b and 2c
show the results of training with the CR and PE
subsets, respectively. Our team achieved the
fourth position among eight teams in CR,
and the third among eight participants in PE.
Notice that all our results are marked in bold
to improve readability.</p>
      <p>In contrast, the results of training with the
ES subset and testing with the ES, CR, and
PE subsets are presented in Tables 3a, 3b, and 3c,
respectively. Our team achieved the best result
in the cross-lingual task with Peruvian tweets,
and also reached the second-best results on the
ES (Spain) and CR (Costa Rica) subsets.</p>
      <p>The performance of our method in the
cross-lingual task is shown in Table 3. For
instance, Table 3a shows our performance on
the ES subset; here, we achieved the second
position among three teams. In general, the
number of participants was smaller than in the
monolingual tasks. Table 3b shows the rank of
the four participating teams on the Peruvian
test subset, where we reached the best
position on the Macro-F1 score. Finally, we
reached the second rank on the Costa Rica
subset, just below RETUYT-InCo.</p>
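      <p>The rankings above use the Macro-F1 score, the unweighted mean of the per-class F1 values, so minority polarity classes weigh as much as the majority ones. A small illustrative computation over the four-level polarity labels, using scikit-learn (labels and predictions here are made up for the example):</p>
      <preformat>
```python
# Macro-F1: the unweighted average of per-class F1 scores, as used to
# rank TASS systems. Labels follow the four-level polarity scheme;
# the predictions below are invented purely for illustration.
from sklearn.metrics import f1_score

y_true = ["P", "P", "N", "N", "NEU", "NONE"]
y_pred = ["P", "N", "N", "N", "NEU", "NONE"]

macro = f1_score(y_true, y_pred, average="macro")
print(round(macro, 3))  # → 0.867
```
      </preformat>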
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>It is worth mentioning that we used the same
scheme, explained in Section 2, to tackle all
subtasks. Note that EvoMSA allows
changing the training set as specified for each
subtask, so we can optimize the pipeline for
each particular objective.</p>
      <p>Regarding the obtained results, our
approach performs better when it is trained
with tweets from Spain and tested with other
Spanish varieties. However, it is not clear whether
this performance is due to the data or to an
inherent feature of the Spanish variation.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The authors would like to thank
Laboratorio Nacional de GeoInteligencia for partially
funding this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , E. Grave,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Joulin, y
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>arXiv preprint arXiv:1607</source>
          .
          <fpage>04606</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>de Albornoz</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          , L. Plaza,
          <string-name>
            <given-names>y P.</given-names>
            <surname>Gervas</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Sentisense: An easily scalable concept-based a ective lexicon for sentiment analysis</article-title>
          .
          <source>En Proceedings of LREC</source>
          <year>2012</year>
          , paginas
          <volume>3562</volume>
          {
          <fpage>3567</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Fernandez-Montraveta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , G. Vazquez, y
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>The spanish version of Mart nez-</article-title>
          <string-name>
            <surname>Camara</surname>
            , E.,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Almeida-Cruz</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          <article-title>D az-</article-title>
          <string-name>
            <surname>Galiano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Estevez-Velarde</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Garc</surname>
            a-Cumbreras, M. Garc aVega,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>A. Montejo</given-names>
          </string-name>
          <string-name>
            <surname>Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Montoyo</surname>
          </string-name>
          , R. Mun~oz, A. PiadMor s, y J.
          <string-name>
            <surname>Villena-Roman</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of TASS 2018: Opinions, health and emotions</article-title>
          . En E. Mart nezCamara
          <string-name>
            <given-names>Y.</given-names>
            <surname>Almeida-Cruz M. C. D azGaliano S. Estevez-Velarde M. A. Garc aCumbreras M. Garc</surname>
          </string-name>
          a-Vega
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gutierrez A. Montejo Raez A. Montoyo R. Mun</surname>
          </string-name>
          <article-title>~oz A. Piad-Mor s</article-title>
          , y J.
          <article-title>Villena-Roman, editores</article-title>
          ,
          <source>Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS</source>
          <year>2018</year>
          ), volumen 2172 de CEUR Workshop Proceedings, Sevilla, Spain, September. CEUR-WS.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Perez-Rosas</surname>
            , V., C. Banea, y
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Mihalcea</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Learning sentiment lexicons in spanish</article-title>
          .
          <source>En LREC</source>
          , volumen
          <volume>12</volume>
          , pagina 73.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Sidorov</surname>
            , G.,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Miranda-Jimenez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>ViverosJimenez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Castro-Sanchez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Velasquez</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <article-title>D az-</article-title>
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>S. SuarezGuerra</given-names>
          </string-name>
          , A. Trevin~o,
          <string-name>
            <given-names>y J.</given-names>
            <surname>Gordon</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Empirical study of machine learning based approach for opinion mining in tweets</article-title>
          .
          <source>En Proceedings of the 11th Mexican International Conference on Advances in Articial Intelligence - Volume Part I, MICAI'12, paginas</source>
          <volume>1</volume>
          {
          <fpage>14</fpage>
          , Berlin, Heidelberg. Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Tellez</surname>
            ,
            <given-names>E. S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miranda-Jimenez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moctezuma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Suarez</surname>
          </string-name>
          , y
          <string-name>
            <given-names>O. S.</given-names>
            <surname>Siordia</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A simple approach to multilingual polarity classi cation in Twitter</article-title>
          .
          <source>Pattern Recognition Letters</source>
          ,
          <volume>94</volume>
          :
          <fpage>68</fpage>
          {
          <fpage>74</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>Liu, B. and L. Zhang. 2012. A Survey of Opinion Mining and Sentiment Analysis, pages 415-463. Springer US, Boston, MA.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>