<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transformers Pipeline for Offensiveness Detection in Mexican Spanish Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Gomez-Espinosa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Muñiz-Sanchez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Pastor Lopez-Monroy</string-name>
          <email>pastor.lopezg@cimat.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Mathematics Research Center (CIMAT)</institution>
          ,
          <addr-line>Guanajuato, 36023</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mathematics Research Center (CIMAT)</institution>
          ,
          <addr-line>Monterrey, 66628</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the methodology proposed for participating in Subtask 3 of the MeOffendEs@IberLEF 2021 competition: non-contextual binary classification for Mexican Spanish, which consists of classifying tweets as offensive or non-offensive. We propose a Transformers-based pipeline consisting of a series of pre-processing steps and the use of an extended corpus, followed by an ensemble of BERT models. The proposed strategy obtained the best results on this task, ranking first.</p>
      </abstract>
      <kwd-group>
        <kwd>Offensiveness Detection</kwd>
        <kwd>Mexican Spanish</kwd>
        <kwd>Transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, there have been many initiatives in the NLP and Machine
Learning community to guide research efforts towards the automatic
detection of threats and risks to the users of social networks. Those threats
include aggressiveness, hate speech, harassment, racism, and misogyny, among many
others. For the Spanish language, those efforts have been promoted by academic
competitions on specific tasks, such as the events organized by TASS [
        <xref ref-type="bibr" rid="ref10 ref6">6,10</xref>
        ], PAN
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and particularly, MEX-A3T@IberLEF [
        <xref ref-type="bibr" rid="ref1 ref2 ref7">1,2,7</xref>
        ] and MeOffendEs@IberLEF [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], which include tracks for aggressiveness and offensiveness identification,
respectively, for tweets in Mexican Spanish.
      </p>
      <p>Detecting offensive comments or posts in social media is not easy, because
it does not depend solely on the presence or absence of specific words. As an example,
consider the following tweets taken from the MEX-A3T 2020 training corpus:
"No se si guardar dinero para salir contigo o gastarlo en
pendejadas a la verga"
"Ya no saben que verga decir, consigan una vida y sufran o algo"</p>
      <p>In the first case, the tweet is non-offensive, even though it contains vulgar and rude
language. The second tweet is offensive, although its language is less vulgar than
that of the first one. Based on that, we argue that it is necessary to take into account
the context in which words are used.</p>
      <p>
        There are many proposals to tackle offensive and aggressive content detection
in social media for the Spanish language, with document representations based on
n-grams (word and character level) and word embeddings, and with classifiers based
on standard machine learning and deep learning approaches [
        <xref ref-type="bibr" rid="ref1 ref3 ref7">1, 3, 7</xref>
        ]. However, in
the last year, there has been a visible trend towards contextualized representations
of words, such as Bi-LSTM, Bi-GRU, and Transformers-based models such as
BETO [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], with and without fine-tuning [
        <xref ref-type="bibr" rid="ref16 ref17 ref2 ref4">2, 4, 16, 17</xref>
        ]. State-of-the-art results for
Mexican Spanish on this task have been reached with a bagging-like scheme that
combines different BERT models trained on different augmented datasets [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Similar to [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], we propose an ensemble of BERT models, but we also use
a pre-processing step to obtain valuable text descriptions of the specialized
language used in tweets, followed by an extension of the training corpus. The
empirical evaluation shows that the proposed approach obtained the best results
in the challenge, ranking first. In the following sections, our proposal is
explained in detail.
      </p>
      <p>This document is organized as follows: Section 2 describes the dataset and
the experimental settings. Section 3 describes the proposed pipeline, and Section
3.4 the experimental results. Finally, Section 4 outlines the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset and model settings</title>
      <p>The OffendMex corpus consists of a training set of 5060 tweets and a validation set
of 76 tweets; of the total, 80% was used for training and 20% for evaluation.
Tweets have a mean length of 24.11 tokens and a maximum of 60 tokens. The classes
are imbalanced, with a ratio of 2.66 and the offensive class as the minority.</p>
      <p>
        For this task, a pre-trained BERT model on Spanish was used [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and for the
fine-tuning step on small datasets (fewer than 100,000 examples), we ran an exhaustive
search over the recommended hyperparameters [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and chose the best configuration
on the evaluation set. As a result, we used a BERT model with a training batch
size of 16 for 4 epochs, and an Adam optimizer with a learning rate of 3e-5.
      </p>
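      <p>The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the authors' code: the grid uses the fine-tuning ranges suggested in the BERT paper [9] (batch size 16 or 32; learning rate 5e-5, 3e-5, or 2e-5; 2 to 4 epochs), and assuming that exactly this grid was searched is our reading of "recommended hyperparameters". The callable evaluate and the stand-in scorer toy_evaluate are hypothetical; a real run would fine-tune the model and return its F1 score on the evaluation split.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Fine-tuning ranges suggested in the BERT paper; assumed to be the grid searched here.
BATCH_SIZES = (16, 32)
LEARNING_RATES = (5e-5, 3e-5, 2e-5)
EPOCHS = (2, 3, 4)


def grid_search(evaluate):
    """Return the (batch_size, learning_rate, epochs) triple with the highest score.

    `evaluate` is a hypothetical callable that fine-tunes a model with the given
    settings and returns its score on the held-out evaluation split.
    """
    grid = list(product(BATCH_SIZES, LEARNING_RATES, EPOCHS))
    return max(grid, key=lambda cfg: evaluate(*cfg))


def toy_evaluate(batch_size, learning_rate, epochs):
    """Stand-in scorer (not a real training run) that peaks at 16 / 3e-5 / 4."""
    return -abs(batch_size - 16) - abs(learning_rate - 3e-5) * 1e5 - abs(epochs - 4)


best = grid_search(toy_evaluate)
print(best)
```
]]></preformat>
      <p>With 2 batch sizes, 3 learning rates, and 3 epoch counts, the grid holds 18 configurations, which is small enough to evaluate exhaustively on a corpus of this size.</p>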
    </sec>
    <sec id="sec-3">
      <title>Pipeline</title>
      <p>In this section, the proposed pipeline is described: first, the corpus pre-processing
step; second, the extended corpus step; and finally, the BERT ensemble step.</p>
      <p>
        3.1 Step 1: Pre-processing
It has been shown in other classification tasks, such as irony or sentiment classification,
that adding tweet jargon such as hashtags, emojis, and emoticons as text
descriptions improves tweet classification with deep learning models like
BERT [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
      <p>Our procedure is the following:</p>
      <list list-type="bullet">
        <list-item><p>Hashtags are split into words (see Figure 1) using the Python wordninja
library (https://github.com/keredson/wordninja) with a Spanish dictionary
built from the Spanish fastText vocabulary
(https://fasttext.cc/docs/en/crawlvectors.html).</p></list-item>
        <list-item><p>Emojis are replaced with their text meaning in Spanish (see Figure 1) given
by the Python library emoji (https://github.com/carpedm20/emoji/).</p></list-item>
        <list-item><p>Emoticons are replaced with a text representation in Spanish similar to the
emoji meanings (see Table 1).</p></list-item>
        <list-item><p>Out-of-vocabulary words are replaced with the corresponding words (see
Table 1).</p></list-item>
      </list>
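      <p>The hashtag and emoticon steps above can be illustrated with the following minimal sketch, which uses only the Python standard library. It is not the authors' implementation: the paper relies on wordninja with a fastText-derived Spanish dictionary, whereas here the toy vocabulary VOCAB and the emoticon table EMOTICONS are hypothetical stand-ins, and the segmentation is a plain fewest-words dynamic program rather than a frequency-weighted one.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical toy vocabulary; the paper builds its dictionary from fastText.
VOCAB = {"no", "me", "digas", "feliz", "lunes", "buenos", "dias"}
# Hypothetical emoticon table; the paper's actual mapping is given in its Table 1.
EMOTICONS = {":)": "cara feliz", ":(": "cara triste"}


def split_hashtag(tag, vocab=VOCAB):
    """Split a hashtag body into known words, using as few words as possible.

    Returns the lowercased hashtag body when no full segmentation exists.
    """
    text = tag.lstrip("#").lower()
    # best[i] holds the shortest word list covering text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return " ".join(best[-1]) if best[-1] is not None else text


def preprocess(tweet):
    """Replace emoticons with text and expand hashtags into words."""
    for emoticon, meaning in EMOTICONS.items():
        tweet = tweet.replace(emoticon, " " + meaning + " ")
    tweet = re.sub(r"#\w+", lambda m: split_hashtag(m.group()), tweet)
    # Collapse the extra spaces introduced by the replacements.
    return " ".join(tweet.split())


print(preprocess("#NoMeDigas que hoy es #FelizLunes :)"))
```
]]></preformat>
      <p>Because every replacement adds tokens, a pre-processed tweet is longer than the raw one, which is why the sequence length budget of the model has to be revisited after this step.</p>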
      <p>
        After the pre-processing step, the maximum corpus length doubles (see
Figure 2); for this reason, the maximum sequence length of the BERT model
is set to 128.
      </p>
      <p>
        It has been shown [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] that extending the training corpus with examples from another
corpus labeled for a related task, such as hate speech, can improve model
performance. In this pipeline, we chose to use hate speech and negative sentiment examples
from the HatEval 2019 corpus in Spanish [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the TASS 2019 corpus for Mexican Spanish [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
respectively.
      </p>
      <p>
        The methodology proposed to add examples from other corpora, as a way to
improve model performance and reduce the class imbalance, is shown in Figure
3 and consists of the following three steps. In the first step, the corpus is
pre-processed by the method described in Section 3.1. In the second step, a model is
trained on the OffendMex 2021 corpus by the method described in Section
3.3 and used for inference on the HatEval and TASS corpora; we then select
only those examples whose weights in the classifier are greater than or equal to 0.95,
and add them to the OffendMex corpus as offensive examples. Finally, in the
third step, the model is trained again from scratch. The intuition behind
this procedure is to augment the training data only with those instances that could
improve the classification score.
      </p>
      <p>
        To alleviate the instability of BERT fine-tuning on small samples and
unbalanced datasets, it was shown [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that treating single BERTs as weak models
and combining 20 of them in an ensemble with a weighted voting scheme, that is,
accumulating the softmax layer outputs and selecting the class with
the maximum weight, makes a more robust model (see Figure 4).
      </p>
      <p>
        After following the proposed pipeline described in Section 3, we obtained the
results shown in Table 2, where it can be seen that each step of the proposed
pipeline helps to improve model performance, as we expected. The best result,
an F1 score of 71.07 on the evaluation set, was achieved with a 20-BERT ensemble,
pre-processing, and the addition of more examples to the training corpus.
      </p>
      <p>
        This work presented a three-step pipeline for offensiveness detection in
Mexican Spanish social media that achieved first place in the
MeOffendEs@IberLEF 2021 Subtask 3 competition with an F1 score of 0.7026 on the
test set. Our experimental results on the evaluation set show that each step of
the pipeline improves model performance. We think this pipeline could be
implemented quickly and successfully in other related tasks such as
aggressiveness detection.
      </p>
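      <p>The two pipeline steps that operate on classifier outputs, namely the selection of external examples at the 0.95 threshold and the weighted vote over the ensemble, can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' code: we read "weights in the classifier" as the softmax probability of the offensive class, and the function names select_confident and weighted_vote, as well as the example numbers, are hypothetical.</p>
      <preformat><![CDATA[
```python
def select_confident(texts, offensive_probs, threshold=0.95):
    """Keep external examples whose offensive-class probability is at least `threshold`.

    Assumes `offensive_probs[i]` is the softmax probability that `texts[i]` is offensive.
    """
    return [t for t, p in zip(texts, offensive_probs) if p >= threshold]


def weighted_vote(softmax_outputs):
    """Accumulate per-class softmax outputs over all models and return the winning class.

    `softmax_outputs` has one row per model; each row holds one probability per class.
    """
    n_classes = len(softmax_outputs[0])
    totals = [sum(row[c] for row in softmax_outputs) for c in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)


# Hypothetical external tweets with their predicted offensive probabilities.
kept = select_confident(["tweet a", "tweet b", "tweet c"], [0.97, 0.95, 0.90])
print(kept)

# Three hypothetical models scoring one tweet as [non-offensive, offensive].
outputs = [[0.60, 0.40], [0.55, 0.45], [0.05, 0.95]]
print(weighted_vote(outputs))
```
]]></preformat>
      <p>In the toy vote, two of the three models lean towards the non-offensive class, yet the accumulated softmax mass favours the offensive class. This is the difference between weighted voting and a plain majority vote: a confident model can outweigh several uncertain ones.</p>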
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>Gomez-Espinosa thanks CONACYT for the Master's degree scholarship
number 1002761.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>M.A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gomez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pineda</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moctezuma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Overview of MEX-A3T at iberlef 2019: Authorship and aggressiveness analysis in mexican spanish tweets</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum co-located with 35th Conference of the Spanish Society for Natural Language Processing</source>
          ,
          <source>IberLEF@SEPLN 2019</source>
          , Bilbao, Spain, September 24th, 2019.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2421</volume>
          , pp.
          <fpage>478</fpage>
          –
          <lpage>494</lpage>
          . CEUR-WS.org (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jarquín-Vasquez</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gomez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pineda</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Posadas-Duran</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bel-Enguix</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of MEX-A3T at iberlef 2020: Fake news and aggressiveness analysis in mexican spanish</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)</source>
          , Malaga, Spain, September 23th, 2020.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2664</volume>
          , pp.
          <fpage>222</fpage>
          –
          <lpage>235</lpage>
          . CEUR-WS.org (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Monroy</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          :
          <article-title>Author profiling and aggressiveness detection in spanish tweets: MEX-A3T 2018</article-title>
          .
          <source>In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)</source>
          , Sevilla, Spain, September 18th, 2018.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2150</volume>
          , pp.
          <fpage>134</fpage>
          –
          <lpage>139</lpage>
          . CEUR-WS.org (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fersini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nozza</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel Pardo</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanguinetti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>54</fpage>
          –
          <lpage>63</lpage>
          . Association for Computational Linguistics, Minneapolis, Minnesota, USA (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chulvi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarracen</surname>
            ,
            <given-names>G.L.D.L.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjavacas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mayerl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolska</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zangerle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Overview of PAN 2021: Authorship verification, profiling hate speech spreaders on twitter, and style change detection</article-title>
          . In: Hiemstra, D., Moens, M.F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.)
          <source>Advances in Information Retrieval</source>
          . pp.
          <fpage>567</fpage>
          –
          <lpage>573</lpage>
          . Springer International Publishing, Cham
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Camara</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almeida-Cruz</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Díaz-Galiano</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Estevez-Velarde</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cumbreras</surname>
            ,
            <given-names>M.A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vega</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montejo-Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montoyo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muñoz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piad-Morffis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villena-Roman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of TASS 2018: Opinions, health and emotions</article-title>
          .
          <source>In: Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN, TASS@SEPLN 2018, co-located with 34nd SEPLN Conference (SEPLN 2018)</source>
          , Sevilla, Spain, September 18th, 2018.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2172</volume>
          , pp.
          <fpage>13</fpage>
          –
          <lpage>27</lpage>
          . CEUR-WS.org (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>M.A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guzman-Falcon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gomez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pineda</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reyes-Meza</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sulayes</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>Overview of MEX-A3T at ibereval 2018: Authorship and aggressiveness analysis in mexican spanish tweets</article-title>
          .
          <source>In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)</source>
          , Sevilla, Spain, September 18th, 2018.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2150</volume>
          , pp.
          <fpage>74</fpage>
          –
          <lpage>96</lpage>
          . CEUR-WS.org (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cañete</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaperon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuentes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ho</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Spanish pretrained BERT model and evaluation data</article-title>
          .
          <source>In: PML4DC at ICLR 2020</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          . pp.
          <fpage>4171</fpage>
          –
          <lpage>4186</lpage>
          . Association for Computational Linguistics, Minneapolis, Minnesota (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Díaz-Galiano</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vega</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casasola</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cumbreras</surname>
            ,
            <given-names>M.A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camara</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moctezuma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montejo-Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cabezudo</surname>
            ,
            <given-names>M.A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tellez</surname>
            ,
            <given-names>E.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graff</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranda-Jimenez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Overview of TASS 2019: One more further for the global spanish sentiment analysis corpus</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum co-located with 35th Conference of the Spanish Society for Natural Language Processing</source>
          ,
          <source>IberLEF@SEPLN 2019</source>
          , Bilbao, Spain, September 24th, 2019.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2421</volume>
          , pp.
          <fpage>550</fpage>
          –
          <lpage>560</lpage>
          . CEUR-WS.org (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Guzman-Silverio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balderas-Paredes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Monroy</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          :
          <article-title>Transformers and data augmentation for aggressiveness detection in Mexican Spanish</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)</source>
          , Malaga, Spain, September 23th, 2020.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2664</volume>
          , pp.
          <fpage>293</fpage>
          -
          <lpage>302</lpage>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Montes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez-Carmona</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez Mellado</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez-Zafra</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lima</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza-del-Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taule</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (eds.):
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021)</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Plaza-del-Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casavantes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin-Valdivia</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montejo-Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gomez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jarquín-Vasquez</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villaseñor-Pineda</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Overview of the MeOffendEs task on offensive text detection at IberLEF 2021</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <issue>0</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pota</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ventura</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catelli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esposito</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>An effective BERT-based pipeline for Twitter sentiment analysis: A case study in Italian</article-title>
          .
          <source>Sensors</source>
          <volume>21</volume>
          (
          <issue>1</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blanco</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Incorporating emoji descriptions improves tweet classification</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <fpage>2096</fpage>
          -
          <lpage>2101</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <publisher-loc>Minneapolis, Minnesota</publisher-loc>
          (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Tanase</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaharia</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cercel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dascalu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Detecting aggressiveness in Mexican Spanish social media content by fine-tuning transformer-based models</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)</source>
          , Malaga, Spain, September 23th, 2020.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2664</volume>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>245</lpage>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atanasova</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karadzhov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mubarak</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Derczynski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pitenis</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , Coltekin, C.: SemEval-2020 task 12:
          <article-title>Multilingual o ensive language identi cation in social media (O ensEval 2020)</article-title>
          .
          <source>In: Proceedings of the Fourteenth Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>1425</fpage>
          -
          <lpage>1447</lpage>
          .
          <publisher-name>International Committee for Computational Linguistics</publisher-name>
          ,
          <publisher-loc>Barcelona (online)</publisher-loc>
          (Dec
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>