<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transformer-based Offensive Language Identification in Spanish</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sreelakshmi K</string-name>
          <email>sreelakshmi@cb.students.amrita.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Premjith B</string-name>
          <email>premjith@cb.amrita.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>K. P. Soman</string-name>
          <email>soman@amrita.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Computational Engineering and Networking (CEN), Amrita School of Engineering</institution>
          ,
          <addr-line>Coimbatore, Amrita Vishwa Vidyapeetham</addr-line>
          ,
          <country>India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents our work for the MeOffendEs@IberLEF 2021 shared task on non-contextual binary classification for Mexican Spanish. We implemented two deep neural network architectures: one containing a Bi-LSTM, an LSTM and a fully connected layer, and another with a Bi-LSTM and LSTM stack. In addition, we implemented a BERT classifier. Among the three models, BERT exhibited the best training performance, so we submitted its predictions. BERT performed well compared to the other models, as its pretrained embeddings are trained on a huge corpus covering multiple languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Long short-term memory</kwd>
        <kwd>Bidirectional Long Short-Term Memory</kwd>
        <kwd>Bidirectional Encoder Representations from Transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Social media platforms such as Facebook, Twitter and YouTube encourage
interpersonal communication by remaining open and free of cost. This
has enabled people to interact, post messages and comments, and express their
views online. Unfortunately, these platforms are often used as a means to attack or offend
people, which spreads hateful and offensive content and results in cyber
violence. Offensive content is non-verbal or oral communication that disparages
a group or person based on their religion, age, sexual orientation,
race, gender, nationality, or ethnicity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Putting an end to the use and spread of offensive content is imperative for all
social media platforms and content moderation establishments. At present,
most moderation is limited to community platforms that rely
on frequently used words, block-lists or human moderators. Furthermore,
these are not reliable options for all platforms because of their sheer cost and
long-drawn-out process [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. A considerable amount of work has been done to
identify offensive content in English, but little research has been done on non-English
languages like Spanish [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        , ?], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>MeOffendEs@IberLEF 2021 focuses on offensive language analysis in social
networks for Spanish. It has four subtasks:</p>
      <list list-type="bullet">
        <list-item><p>Subtask 1: Non-contextual multiclass classification for generic Spanish</p></list-item>
        <list-item><p>Subtask 2: Contextual multiclass classification for generic Spanish</p></list-item>
        <list-item><p>Subtask 3: Non-contextual binary classification for Mexican Spanish</p></list-item>
        <list-item><p>Subtask 4: Contextual binary classification for Mexican Spanish</p></list-item>
      </list>
      <p>We participated in Subtask 3, where we employed three different models: one
based on BERT and two deep learning models, namely a hybrid model using
a Bi-LSTM and LSTM stack, and a model with a Bi-LSTM and LSTM stack followed by
a fully connected layer. The BERT-based model gave the highest precision on
the test data.</p>
    </sec>
    <sec id="sec-2">
      <title>Literature Review</title>
      <p>
        For the past few years, one of the major concerns for social media platforms
and users has been offensive messages that tarnish an individual or a group. Various
abusive and offensive language identification problems and shared tasks have
been explored in the literature, ranging from aggression to cyberbullying, hate
speech, toxic comments, and offensive language, but very little work has been done
on Spanish, even though it is the fourth most spoken language [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Due to the large amount of offensive content that spreads through social media,
numerous academic events and shared tasks on offensive language and hate speech detection
have taken place. Among them are the first, second and third Workshops on
Abusive Language [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], SemEval 2020 [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] on multilingual offensive language identification
(OffensEval) covering English, Arabic, Danish, Greek and
Turkish, FIRE 2019 on offensive language identification in Indo-European
languages [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and various editions of the GermEval Shared Task on the Identification of
Offensive Language [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The GermEval shared task deals with the classification of German
tweets into binary classes (OFFENSE and OTHER) and fine-grained classes
(OFFENSE, OTHER, PROFANITY, INSULT), and with the classification of offensive
tweets as explicit or implicit.
      </p>
      <p>
        SemEval 2019 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] conducted a task on detecting offensive and non-offensive
comments in English tweets. The dataset (OLID) consists of 13240
tweets for training and 860 tweets for testing. The models used by participants
included deep learning models such as Convolutional Neural Networks,
Bidirectional Encoder Representations from Transformers (BERT),
Long Short-Term Memory (LSTM), LSTM with attention, and
Embeddings from Language Models (ELMo), as well as basic machine learning models.
      </p>
      <p>
        IberLEF 2019 had a task on aggressiveness detection in Mexican
Spanish tweets. Teams used various features, such as Bag of Words with TF-IDF
weights, hierarchical features obtained with a CNN, statistical descriptors,
document frequency, mutual information, lexical availability, linguistic features
and different types of n-grams, together with classifiers such as CNN, LSTM,
Multilayer Perceptron and GRU, and machine learning models like SVM, Naive Bayes and
logistic regression [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Various deep learning methods, recent pre-trained language
models based on transfer learning, and basic machine learning models have
been studied in this line of research. BERT, XLM and BETO have given promising
results in Spanish hate speech detection [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Dataset Description</title>
      <sec id="sec-3-1">
        <title>Dataset Description</title>
        <p>
          The shared task [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
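          ] has four subtasks, namely:
          <list list-type="bullet">
            <list-item><p>Non-contextual multiclass classification for generic Spanish</p></list-item>
            <list-item><p>Contextual multiclass classification for generic Spanish</p></list-item>
            <list-item><p>Non-contextual binary classification for Mexican Spanish</p></list-item>
            <list-item><p>Contextual binary classification for Mexican Spanish</p></list-item>
          </list>
          Among the four, we participated in non-contextual binary classification for
Mexican Spanish. The dataset statistics and annotation are given in Table 1.
          <table-wrap>
            <label>Table 1</label>
            <table>
              <thead>
                <tr><th>Dataset Name</th><th>Labels</th><th>Train set</th><th>Valid set</th><th>Test set</th></tr>
              </thead>
              <tbody>
                <tr><td>Non-contextual binary classification for Mexican Spanish</td><td>0 - Non-offensive and 1 - offensive</td><td>5060</td><td>76</td><td>2183</td></tr>
              </tbody>
            </table>
          </table-wrap>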
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>System Description and Results</title>
      <p>This section gives the details of the models used to experiment on the data. In
this work, we experimented with a BERT model and two deep learning models
to identify offensive texts.</p>
      <sec id="sec-4-1">
        <title>Preprocessing</title>
        <p>The dataset comprises social media texts and hence includes user names,
hashtags, and URLs. Since these entities do not contribute much to the classification
task, we employed a preprocessing step that replaces user names with the word
`USERNAME', hashtags with the word `HASH' and URLs with the word `URL',
and also removes punctuation and symbols like ' % &amp; &lt; &gt; : = ) ( ' from
the text.</p>
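        <p>The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the code used for the submission: the regular expressions for user names, hashtags and URLs, and the choice of removing every ASCII punctuation mark, are assumptions, since the exact matching rules are not stated here.</p>
        <preformat><![CDATA[
```python
import re
import string

# Assumed patterns: the paper does not specify how these entities were matched.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Removes every ASCII punctuation mark, which covers % & < > : = ) (
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Replace user names, hashtags and URLs with placeholder words, then strip punctuation."""
    text = URL_RE.sub(" URL ", text)        # URLs first, while ':' and '/' are still intact
    text = USER_RE.sub(" USERNAME ", text)
    text = HASHTAG_RE.sub(" HASH ", text)
    text = text.translate(PUNCT_TABLE)      # placeholders contain no punctuation, so they survive
    return " ".join(text.split())           # collapse repeated whitespace


if __name__ == "__main__":
    print(preprocess("@juan mira esto: https://t.co/abc #lunes :) 100% <3"))
```
]]></preformat>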
      </sec>
      <sec id="sec-4-2">
        <title>Models</title>
        <p>We experimented with deep learning models and a BERT model for classifying
the social media text into different categories. The BERT model gave the highest
result on the test set.</p>
        <p><bold>Deep Learning.</bold> We conducted experiments to classify the social media text
as offensive or non-offensive using two deep learning models. The two models
are:</p>
        <list list-type="bullet">
          <list-item><p>Model 1: Hybrid BiLSTM, LSTM stack followed by a fully connected layer
dedicated to classification</p></list-item>
          <list-item><p>Model 2: Hybrid BiLSTM, LSTM, dense layer stack followed by a
fully connected layer dedicated to classification</p></list-item>
        </list>
        <p>The preprocessed text undergoes a few more steps before classification:</p>
        <list list-type="bullet">
          <list-item><p>Tokenization: An "&lt;OOV&gt;" token is used to mark Out-of-Vocabulary
words.</p></list-item>
          <list-item><p>Padding: To maintain equal length, sentences are padded with zeros.</p></list-item>
        </list>
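        <p>A minimal sketch of these two steps in plain Python is given below. The helper names (build_vocab, encode, pad), whitespace tokenization, the ID assigned to the &lt;OOV&gt; token and the choice of padding on the right are assumptions made for illustration; the text only states that an &lt;OOV&gt; token is used and that sentences are padded with zeros.</p>
        <preformat><![CDATA[
```python
from collections import Counter

OOV_TOKEN = "<OOV>"


def build_vocab(texts, num_words=None):
    """Map each training word to an integer ID; 0 is reserved for padding, 1 for <OOV>."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {OOV_TOKEN: 1}
    for word, _ in counts.most_common(num_words):
        vocab[word] = len(vocab) + 1
    return vocab


def encode(text, vocab):
    """Convert a sentence to IDs, marking unseen words with the <OOV> ID."""
    return [vocab.get(word, vocab[OOV_TOKEN]) for word in text.lower().split()]


def pad(ids, max_len):
    """Truncate to max_len, then right-pad with zeros so every sequence has equal length."""
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["hola mundo", "hola amigo"])
    print(pad(encode("hola desconocido amigo", vocab), 5))
```
]]></preformat>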
        <p>The hyperparameters used for both models are given in Table 2.</p>
        <p>
          The results obtained on the test data for both models are given in
Table 3.
        </p>
        <p>
          <bold>BERT.</bold> We made use of a 12-layer BERT model ("bert-base-multilingual-cased")
for classification [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The pretrained "bert-base-multilingual-cased" model was
fine-tuned on our data. The major steps involved in the experiment
are as follows:
          <list list-type="bullet">
            <list-item><p>Tokenise the sentences</p></list-item>
            <list-item><p>Add the special tokens [CLS] and [SEP]</p></list-item>
            <list-item><p>Map the tokens to their IDs</p></list-item>
            <list-item><p>Pad and truncate the sentences to the same length</p></list-item>
            <list-item><p>Create the attention masks</p></list-item>
          </list>
        </p>
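        <p>The input preparation steps listed for BERT can be illustrated with the sketch below. It assumes the sentence has already been split into tokens and uses a toy vocabulary (toy_vocab) with made-up IDs; both are hypothetical stand-ins for the WordPiece tokenizer and vocabulary that ship with "bert-base-multilingual-cased". The function name encode_for_bert is likewise illustrative.</p>
        <preformat><![CDATA[
```python
def encode_for_bert(tokens, vocab, max_len):
    """Build input IDs and an attention mask for one already-tokenised sentence."""
    # Truncate so that [CLS] and [SEP] still fit inside max_len.
    tokens = ["[CLS]"] + tokens[: max_len - 2] + ["[SEP]"]
    # Map tokens to IDs; unknown tokens fall back to [UNK].
    input_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    attention_mask = [1] * len(input_ids)          # 1 marks a real token
    n_pad = max_len - len(input_ids)
    input_ids += [vocab["[PAD]"]] * n_pad          # pad to the common length
    attention_mask += [0] * n_pad                  # 0 marks padding
    return input_ids, attention_mask


# Hypothetical vocabulary for illustration only; real BERT IDs differ.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hola": 4, "mundo": 5}

if __name__ == "__main__":
    print(encode_for_bert(["hola", "mundo"], toy_vocab, 6))
```
]]></preformat>
        <p>The attention mask lets the model ignore the padded positions, so sentences of different lengths can share one batch.</p>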
        <sec id="sec-4-2-1">
          <title>The fine-tuned hyperparameters are given in Table 4</title>
          <p>The BERT-based model gave the highest precision among the three models
we experimented with, and hence the predictions from the BERT model were submitted.</p>
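          <p>For reference, precision is the fraction of texts predicted as positive that are truly positive, TP / (TP + FP). The sketch below computes it for one class; treating label 1 (offensive) as the positive class is an assumption, since the averaging scheme used for the reported score is not stated here, and the labels in the example are toy values, not task data.</p>
          <preformat><![CDATA[
```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted-positive items that are truly positive: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only.
    print(precision([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
```
]]></preformat>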
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This paper presents our submission to the MeOffendEs@IberLEF
2021 shared task on offensive language identification in Mexican Spanish text. Two deep
learning models were implemented: a hybrid network with an LSTM layer and a Bi-LSTM layer,
and a network consisting of an LSTM layer, a Bi-LSTM layer and a fully connected
network. A transformer-based BERT model was also
implemented. BERT gave the highest result, with 91% precision for non-contextual
binary classification of Mexican Spanish text.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Mario Ezra Aragón, Miguel Ángel Álvarez Carmona, Manuel Montes-y-Gómez, Hugo Jair Escalante, Luis Villaseñor Pineda, and Daniela Moctezuma.
          <article-title>Overview of MEX-A3T at IberLEF 2019: Authorship and aggressiveness analysis in Mexican Spanish tweets</article-title>.
          <source>In IberLEF@SEPLN</source>, pages <fpage>478</fpage>–<lpage>494</lpage>, <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Andrew Arsht and Daniel Etcovitch.
          <article-title>The human cost of online content moderation</article-title>.
          <source>Harvard Law Review Online</source>, Harvard University, Cambridge, MA, USA. Retrieved from https://jolt.law.harvard.edu/digest/the-human-cost-of-online-content-moderation, <year>2018</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Bharathi Raja Chakravarthi, Mihael Arcan, and John Philip McCrae.
          <article-title>Improving wordnets for under-resourced languages using machine translation</article-title>.
          <source>In Proceedings of the 9th Global WordNet Conference</source>, pages <fpage>77</fpage>–<lpage>86</lpage>, <year>2018</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Shagun Jhaver, Iris Birman, Eric Gilbert, and Amy Bruckman.
          <article-title>Human-machine collaboration for content regulation: The case of Reddit Automoderator</article-title>.
          <source>ACM Transactions on Computer-Human Interaction (TOCHI)</source>, <volume>26</volume>(<issue>5</issue>):<fpage>1</fpage>–<lpage>35</lpage>, <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert.
          <article-title>Online harassment and content moderation: The case of blocklists</article-title>.
          <source>ACM Transactions on Computer-Human Interaction (TOCHI)</source>, <volume>25</volume>(<issue>2</issue>):<fpage>1</fpage>–<lpage>33</lpage>, <year>2018</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Evangelos Kalampokis, Efthimios Tambouris, and Konstantinos Tarabanis.
          <article-title>Understanding the predictive power of social media</article-title>.
          <source>Internet Research</source>, <year>2013</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Thomas Mandl, Sandip Modha, Prasenjit Majumder, Daksh Patel, Mohana Dave, Chintak Mandlia, and Aditya Patel.
          <article-title>Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages</article-title>.
          <source>In Proceedings of the 11th Forum for Information Retrieval Evaluation</source>, pages <fpage>14</fpage>–<lpage>17</lpage>, <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Manuel Montes, Paolo Rosso, Julio Gonzalo, Ezra Aragón, Rodrigo Agerri, Miguel Ángel Álvarez-Carmona, Elena Álvarez Mellado, Jorge Carrillo-de Albornoz, Luis Chiruzzo, Larissa Freitas, Helena Gómez Adorno, Yoan Gutiérrez, Salud María Jiménez-Zafra, Salvador Lima, Flor Miriam Plaza-de Arco, and Mariona Taulé, editors.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021)</source>. <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Flor Miriam Plaza-del-Arco, Marco Casavantes, Hugo Jair Escalante, M. Teresa Martín-Valdivia, Arturo Montejo-Ráez, Manuel Montes-y-Gómez, Horacio Jarquín-Vásquez, and Luis Villaseñor-Pineda.
          <article-title>Overview of the MeOffendEs task on offensive text detection at IberLEF 2021</article-title>.
          <source>Procesamiento del Lenguaje Natural</source>, <volume>67</volume>(<issue>0</issue>), <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Flor Miriam Plaza-del-Arco, M Dolores Molina-González, L Alfonso Ureña-López, and M Teresa Martín-Valdivia.
          <article-title>Comparing pre-trained language models for Spanish hate speech detection</article-title>.
          <source>Expert Systems with Applications</source>, <volume>166</volume>:<fpage>114120</fpage>, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Manikandan Ravikiran and Subbiah Annamalai.
          <article-title>DOSA: Dravidian code-mixed offensive span identification dataset</article-title>.
          <source>In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</source>, pages <fpage>10</fpage>–<lpage>17</lpage>, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Sarah T Roberts, Joel Tetreault, Vinodkumar Prabhakaran, and Zeerak Waseem.
          <source>Proceedings of the Third Workshop on Abusive Language Online</source>, <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. T Tulasi Sasidhar, B Premjith, and KP Soman.
          <article-title>Emotion detection in Hinglish (Hindi+English) code-mixed social media text</article-title>.
          <source>Procedia Computer Science</source>, <volume>171</volume>:<fpage>1346</fpage>–<lpage>1352</lpage>, <year>2020</year>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. K Sreelakshmi, B Premjith, and KP Soman.
          <article-title>Amrita_CEN_NLP@DravidianLangTech-EACL2021: Deep learning-based offensive language identification in Malayalam, Tamil and Kannada</article-title>.
          <source>In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</source>, pages <fpage>249</fpage>–<lpage>254</lpage>, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. K Sreelakshmi, B Premjith, and KP Soman.
          <article-title>Detection of hate speech text in Hindi-English code-mixed data</article-title>.
          <source>Procedia Computer Science</source>, <volume>171</volume>:<fpage>737</fpage>–<lpage>744</lpage>, <year>2020</year>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Soman K P, Sreelakshmi K, and Premjith B.
          <article-title>Amrita CEN at HASOC 2019: Hate speech detection in Roman and Devanagiri scripted text</article-title>.
          <source>In Proceedings of the 11th Forum for Information Retrieval Evaluation</source>, pages <fpage>14</fpage>–<lpage>17</lpage>, <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Julia Maria Struß, Melanie Siegel, Josef Ruppenhofer, Michael Wiegand, Manfred Klenner, et al.
          <article-title>Overview of GermEval task 2, 2019 shared task on the identification of offensive language</article-title>.
          <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. Sreelakshmi K, Soman K.P., Tulasi Sasidhar T, and Premjith B.
          <article-title>Sentiment analysis on Hindi–English code-mixed social media text</article-title>.
          volume <volume>171</volume>, <year>2017</year>.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin.
          <article-title>Attention is all you need</article-title>.
          <source>arXiv preprint arXiv:1706.03762</source>, <year>2017</year>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar.
          <article-title>SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</article-title>.
          <source>arXiv preprint arXiv:1903.08983</source>, <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21. Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, and Çağrı Çöltekin.
          <article-title>SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)</article-title>.
          <source>arXiv preprint arXiv:2006.07235</source>, <year>2020</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>