<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LIDOMA at HOMO-MEX2023@IberLEF: Hate Speech Detection Towards the Mexican Spanish-Speaking LGBT+ Population. The Importance of Preprocessing Before Using BERT-Based Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Moein Shahiki-Tash</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesús Armenta-Segura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zahra Ahani</string-name>
          <email>zahraahani97100@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Kolesnikova</string-name>
          <email>kolesnikova@cic.ipn.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grigori Sidorov</string-name>
          <email>sidorov@cic.ipn.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Gelbukh</string-name>
          <email>gelbukh@cic.ipn.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC)</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Instituto Politécnico Nacional, Centro de Investigación en Computación (CIC IPN), Juan de Dios Bátiz Av., Gustavo A. Madero</institution>
          ,
          <addr-line>07738 Ciudad de México</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Hate speech targeting LGBT+ individuals is a deeply ingrained problem with wide-ranging consequences, including substance abuse disorders and discrimination. These concerns are particularly amplified in Mexico. In this paper, we present our submission to the first track of HOMO-MEX: Hate Speech Detection towards the Mexican Spanish-Speaking LGBT+ Population. We explore the dataset and employ transformer architectures, which have demonstrated significant efficacy in similar sentiment analysis tasks. Specifically, we use BERT-based models and show the importance of preprocessing: our submission, which omitted it, placed last in the competition with a Macro F1 score of 0.73. The source code to reproduce our results can be found at https://github.com/moeintash72</p>
      </abstract>
      <kwd-group>
        <kwd>BERT-based models</kwd>
        <kwd>Hate Speech Detection</kwd>
        <kwd>LGBT+phobia</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Preprocessing</kwd>
        <kwd>CEUR-WS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        LGBT+phobia, defined as any type of discrimination based on sexual preferences and/or
gender identities, remains a significant and far-reaching issue with profound implications.
Members of the LGBT+ community are particularly vulnerable to substance abuse disorders,
disproportionate mental health challenges, discrimination in labor markets, and limited access
to education and healthcare services. These challenges are further amplified in Mexico, where
substance abuse disorders are highly prevalent even beyond the LGBT+ community [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The HOMO-MEX: Hate Speech Detection towards the Mexican Spanish-Speaking LGBT+
Population [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] includes a track aimed at addressing this issue by detecting whether a Mexican
tweet contains LGBT+phobia or not, and by differentiating the tweets that address the topic
from those that do not. Hate speech detection is a challenging task due to the complex
interplay of linguistic factors and nuanced emotions involved in it [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To address this challenge,
the use of transformers has been instrumental, with promising results in homophobia detection
and other related text processing tasks [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">4, 5, 6, 7, 8</xref>
        ].
      </p>
      <p>
        Transformers [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have revolutionized natural language processing with the introduction
of the self-attention mechanism. This mechanism enables the model to capture global
dependencies and contextual relationships within a sequence, making transformers well
suited to tasks such as sentiment analysis and, more notably, hate speech detection, as
previously mentioned. While this may seem promising on paper, we argue that it is crucial to
preprocess the text to optimize the efficacy of attention mechanisms. By removing lexical noise,
such as stopwords, and reducing linguistic noise through lemmatization, self-attention
mechanisms can concentrate on the pragmatic and metalinguistic features that are
directly associated with the textual representation of sentiments. As a dramatic illustration of the
importance of preprocessing, our submission omitted it and placed last in the competition.
      </p>
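      <p>The stopword-removal step mentioned above can be illustrated with a minimal sketch. It is not part of our submission, which omitted preprocessing. The stopword list, the token pattern, and the function name are assumptions made for illustration only, and lemmatization is left out because it requires an external language resource.</p>
      <preformat><![CDATA[
```python
import re

# Tiny illustrative stopword list (an assumption); a real system would rely on
# a complete Spanish stopword resource.
SPANISH_STOPWORDS = {"de", "la", "el", "que", "y", "a", "en", "los", "las", "un", "una"}

# Word tokens are maximal runs of letters, digits, or underscores.
TOKEN_PATTERN = re.compile(r"\w+")


def remove_stopwords(text, stopwords=SPANISH_STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [token for token in tokens if token not in stopwords]
```
]]></preformat>
      <p>Under these assumptions, the remaining tokens would be joined back into a string before being passed to the tokenizer.</p>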
      <p>In this paper, we describe our system for this shared task, which obtained a Macro F1
score of 0.73. The paper is structured as follows: in Section 2, we describe state-of-the-art
work on hate speech detection and LGBT+phobia detection. In Section 3, we detail our
methodology. In Section 4, we describe the dataset and outline
our experimental workflow. In Section 5, we discuss the results of our experiments. Finally, in
Section 6, we conclude the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Hate speech detection has been a prominent area in the field of Natural Language Processing for quite
some time [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ], with significant attention being drawn to it at least since 1997 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], where a
system called Smokey was developed to detect abusive messages on the internet.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the authors proposed that hate speech could be understood in terms of its lexicon and
how that is used, resembling the task of word sense disambiguation. However, their findings
indicated that this hypothesis might not hold when dealing with incomplete datasets. For
instance, the word jew appeared exclusively in the antisemitic speech of their dataset, causing
their methods to treat jew as a solely anti-Jewish feature, regardless of its intended sense.
In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the authors built an extensive classification of hate speech targets, which allowed
them to focus on sentence structure more than on lexical features. As a consequence, they were
able to detect racist speech independently of racial slurs such as nigga, which is also used among Black
people in a non-racist way. However, they also acknowledged that their approach was not extensive
enough to tackle the task.
      </p>
      <p>Hate speech against the LGBT+ population in particular has also received specific attention, as in
[15], where a structuralist approach was employed to manually characterize linguistic features
of African homophobic speech, or in [16], where those features were addressed from a more
automatic and computational point of view.</p>
      <p>Another important task related to hate speech detection is the identification of supportive
speech, which was addressed in the same edition of IberLEF through a shared task called Hope
Speech [17]. In that shared task, we made two submissions [18, 19] which achieved third and
fourth place, respectively.</p>
      <p>From the computational point of view, several tools and techniques from Natural
Language Processing have been used to address the task, ranging from TF-IDF with bigrams
[20, 21, 22] and the zero-shot learning paradigm [23], which achieved several first places in the
LT-EDI-ACL2022 homophobia/transphobia speech detection contest [24], to the current
state-of-the-art transformer-based approaches [16, 25, 26, 27]. To the best of our understanding, all
these works preprocessed the dataset before training the model.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our main goal was to develop a transformer-based method capable of focusing on the features relevant to
LGBT+phobia. With that in mind, we first converted all labels into
numeric values, as depicted in Table 1.</p>
      <p>After that, we converted the dataset into a Hugging Face Arrow dataset and tokenized all
tweets using the tokenizer of the bert-base-cased model [28], which allowed us to retrieve special tokens and their
corresponding IDs. The next step was the classification task, for which we loaded the
AutoModelForSequenceClassification class from the Hugging Face library in order to use a
BERT model. Finally, we used the best trained model to make predictions on the test dataset.</p>
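      <p>The label conversion step can be sketched as follows. The three label names (P, NP, NA) are those of the dataset, but the specific integers assigned here are an assumption made for illustration; the actual assignment is the one given in Table 1. The function and constant names are likewise hypothetical.</p>
      <preformat><![CDATA[
```python
# Hypothetical label-to-id assignment; the real numeric values are those of Table 1.
LABEL2ID = {"P": 0, "NP": 1, "NA": 2}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


def encode_labels(labels):
    """Map string labels (P, NP, NA) to integer class ids."""
    return [LABEL2ID[label] for label in labels]


def decode_labels(ids):
    """Map integer class ids back to their string labels."""
    return [ID2LABEL[idx] for idx in ids]
```
]]></preformat>
      <p>Keeping both directions of the mapping makes it possible to turn model predictions back into the original label names.</p>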
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <p><bold>4.1. Data</bold></p>
      <p>The task involves analyzing a large dataset of 7,000 Mexican tweets gathered from 2012
to 2021. As shown in Table 1, they are labelled as P for LGBT+phobic tweets, NP for
non-LGBT+phobic tweets that address the issue, and NA for unrelated tweets. See Table 2 for
the precise number of tweets in each class.</p>
      <p>Table 3 shows six examples from the dataset. In some of the tweets labelled as
LGBT+phobic, the author does not show such behavior but describes someone who
does. For instance, in Example 0, the author describes what happened when he tried to hit
on a lesbophobic girl, while in Example 2 the author is a journalist exposing a lesbophobic
tycoon. Moreover, some tweets were incorrectly labelled as non-LGBT+phobic, such as Example 1140, which
is directly homophobic given the Mexican idiosyncrasy.</p>
      <p><bold>4.2. Experimental workflow</bold></p>
      <p>Our experimental setup was as follows:</p>
      <list list-type="bullet">
        <list-item><p>We prepared the data by converting the labels into numerical values, as depicted in Table 1.</p></list-item>
        <list-item><p>Then, we tokenized the tweets with the Hugging Face AutoTokenizer, truncating them to a maximum
length of 32 tokens and padding them to ensure a consistent input size.</p></list-item>
        <list-item><p>For the classification task, as hinted in Section 3, we loaded the pre-trained BERT model for
sequence classification (AutoModelForSequenceClassification) with the specified number
of labels (three, in this case).</p></list-item>
        <list-item><p>Finally, we trained the model with the Trainer class from the Hugging Face library.</p></list-item>
      </list>
      <p>The training arguments we configured included the output directory, logging settings, batch
size, learning rate, and number of epochs. The Trainer class also handled backpropagation and
optimization.</p>
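      <p>The effect of truncating and padding to 32 tokens can be illustrated with a library-free sketch. It is a simplification rather than the tokenizer's actual implementation: the padding id of 0 and the function name are assumptions, and a real BERT tokenizer also keeps the closing special token when it truncates. With the Hugging Face tokenizer, this behaviour is usually requested through the truncation, padding, and max_length arguments.</p>
      <preformat><![CDATA[
```python
MAX_LENGTH = 32  # maximum sequence length used in the workflow
PAD_ID = 0       # assumed padding id; the real value comes from the tokenizer


def pad_or_truncate(token_ids, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = list(token_ids[:max_length])
    n_padding = max_length - len(kept)
    input_ids = kept + [pad_id] * n_padding
    # The mask is 1 for real tokens and 0 for padding positions.
    attention_mask = [1] * len(kept) + [0] * n_padding
    return input_ids, attention_mask
```
]]></preformat>
      <p>The attention mask lets the model ignore the padded positions, so every tweet can share the same input size without the padding influencing the prediction.</p>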
      <p>Once the model had been trained on the training dataset, we used it to make predictions on the test
dataset. The predicted labels were obtained from the predicted probabilities using argmax. We
saved the results in the text file result.txt in tab-separated values (TSV) format.</p>
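      <p>The prediction step can be sketched as follows. The two-column layout (tweet identifier, predicted label), the label mapping, and the function names are assumptions made for illustration; only the use of argmax and the tab-separated file result.txt come from the workflow described above.</p>
      <preformat><![CDATA[
```python
import csv

# Hypothetical id-to-label assignment; the real one is given in Table 1.
ID2LABEL = {0: "P", 1: "NP", 2: "NA"}


def argmax(scores):
    """Return the index of the highest score (the first one in case of ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def write_predictions(rows, path="result.txt"):
    """Write one 'tweet_id<TAB>label' line per (tweet_id, scores) pair."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for tweet_id, scores in rows:
            writer.writerow([tweet_id, ID2LABEL[argmax(scores)]])
```
]]></preformat>
      <p>Since argmax only depends on the ordering of the scores, the same function works on raw logits and on probabilities.</p>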
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>After training for four epochs, we achieved a Macro F1 score of 0.73. Two main
factors were crucial for this result: first, the lack of preprocessing before tokenization with
the Hugging Face AutoTokenizer, and second, the quality of the dataset labeling. As
shown in Section 4, Table 3, several LGBT+phobic instances describe someone with that
behavior who is not the author. Digging into the dataset, we also found non-LGBT+phobic
instances with the same quirk (see two examples in Table 4), which might have been ambiguous for the
trained model.</p>
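      <p>For reference, the Macro F1 score is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally regardless of its size. The sketch below is a plain restatement of that definition, not the organizers' evaluation script; the function name and the integer class ids are assumptions.</p>
      <preformat><![CDATA[
```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no
        # predictions is scored as 0 here.
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)
```
]]></preformat>
      <p>Because every class has the same weight, systematic errors on a small class lower the Macro F1 score as much as errors on a large one.</p>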
      <table-wrap id="tab4">
        <label>Table 4</label>
        <table>
          <thead>
            <tr>
              <th>Tweet in Spanish</th>
              <th>English translation</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Wow, luego que las TERFas se aventajaran en el Reino Unido ahora ya lograron que el sistema médico destransicione a la gente joven trans</td>
              <td>Wow, after TERF feminists gained power in the UK, they have now managed to get the medical system to detransition young trans people.</td>
            </tr>
            <tr>
              <td>Para que luego a una como mujer trans no la dejen entrar a estas cosas o se la hagan de jamón. ¡Gracias, nerdos tóxicos!</td>
              <td>So that afterwards one, as a trans woman, is not allowed into these things or is given a hard time. Thank you, toxic nerds!</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Another opportunity for enhancement can be found in the non-LGBT+phobic labeling. For
instance, in Examples 1140 and 1165, adjectives for LGBT+ people are used as insults,
but the tweets were labelled as non-LGBT+phobic. Example 508 is an LGBT+phobic tweet labelled
as non-related. Of the three labels, we empirically suspect that the most reliable was non-related,
since we struggled to find a wrongly labelled example. For the other two labels, however, a quick keyword search
was enough to find several opportunities for enhancement, which leads us to suspect
that the mentioned quirks are statistically significant enough to bias state-of-the-art BERT
models.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this paper, we presented our approach for addressing the first track of the HOMO-MEX 2023
shared task of hate speech detection towards the Mexican Spanish-speaking LGBT+ population.
We leveraged the power of transformers, specifically BERT models, which have proven effective
in hate speech tasks.</p>
      <p>Our methodology involved converting the labels into numerical values and tokenizing
the tweets using the bert-base-cased model, without any text preprocessing. We employed the
AutoModelForSequenceClassification class from the Hugging Face library to train the BERT
model and make predictions on the test dataset.</p>
      <p>Although the lack of preprocessing contributed heavily to the poor results, we also found several
opportunities for improvement in the dataset, with instances of mislabeling and ambiguity.
These discrepancies might introduce biases and affect the efficacy of state-of-the-art BERT
models.</p>
      <p>The findings highlight the complexities involved in detecting LGBT+phobic speech and call
for continued research and development in addressing this critical issue. Future work should
focus on enhancing the classification and the embeddings, possibly by considering tools that are less used
in these tasks, such as the zero-shot learning paradigm. We also want to explore novel approaches
to improve the detection and mitigation of LGBT+phobia in online spaces.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>The work was done with partial support from the Mexican Government through grant
A1-S-47854 of CONACYT, Mexico, and grants 20232138, 20232080, and 20231567 of the Secretaría de
Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank
CONACYT for the computing resources provided through the Plataforma de Aprendizaje
Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE,
Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD
Award.</p>
      <ref-list>
        <ref id="ref15">
          <mixed-citation>[15] V. Reddy, Perverts and sodomites: homophobia as hate speech in Africa, Southern African Linguistics and Applied Language Studies 20 (2002) 163-175.</mixed-citation>
        </ref>
        <ref id="ref16">
          <mixed-citation>[16] I. S. Upadhyay, K. A. Srivatsa, R. Mamidi, Sammaan@LT-EDI-ACL2022: Ensembled transformers against homophobia and transphobia, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 270-275. URL: https://aclanthology.org/2022.ltedi-1.39. doi:10.18653/v1/2022.ltedi-1.39.</mixed-citation>
        </ref>
        <ref id="ref17">
          <mixed-citation>[17] S. M. Jiménez-Zafra, M. Á. García-Cumbreras, D. García-Baena, J. A. García-Díaz, B. R. Chakravarthi, R. Valencia-García, L. A. Ureña-López, Overview of HOPE2023@IberLEF: Multilingual Hope Speech Detection, Procesamiento del Lenguaje Natural 71 (2023).</mixed-citation>
        </ref>
        <ref id="ref18">
          <mixed-citation>[18] Z. Ahani, G. Sidorov, O. Kolesnikova, A. Gelbukh, Hope speech detection from text using TF-IDF features and machine learning algorithms, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF), Jaén, Spain, 2023.</mixed-citation>
        </ref>
        <ref id="ref19">
          <mixed-citation>[19] M. Shahiki-Tash, J. Armenta-Segura, O. Kolesnikova, G. Sidorov, A. Gelbukh, LIDOMA at HOPE2023@IberLEF: Hope speech detection using lexical features and convolutional neural networks, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF), Jaén, Spain, 2023.</mixed-citation>
        </ref>
        <ref id="ref20">
          <mixed-citation>[20] N. Ashraf, M. Taha, A. Abd Elfattah, H. Nayel, NAYEL@LT-EDI-ACL2022: Homophobia/transphobia detection for equality, diversity, and inclusion using SVM, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 287-290. URL: https://aclanthology.org/2022.ltedi-1.42. doi:10.18653/v1/2022.ltedi-1.42.</mixed-citation>
        </ref>
        <ref id="ref21">
          <mixed-citation>[21] M. Shahiki Tash, Z. Ahani, A. Tonja, M. Gemeda, N. Hussain, O. Kolesnikova, Word level language identification in code-mixed Kannada-English texts using traditional machine learning algorithms, in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts, Association for Computational Linguistics, IIIT Delhi, New Delhi, India, 2022, pp. 25-28. URL: https://aclanthology.org/2022.icon-wlli.5.</mixed-citation>
        </ref>
        <ref id="ref22">
          <mixed-citation>[22] F. Balouchzahi, H. L. Shashirekha, G. Sidorov, HSSD: Hate speech spreader detection using n-grams and voting classifier, in: CEUR Workshop Proceedings, volume 2936, 2021, pp. 1829-1836.</mixed-citation>
        </ref>
        <ref id="ref23">
          <mixed-citation>[23] M. Singh, P. Motlicek, IDIAP submission@LT-EDI-ACL2022: Homophobia/transphobia detection in social media comments, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 356-361. URL: https://aclanthology.org/2022.ltedi-1.55. doi:10.18653/v1/2022.ltedi-1.55.</mixed-citation>
        </ref>
        <ref id="ref24">
          <mixed-citation>[24] B. R. Chakravarthi, R. Priyadharshini, T. Durairaj, J. McCrae, P. Buitelaar, P. Kumaresan, R. Ponnusamy, Overview of the shared task on homophobia and transphobia detection in social media comments, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 369-377. URL: https://aclanthology.org/2022.ltedi-1.57. doi:10.18653/v1/2022.ltedi-1.57.</mixed-citation>
        </ref>
        <ref id="ref25">
          <mixed-citation>[25] D. Nozza, Nozza@LT-EDI-ACL2022: Ensemble modeling for homophobia and transphobia detection, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 258-264. URL: https://aclanthology.org/2022.ltedi-1.37. doi:10.18653/v1/2022.ltedi-1.37.</mixed-citation>
        </ref>
        <ref id="ref26">
          <mixed-citation>[26] V. Bhandari, P. Goyal, bitsa_nlp@LT-EDI-ACL2022: Leveraging pretrained language models for detecting homophobia and transphobia in social media comments, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 149-154. URL: https://aclanthology.org/2022.ltedi-1.18. doi:10.18653/v1/2022.ltedi-1.18.</mixed-citation>
        </ref>
        <ref id="ref27">
          <mixed-citation>[27] A. Maimaitituoheti, ABLIMET@LT-EDI-ACL2022: A RoBERTa based approach for homophobia/transphobia detection in social media, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 155-160. URL: https://aclanthology.org/2022.ltedi-1.19. doi:10.18653/v1/2022.ltedi-1.19.</mixed-citation>
        </ref>
        <ref id="ref28">
          <mixed-citation>[28] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, ArXiv abs/1810.04805 (2019).</mixed-citation>
        </ref>
        <ref id="ref29">
          <mixed-citation>[29] J. Armenta-Segura, G. Sidorov, A baseline for anime success prediction, based on synopsis, in: Congreso Mexicano de Inteligencia Artificial de la Sociedad Mexicana de Inteligencia Artificial COMIA-MICAI 2023, Zapopan, Jalisco, 2023.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Secretaría de Salud</surname>
          </string-name>
          ,
          <article-title>Encuesta Nacional de Consumo de Drogas 2016-2017</article-title>
          , Survey, Mexico,
          <year>2017</year>
          . URL: https://www.gob.mx/cms/uploads/attachment/file/234856/CONSUMO_DE_DROGAS.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sierra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vásquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-T.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ojeda-Trueba</surname>
          </string-name>
          ,
          <article-title>Overview of HOMO-MEX at Iberlef 2023: HOMO-MEX: Hate speech detection in Online Messages directed tOwards the MEXican spanish speaking LGBTQ+ population</article-title>
          ,
          <source>Procesamiento del lenguaje natural 71</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <article-title>Las for hasoc-learning approaches for hate speech and offensive content identification</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ghanghor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>IIITK@LT-EDI-EACL2021: Hope speech detection for equality, diversity, and inclusion in Tamil, Malayalam and English</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion</source>
          , Association for Computational Linguistics, Kyiv,
          <year>2021</year>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>203</lpage>
          . URL: https://aclanthology.org/2021.ltedi-1.30.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z. M.</given-names>
            <surname>Farooqi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Leveraging transformers for hate speech detection in conversational code-mixed tweets</article-title>
          ,
          <year>2021</year>
          . arXiv:2112.09986.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Nwogu</surname>
          </string-name>
          ,
          <article-title>WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive language identification in code-switched YouTube comments</article-title>
          ,
          <year>2020</year>
          . arXiv:2011.00559.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gemeda Yigezu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Lambebo</given-names>
            <surname>Tonja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolesnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Shahiki</given-names>
            <surname>Tash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <article-title>Word level language identification in code-mixed Kannada-English texts using deep learning approach</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts</source>
          , Association for Computational Linguistics, IIIT Delhi, New Delhi, India,
          <year>2022</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>33</lpage>
          . URL: https://aclanthology.org/2022.icon-wlli.6.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Tonja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Yigezu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolesnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Tash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <article-title>Transformer-based model for word level language identification in code-mixed Kannada-English texts</article-title>
          ,
          <year>2022</year>
          . arXiv:2211.14459.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: NIPS</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <article-title>A survey on hate speech detection using natural language processing</article-title>
          ,
          <source>in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</source>
          , Association for Computational Linguistics, Valencia, Spain,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://aclanthology.org/W17-1101. doi:10.18653/v1/W17-1101.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>A survey on automatic detection of hate speech in text</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Spertus</surname>
          </string-name>
          ,
          <article-title>Smokey: Automatic recognition of hostile messages</article-title>
          ,
          <source>in: AAAI/IAAI</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Warner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hirschberg</surname>
          </string-name>
          ,
          <article-title>Detecting hate speech on the world wide web</article-title>
          ,
          <source>in: Proceedings of the Second Workshop on Language in Social Media</source>
          , Association for Computational Linguistics, Montréal, Canada,
          <year>2012</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>26</lpage>
          . URL: https://aclanthology.org/W12-2103.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mondal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Correa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Analyzing the targets of hate in online social media</article-title>
          ,
          <source>CoRR abs/1603.07709</source>
          (
          <year>2016</year>
          ). URL: http://arxiv.org/abs/1603.07709.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>