<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Models for Offensive Language Identification in Marathi</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mayuresh Nene</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai North</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tharindu Ranasinghe</string-name>
          <email>T.D.RanasingheHettiarachchige@wlv.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Zampieri</string-name>
          <email>marcos.zampieri@rit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Rochester Institute of Technology</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Wolverhampton</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC) shared task of 2021. The HASOC 2021 organizers provided participants with annotated datasets containing social media posts in English, Hindi, and Marathi. We participated in Marathi Subtask 1A: identifying hateful, offensive, and profane content. In our methodology, we take advantage of available data from high-resource languages by applying cross-lingual transformer-based models and transfer learning to make predictions on Marathi data. Our system achieved a macro F1 score of 0.91 on the test set and ranked 1st out of 25 systems.</p>
      </abstract>
      <kwd-group>
        <kwd>offensive language identification</kwd>
        <kwd>transformers</kwd>
        <kwd>Marathi</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        All across the world, millions of smart devices connect on a daily basis to social media
platforms such as Facebook, Twitter, and Instagram. Terabytes of comments, tweets, and posts are
uploaded, ranging from users’ breakfasts to their opinions on global politics. In recent years,
there has been a rise in offensive and hateful content [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. This content is problematic as it
may harm the user’s mental health [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and even incite self-harm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or violence towards others
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It has also been linked to the discrimination or marginalization of particular demographics
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and to the spread of misinformation [7], causing civil unrest or disobedience [8].
      </p>
      <p>
        Private entities as well as government bodies are interested in the development of machine
learning (ML) models that can automatically identify offensive content on social media [
        <xref ref-type="bibr" rid="ref5">5, 9</xref>
        ].
Studies have introduced a variety of hate speech identification systems. These systems have
utilized traditional models, such as random forests (RFs) [10, 11], support vector machines (SVMs)
[10, 12, 13, 14], and Naive Bayes (NB) [15, 16, 17], as well as more recent deep learning [18, 19, 20, 21]
and transformer-based models [20, 22, 23, 24]. However, none of these systems is perfect.
Issues in dataset quality [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as well as subjectivity in what is deemed “offensive content” are
still considered major challenges in this task [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. In addition, the vast majority of hate speech
identification systems deal only with English, with exceptions for Arabic [25], Greek
[26], Spanish [
        <xref ref-type="bibr" rid="ref7">27</xref>
        ], Hindi [
        <xref ref-type="bibr" rid="ref8">28</xref>
        ], and German [
        <xref ref-type="bibr" rid="ref10 ref9">29, 30</xref>
        ]. Little research has been conducted on
under-resourced languages, such as Marathi [
        <xref ref-type="bibr" rid="ref11">31</xref>
        ], presenting a gap in the current literature.
      </p>
      <p>
        In this paper, we present the WLV-RIT entry to the Hate Speech and
Offensive Content Identification in English and Indo-Aryan Languages (HASOC) shared task
of 2021, described in detail in Section 4. With the aim of furthering research on the under-resourced
Marathi language, we participated in the Marathi track for sub-task 1A (Section 3.1). We adopted a
transfer-learning-based approach. We achieved this by experimenting with transformer-based models, such as
BERT [
        <xref ref-type="bibr" rid="ref12">32</xref>
        ] and XLM-R [
        <xref ref-type="bibr" rid="ref13">33</xref>
        ]. These models applied features salient to offensive
content, learned from Hindi, to the provided Marathi dataset (Section 4.2). Our system ranked 1st out of
25 systems, attaining a macro F1 score of 0.91.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Offensive language identification has been a recurrent theme in recent shared tasks. There have
been many shared tasks organized in recent years, such as OffensEval [
        <xref ref-type="bibr" rid="ref14 ref15">34, 35</xref>
        ], HASOC [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ],
TRAC [
        <xref ref-type="bibr" rid="ref16 ref8">28, 36</xref>
        ], HatEval [
        <xref ref-type="bibr" rid="ref17">37</xref>
        ], GermEval [
        <xref ref-type="bibr" rid="ref10 ref9">29, 30</xref>
        ], and IberEval [
        <xref ref-type="bibr" rid="ref7">27</xref>
        ]. Furthermore, these shared tasks have
addressed different types of offensive content, including hate speech
[
        <xref ref-type="bibr" rid="ref18">38</xref>
        ], aggression [
        <xref ref-type="bibr" rid="ref16 ref8">28, 36</xref>
        ], and cyberbullying [
        <xref ref-type="bibr" rid="ref19">39</xref>
        ]. In the following section, we describe previous
editions of HASOC.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Previous Shared-Tasks in HASOC</title>
        <p>
          Prior to HASOC 2021, two shared tasks were organized by the Forum for Information
Retrieval Evaluation (FIRE): (1) HASOC 2019 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], and (2) HASOC 2020 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. These shared tasks
challenged participating teams to automatically identify offensive content crawled from Twitter
and Facebook. They focused on offensive content in a variety of Indo-European
languages, namely English, Hindi, and German. However, HASOC 2020 and HASOC 2021 have
since expanded this focus to include several languages with less available data, such as Marathi.
HASOC 2019. HASOC 2019 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] extracted approximately 7,005 English, 5,983 Hindi, and 4,669
German posts from Facebook’s and Twitter’s APIs. These posts were acquired by searching
for hashtags and keywords that were considered offensive by the dataset’s authors [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The
shared task was split into three sub-tasks: (1) binary classification of tweets into hateful (HOF) or
non-hateful (NOT), (2) fine-grained classification into hate speech (HATE), offensive (OFFN),
or profane (PRFN), and (3) recipient identification.
        </p>
        <p>
          HASOC 2019 provided evidence in favor of deep learning models for hate speech
identification. Deep learning models such as a long short-term memory (LSTM) model (YNU) [18], a
convolutional neural network model (QutNocturnal) [19], and a BERT model (BRUMS) [23] did
exceptionally well, achieving macro F1 scores for sub-task 1 of 0.7891 for English, 0.8025
for Hindi, and 0.5881 for German, respectively [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. However, questions were raised with regard to
the dataset’s quality and what effect this may have had on the systems’ overall performance. It was
pointed out that the dataset’s reliance on hashtags and keywords provided by the authors had
likely made the dataset prone to the authors’ own biases about what constitutes offensive
or non-offensive content. It was argued that this likely limited the dataset’s scope by not
including offensive or controversial topics that were unfamiliar to the authors. In turn, this
may have hindered the participating systems’ ability to recognize specific forms of hate speech,
or offensive content in general.
        </p>
        <p>
          HASOC 2020. HASOC 2020 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] attempted to address the validity concerns of HASOC 2019.
Instead of using a “hand crafted list of hate speech related terms” [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] to extract offensive posts,
the organizers of HASOC 2020 adopted a randomized sampling technique designed to reduce
the impact of the authors’ bias on dataset quality. An archive containing tweets in English,
Hindi, and German from May 2019 was downloaded from archive.org [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and used to train a weak
binary SVM classifier. The classifier identified 2,600 tweets as hateful. These
tweets were copied into HASOC 2020’s new dataset as examples of hateful tweets. 5% of
the remaining 35,000 tweets identified as non-hateful were randomly selected and also added
to this dataset. All of the selected tweets were then manually annotated by English-, Hindi-, and
German-speaking annotators to produce the dataset’s final labels.
        </p>
        <p>
          HASOC 2020 maintained sub-tasks 1 and 2 from HASOC 2019 and applied them to its
new dataset. Again, deep learning models were found to achieve the best results for sub-task 1.
An LSTM model with GloVe word embeddings (IIIT_DWD) [
          <xref ref-type="bibr" rid="ref20">40</xref>
          ], a BiLSTM model with fastText
word embeddings (NSIT) [
          <xref ref-type="bibr" rid="ref21">41</xref>
          ], and an ensemble of BERT, DistilBERT, and RoBERTa models
(ComMA) [22] attained macro F1 scores of 0.86 for English, 0.5337 for Hindi, and 0.5235
for German, respectively [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Those systems that applied transfer learning also performed well,
with cross-lingual models, such as XLM-R, increasing these systems’ macro-average scores in
some instances [
          <xref ref-type="bibr" rid="ref22">22, 42</xref>
          ]. However, overall performance in this shared task was considered
lower than that achieved in HASOC 2019. This was attributed to the different, albeit
more realistic, sampling technique used.
        </p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>3. HASOC 2021</title>
      <sec id="sec-2-2">
        <title>3.1. Task Description</title>
        <p>
          HASOC 2021 [
          <xref ref-type="bibr" rid="ref23 ref24">43, 44</xref>
          ] presented participating teams with the same sub-tasks as HASOC 2019 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and
HASOC 2020 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Sub-tasks 1A and 1B were available in English and Hindi, whereas only
sub-task 1A was additionally available in Marathi [
          <xref ref-type="bibr" rid="ref24">44</xref>
          ]. Being interested in
Marathi, we took part in sub-task 1A.
        </p>
        <p>• Sub-task 1A: A binary classification task, whereby participating systems were required
to classify tweets into two classes: hateful and offensive (HOF), or non-hateful and
non-offensive (NOT);
• Sub-task 1B: A more fine-grained classification task, whereby the previously identified
HOF posts were further classified into hate speech (HATE), offensive (OFFN), and profane
(PRFN).</p>
        <p>An additional sub-task was also made available. Sub-task 2 focused on the use of code-mixed
tweets, such as those in Hinglish, a mix of Hindi and English displaying lexemes,
morphology, and syntax taken from both languages. This sub-task also took into consideration
the target tweet, referred to as the parent tweet, as well as that tweet’s comments and replies.
• Sub-task 2: A binary classification task, whereby participating systems were required
to classify code-mixed parent tweets into two classes: hateful and offensive (HOF), or
non-hateful and non-offensive (NOT).</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.2. Dataset</title>
        <p>
          The Marathi Offensive Language Dataset (MOLD) [
          <xref ref-type="bibr" rid="ref11">31</xref>
          ] is the first dataset of its kind compiled
for Marathi. It contains 2,499 tweets extracted from Twitter’s API by searching for 22 common
Marathi curse words [
          <xref ref-type="bibr" rid="ref11 ref2">2, 31</xref>
          ]. Non-offensive tweets were obtained by searching for a set of
Marathi phrases related “to politics, entertainment, and sports along with the hashtag #Marathi”
[
          <xref ref-type="bibr" rid="ref11">31</xref>
          ]. Six Marathi-speaking annotators then labeled these tweets with the offensive (OFF) or
non-offensive (NOT) labels. For HASOC 2021’s sub-task 1A, MOLD was split 80%/20% into training
and test sets.
        </p>
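        <p>A stratified 80%/20% split like the one described above can be sketched with scikit-learn; this is a minimal illustration with placeholder tweets and labels standing in for the actual MOLD data, not the organizers’ split script:</p>

```python
from sklearn.model_selection import train_test_split

# Placeholder stand-ins for MOLD's 2,499 tweets and their OFF/NOT labels.
tweets = [f"tweet_{i}" for i in range(2499)]
labels = ["OFF" if i % 4 == 0 else "NOT" for i in range(2499)]

# 80%/20% train/test split, stratified so both splits keep the label ratio.
train_x, test_x, train_y, test_y = train_test_split(
    tweets, labels, test_size=0.2, stratify=labels, random_state=42
)
print(len(train_x), len(test_x))
```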
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Methods</title>
      <p>The methodology applied in this work is divided into two parts. Section 4.1 describes the traditional
machine learning models that we applied to sub-task 1A, and Section 4.2 describes our
transformer-based models.</p>
      <sec id="sec-3-1">
        <title>4.1. Traditional Machine Learning Methods</title>
        <p>
          In the first part of the methodology, we used traditional machine learning models. We
experimented with three models: a Multi-layer Perceptron (MLP) [
          <xref ref-type="bibr" rid="ref25">45</xref>
          ], Support Vector Classification
(SVC) [
          <xref ref-type="bibr" rid="ref26">46</xref>
          ], and RF [
          <xref ref-type="bibr" rid="ref27">47</xref>
          ]. The models take an input vector and output a label, either HOF or NOT.
The models for MLP, SVC and RF were implemented using scikit-learn [
          <xref ref-type="bibr" rid="ref28">48</xref>
          ].
Data Preprocessing. Marathi, like many other Indo-Aryan languages, is written in the
traditional Devanagari script. Our preprocessing included various steps, some of
which entailed the removal of stopwords, punctuation marks, URLs, and tab spaces. Once we had
the most useful data, we performed tokenization using the IndicNLP1 library. We then
used TF-IDF to generate vectors for the tokens. These vectors were then inputted into
the traditional machine learning models.
        </p>
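        <p>The tokenize-then-vectorize pipeline above can be sketched with scikit-learn. This is a toy illustration with short English stand-ins for the preprocessed, IndicNLP-tokenized Marathi tweets; the texts and labels are invented for the example:</p>

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Toy stand-ins for preprocessed, tokenized tweets (space-joined tokens).
train_texts = [
    "good post nice day",
    "bad slur insult you",
    "lovely nice day",
    "awful slur you",
]
train_labels = ["NOT", "HOF", "NOT", "HOF"]

# TF-IDF maps each tweet to a sparse vector over the token vocabulary.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# The three traditional models; each maps a TF-IDF vector to HOF or NOT.
models = {
    "MLP": MLPClassifier(max_iter=500, random_state=0),
    "SVC": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
for model in models.values():
    model.fit(X_train, train_labels)

preds = models["SVC"].predict(vectorizer.transform(["nice lovely post"]))
print(preds[0])
```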
        <p>
          Hyperparameter Optimization. The SVC model was run with a grid-search parameter
list: the kernel was set to ‘rbf’, the gamma value was selected from 1e-3 and 1e-4,
and the C value was selected from [1, 10, 100, 1000]. However, the best estimator did not give
more than a 1% improvement in the target precision score. The Random Forest classifier was
also run with a grid-search parameter list, with ‘n_estimators’ selected from
100, 200, 300, and 500 and the ‘criterion’ selected between ‘gini’ and ‘entropy’. However,
as with the SVC, there was no major improvement in the scores for the target class.
1The IndicNLP framework is available at https://indicnlp.ai4bharat.org/home/
        </p>
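        <p>The SVC grid search above can be sketched as follows; synthetic features stand in for the TF-IDF vectors, so only the grid itself reflects the setup described:</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic features standing in for the TF-IDF vectors.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# The grid described above: rbf kernel, gamma in {1e-3, 1e-4},
# C in {1, 10, 100, 1000}.
param_grid = {
    "kernel": ["rbf"],
    "gamma": [1e-3, 1e-4],
    "C": [1, 10, 100, 1000],
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```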
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Transformers</title>
        <p>
          As the second part of the methodology, we used transformer-based models. Transformer
architectures have been popular in text classification tasks such as offensive language
identification [20, 22, 23]. Their success in these tasks, as seen in HASOC 2020 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], motivated us to use
transformer models for offensive language identification in Marathi.
        </p>
        <p>
          Pre-trained Transformer models. We experimented with several SOTA transformer models
that support Marathi: multilingual BERT (BERT-m) [
          <xref ref-type="bibr" rid="ref12">32</xref>
          ] and XLM-Roberta (XLM-R) [
          <xref ref-type="bibr" rid="ref13">33</xref>
          ].
XLM-R has an additional advantage: its embeddings are cross-lingual. This helps facilitate transfer
learning across languages, as presented later in this section. We followed the same architecture
described in Ranasinghe and Zampieri [
          <xref ref-type="bibr" rid="ref29">49</xref>
          ] where a simple softmax layer is added on top
of the classification ([CLS]) token to predict the probability of a class label. For XLM-R, of the
two available pre-trained models, we specifically used the base model. Since
transformer models are sensitive to the random seed [
          <xref ref-type="bibr" rid="ref30">50</xref>
          ], we conducted each experiment with three
random seeds and took the majority-vote ensemble as the final result [
          <xref ref-type="bibr" rid="ref31">51</xref>
          ].
Transfer Learning. The main appeal of transfer learning is its potential to leverage models
trained on data from outside the domain of interest. This can be particularly helpful for boosting
the performance of learning on low-resource languages such as Marathi. In these experiments,
we used Hindi data released for HASOC 2019 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Hindi is closely related to Marathi and has
more language resources available. Therefore, performing transfer learning from
Hindi to Marathi can improve results for Marathi [
          <xref ref-type="bibr" rid="ref32 ref33">52, 53</xref>
          ].
        </p>
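        <p>The majority-vote ensemble over three random seeds described above can be sketched as a simple per-tweet vote; the seed predictions below are hypothetical:</p>

```python
from collections import Counter

def majority_vote(per_seed_predictions):
    """Combine per-tweet labels from several seed runs by majority vote."""
    return [
        Counter(labels).most_common(1)[0][0]
        for labels in zip(*per_seed_predictions)
    ]

# Hypothetical predictions from three runs with different random seeds.
seed_runs = [
    ["HOF", "NOT", "HOF", "NOT"],
    ["HOF", "NOT", "NOT", "NOT"],
    ["HOF", "HOF", "HOF", "NOT"],
]
print(majority_vote(seed_runs))  # ['HOF', 'NOT', 'HOF', 'NOT']
```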
        <p>We first trained the transformer model separately on the HASOC 2019 dataset. Then we saved
the weights of the transformer model and the softmax layer and used these weights to initialize
the weights of the transformer-based classification model for Marathi.</p>
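        <p>Schematically, the weight reuse described above amounts to saving every parameter of the Hindi-trained model (encoder and softmax layer) and loading those parameters as the starting point for the Marathi model. A framework-agnostic sketch with toy numpy arrays standing in for the transformer weights, not the actual HuggingFace checkpointing code:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def fresh_weights():
    """Toy stand-in for 'transformer body + softmax head' parameters."""
    return {
        "encoder.weight": rng.normal(size=(8, 8)),
        "softmax.weight": rng.normal(size=(2, 8)),
    }

# Step 1: train on the Hindi data (training omitted here), then save
# every weight, including the softmax layer.
hindi_weights = fresh_weights()
np.savez("hindi_weights.npz", **hindi_weights)

# Step 2: initialize the Marathi classifier from the saved Hindi weights
# instead of from fresh random values.
saved = np.load("hindi_weights.npz")
marathi_weights = {name: saved[name].copy() for name in saved.files}

identical = all(
    np.array_equal(hindi_weights[k], marathi_weights[k]) for k in hindi_weights
)
print(identical)  # True
```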
        <p>
          Implementation. We used an Nvidia Tesla K80 GPU to train the models. We mainly fine-tuned
the learning rate and the number of epochs of the classification model manually to obtain
the best results on the validation set. We obtained 1e-5 as the best learning rate and 3
as the best number of epochs for all the languages. Training for Hindi took
around 40 minutes, while training for Marathi took around 20 minutes. Our implementation is
based on HuggingFace [
          <xref ref-type="bibr" rid="ref34">54</xref>
          ]2.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results and Evaluation</title>
      <p>In this section, we report the experiments we conducted and their results. As instructed by the
task organizers, we used the macro F1 score to measure model performance. We also report
precision, recall, and F1 score for each class label, as well as the macro F1 score, in the results
tables. The reported results are for the test set.
2The implementation is available at https://github.com/tharindudr/DeepOffense</p>
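      <p>The metrics above can be computed with scikit-learn; the gold labels and predictions below are hypothetical and only illustrate how macro F1 and the per-class scores are derived:</p>

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical gold labels and predictions for a handful of test tweets.
y_true = ["HOF", "NOT", "NOT", "HOF", "NOT", "HOF"]
y_pred = ["HOF", "NOT", "HOF", "HOF", "NOT", "NOT"]

# Macro F1 averages the per-class F1 scores, weighting HOF and NOT equally
# regardless of class frequency.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(round(macro_f1, 4))

# Per-class precision, recall, and F1, as reported in the results tables.
print(classification_report(y_true, y_pred, digits=4))
```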
      <p>[Table 1: results for XLM-R (TL), BERT-m (TL), XLM-R, BERT-m, Random Forest, MLP, and SVC;
the per-class and macro scores did not survive extraction.]</p>
      <p>As can be seen in Table 1, the transformer models outperformed the traditional machine learning
algorithms. Between the two transformer models, XLM-R performed better than BERT-m.
Furthermore, it is clear that transfer learning boosted the results of the transformer models.
Our best model applied transfer learning from Hindi with XLM-R.
According to the results provided by the organisers, our best model achieved a macro F1 score of 0.91
on the test set and ranked 1st out of 25 participants.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Error Analysis</title>
      <p>To deepen our understanding of the limitations of our model, we performed an error analysis
on the model’s results. We compared the predicted labels to the actual labels across multiple models.
There were instances where the models falsely predicted the ‘HOF’ label and instances where
they falsely predicted the ‘NOT’ label. For some offensive words, the tweets containing them
were almost consistently predicted as non-offensive when they were, in fact, offensive.
There are cases where one word has two different meanings: used alongside two different
sets of words, its meaning changes significantly. Two such words are ‘गोट्या’ (balls) and ‘तोंडात’
(inside the mouth). While the phrase ‘गोट्या तोंडात घेया’ means ‘eat my balls’, the
phrase ‘गोट्या खेळणे’ is essentially slang for ‘not doing a single thing’, although with the right context
it can also mean ‘playing with marbles’. The models have trouble differentiating such
ambiguity, where one word can appear in three very different kinds of statements. False
positives for these cases were primarily seen with models such as the
Multi-Layer Perceptron classifier and the XLM-R model. There are also other instances in which
the model misses the actual label (HOF) and wrongly predicts the text as ‘NOT’. The
offensive words in the majority of these tweets are relatively mildly offensive compared to
the others. The Random Forest classifier and the SVC generally falsely predicted these tweets
as ‘NOT’.</p>
      <p>Some other examples of relatively mild offensive words on which the models show hit-or-miss
performance are ‘मंद’, which means ‘dumb or stupid’, ‘गद्दार’, which means ‘traitor’, and
‘पाकिस्तानी’, which means ‘Pakistani’. Words such as ‘Pakistani’ would not be termed
offensive without context, which in this case is the tension between the two countries. This
leads to many offensive tweets containing this word, basically saying ‘Pakistani man’, intended
as an insult at someone who is supposedly talking about harming national security or has
made anti-national statements. The models tend to generate false negatives in such cases. It
is clear from our analysis that the models suffer in cases of ambiguity in meaning and of
contextual inference. Hence, such cases need to be handled properly.</p>
    </sec>
    <sec id="sec-6">
      <title>7. Conclusion</title>
      <p>In this paper, we have presented the system submitted by the WLV-RIT team to HASOC 2021:
Hate Speech and Offensive Content Identification in Marathi at FIRE 2021. We have shown that
XLM-R with transfer learning from a closely related language is the most successful of the
several transformer models we experimented with. We also experimented with a number
of traditional machine learning algorithms. However, the results show that transformer models
comfortably outperformed these traditional machine learning models. Our best system, based
on XLM-R and transfer learning from Hindi, ranked 1st out of 25 participants in Marathi.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank the HASOC organizers for running this interesting shared task and for
replying promptly to all our inquiries.</p>
      <p>[7] N. DePaula, K. J. Fietkiewicz, T. J. Froehlich, A. Million, I. Dorsch, A. Ilhan, Challenges for social media: Misinformation, free speech, civic engagement, and data regulations, in: Proceedings of ASIS&amp;T, 2018.
[8] G. De Gregorio, N. Stremlau, Information interventions and social media, Internet Policy Review 10 (2021).
[9] A. Akins, Facebook’s oversight board overrules 4 hate speech, misinformation takedowns, SNL Kagan Media and Communications Report (2021).
[10] J. Salminen, H. Almerekhi, M. Milenkovic, S.-g. Jung, A. Jisun, H. Kwak, B. J. Jansen, Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media, in: Proceedings of ICWSM, 2018.
[11] K. Nugroho, E. Noersasongko, Purwanto, Muljono, A. Z. Fanani, Afandy, R. S. Basuki, Improving random forest method to detect hatespeech and offensive word, in: Proceedings of ICOIACT, 2019.
[12] J.-M. Xu, K.-S. Jun, X. Zhu, A. Bellmore, Learning from bullying traces in social media, in: Proceedings of NAACL, 2012.
[13] M. Dadvar, D. Trieschnigg, R. Ordelman, F. de Jong, Improving cyberbullying detection with user context, in: Proceedings of ECIR, 2013.
[14] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, Y. Chang, Abusive language detection in online user content, in: Proceedings of WWW, 2016.
[15] K. Dinakar, R. Reichart, H. Lieberman, Modeling the detection of textual cyberbullying, in: Proceedings of ICWSM, 2011.
[16] Y. Chen, Y. Zhou, S. Zhu, H. Xu, Detecting offensive language in social media to protect adolescent online safety, in: Proceedings of ASE, 2012.
[17] I. Kwok, Y. Wang, Locate the hate: Detecting tweets against blacks, in: Proceedings of AAAI, 2013.
[18] B. Wang, Y. Ding, S. Liu, X. Zhou, YNU_Wb at HASOC 2019: Ordered Neurons LSTM with Attention for Identifying Hate Speech and Offensive Language, in: Proceedings of FIRE, 2019.
[19] M. A. Bashar, R. Nayak, QutNocturnal at HASOC 2019: CNN for Hate Speech and Offensive Content Identification in Hindi Language, in: Proceedings of FIRE, 2019.
[20] P. Saha, B. Mathew, P. Goyal, A. Mukherjee, Hatemonitors: Language agnostic abuse detection in social media, in: Proceedings of FIRE, 2019.
[21] H. Hettiarachchi, T. Ranasinghe, Emoji powered capsule network to detect type and target of offensive posts in social media, in: Proceedings of RANLP, 2019.
[22] R. Kumar, B. Lahiri, A. K. Ojha, A. Bansal, ComMA at HASOC 2020: Exploring Multilingual Joint Training across different Classification Tasks, in: Proceedings of FIRE, 2020.
[23] T. Ranasinghe, M. Zampieri, H. Hettiarachchi, BRUMS at HASOC 2019: Deep Learning Models for Multilingual Hate Speech and Offensive Language Identification, in: Proceedings of FIRE, 2019.
[24] D. Sarkar, M. Zampieri, T. Ranasinghe, A. Ororbia, fBERT: A Neural Transformer for Identifying Offensive Content, in: Proceedings of EMNLP Findings, 2021.
[25] H. Mubarak, D. Kareem, M. Walid, Abusive language detection on Arabic social media, in: Proceedings of ALW, 2017.
[26] Z. Pitenis, M. Zampieri, T. Ranasinghe, Offensive Language Identification in Greek, in:</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mandlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages</article-title>
          ,
          <source>in: Proceedings of FIRE</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar M</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German</article-title>
          ,
          <source>in: Proceedings of FIRE</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bannink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Broeren</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. M. van de Looij-Jansen</surname>
          </string-name>
          , F. G. de Waart, H. Raat,
          <article-title>Cyber and Traditional Bullying Victimization as a Risk Factor for Mental Health Problems and Suicidal Ideation in Adolescents</article-title>
          ,
          <source>PloS one 9</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Glendenning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Montgomery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lloyd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hawton</surname>
          </string-name>
          ,
          <article-title>Self-harm, suicidal behaviours, and cyberbullying in children and young people: Systematic review</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          <volume>20</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Burnap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Javed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozalp</surname>
          </string-name>
          ,
          <article-title>Hate in the Machine: Anti-Black and Anti-Muslim Social Media Posts as Predictors of Offline Racially and Religiously Aggravated Crime</article-title>
          ,
          <source>The British Journal of Criminology</source>
          <volume>60</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Castaño-Pulgarín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Suárez-Betancur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M. T.</given-names>
            <surname>Vega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M. H.</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <article-title>Internet, social media and online hate speech. A systematic review</article-title>
          ,
          <source>Aggression and Violent Behavior</source>
          <volume>58</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Guzmán-Falcón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Reyes-Meza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rico-Sulayes</surname>
          </string-name>
          ,
          <article-title>Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets</article-title>
          ,
          <source>in: Proceedings of IberEval</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Ojha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Benchmarking Aggression Identification in Social Media,
          <source>in: Proceedings of TRAC</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruppenhofer</surname>
          </string-name>
          ,
          <article-title>Overview of the GermEval 2018 shared task on the identification of offensive language</article-title>
          ,
          <source>in: Proceedings of GermEval</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruppenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Klenner</surname>
          </string-name>
          ,
          <article-title>Overview of GermEval Task 2, 2019 shared task on the identification of offensive language</article-title>
          ,
          <source>in: Proceedings of GermEval</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaikwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Homan</surname>
          </string-name>
          ,
          <article-title>Cross-lingual offensive language identification for low-resource languages: The case of Marathi</article-title>
          ,
          <source>in: Proceedings of RANLP</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <source>in: Proceedings of NAACL</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <source>in: Proceedings of ACL</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)</article-title>
          ,
          <source>in: Proceedings of SemEval</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karadzhov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitenis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Çöltekin</surname>
          </string-name>
          ,
          <article-title>SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)</article-title>
          ,
          <source>in: Proceedings of SemEval</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Ojha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Evaluating aggression identification in social media</article-title>
          ,
          <source>in: Proceedings of TRAC</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M. R.</given-names>
            <surname>Pardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          ,
          <article-title>SemEval-2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter</article-title>
          ,
          <source>in: Proceedings of SemEval</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Challenges in Discriminating Profanity from Hate Speech</article-title>
          ,
          <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source>
          <volume>30</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Coheur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paulino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Simão</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Trancoso</surname>
          </string-name>
          ,
          <article-title>Automatic cyberbullying detection: A systematic review</article-title>
          ,
          <source>Computers in Human Behavior</source>
          <volume>93</volume>
          (
          <year>2019</year>
          )
          <fpage>333</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saumya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>IIITDWD at HASOC 2020: Identifying offensive content in multitask Indo-European languages</article-title>
          ,
          <source>in: Proceedings of FIRE</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>R.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saumya</surname>
          </string-name>
          ,
          <article-title>NSIT and IIITDWD at HASOC 2020: Deep learning model for hate-speech identification in Indo-European languages</article-title>
          ,
          <source>in: Proceedings of FIRE</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>YNU_OXZ at HASOC 2020: Multilingual Hate Speech and Offensive Content Identification based on XLM-RoBERTa</article-title>
          ,
          <source>in: Proceedings of FIRE</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech</article-title>
          ,
          <source>in: FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, 13th-17th December 2021</source>
          , ACM,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages</article-title>
          ,
          <source>in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>
          , CEUR,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>F.</given-names>
            <surname>Murtagh</surname>
          </string-name>
          ,
          <article-title>Multilayer perceptrons for classification and regression</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>2</volume>
          (
          <year>1991</year>
          )
          <fpage>183</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <article-title>Support-vector networks</article-title>
          ,
          <source>Machine Learning</source>
          <volume>20</volume>
          (
          <year>1995</year>
          )
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>A.</given-names>
            <surname>Liaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiener</surname>
          </string-name>
          , et al.,
          <article-title>Classification and regression by randomForest</article-title>
          ,
          <source>R News</source>
          <volume>2</volume>
          (
          <year>2002</year>
          )
          <fpage>18</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          , et al.,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Multilingual Offensive Language Identification with Cross-lingual Embeddings</article-title>
          ,
          <source>in: Proceedings of EMNLP</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Katiyar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Artzi</surname>
          </string-name>
          ,
          <article-title>Revisiting few-sample BERT fine-tuning</article-title>
          ,
          <source>in: Proceedings of ICLR</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hettiarachchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <article-title>InfoMiner at WNUT-2020 Task 2: Transformer-based COVID-19 informative tweet extraction</article-title>
          ,
          <source>in: Proceedings of W-NUT</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Multilingual Offensive Language Identification for Low-resource Languages</article-title>
          ,
          <source>ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>An evaluation of multilingual offensive language identification methods for the languages of India</article-title>
          ,
          <source>Information</source>
          <volume>12</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>von Platen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Le</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Transformers: State-of-the-art natural language processing</article-title>
          ,
          <source>in: Proceedings of EMNLP</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>