<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Detecting Hate and Offensive Content in English and Indo-Aryan Languages Based on Transformers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yongyi Kui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Institute of Yunnan University</institution>
          ,
          <addr-line>Yunnan, China, 650504</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes my submission to Subtask 1A and Subtask 1B of the HASOC (2021) Hate Speech and Offensive Content Identification Challenge. In the experiments, I applied different pre-trained and common neural network models to these tasks and integrated them. According to the official evaluation results, the solution proposed in this paper is ranked fourteenth and fifteenth on English Subtask A and English Subtask B, sixth and fifth on Hindi Subtask A and Hindi Subtask B, respectively, and eleventh on Marathi Subtask A. The source code for the evaluated models is shared openly.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Classification</kwd>
        <kwd>Hate and Offensive Content Analysis</kwd>
        <kwd>pre-trained model</kwd>
        <kwd>Transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, offensive language on social media platforms has surged. Because the Internet
provides a certain degree of anonymity, people are more likely to publish hate speech [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] on online
platforms than in reality.
      </p>
      <p>Hate speech brings challenges to social civilization and harmony. Similarly, insulting and
offensive speech leads to the radicalization of communication. Therefore, it is necessary to
find an appropriate way to automatically recognize such content to improve the public opinion
environment of social media. Human beings are sensitive to hate speech and offensive
content, so people can easily identify such speech. However, a computer can only detect
whether a text is hateful or offensive after learning via unsupervised, self-supervised, or
supervised methods based on large amounts of data.</p>
      <p>
        In the challenges of HASOC 2019 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and HASOC 2020 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], there was a task of identifying hate
speech and offensive content in English and Hindi. In addition, in 2019, SemEval [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] included a task to
identify whether English tweets were offensive or not. Participants used convolutional neural
networks and the BERT model to solve it. SemEval proposed a task called OffensEval 2020
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in 2020, to identify offensive content in multiple languages, including English. In order to
identify offensive content, Risch et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] used a BERT model with distinct random seeds, while
Mishra et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] fine-tuned the BERT-based network model. Many existing text content
recognition systems are based on Transformer [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] models.
      </p>
      <p>The rest of the paper is structured as follows: the second part gives an overview of the
tasks and datasets of this challenge; the third part describes the models used in this challenge;
the fourth part describes the experimental process of Subtask A and Subtask B; the fifth part
lists the official evaluation results of these two tasks; and the last part summarizes the
evaluation results and the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task and Data Description</title>
      <sec id="sec-2-1">
        <title>2.1. Subtasks</title>
        <p>
          HASOC (2021) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] Subtask 1 includes two subtasks, Subtask A and Subtask B. The main purpose
is to detect hate speech and offensive content in text. The two subtasks are defined as
follows:
        </p>
        <p>Subtask A: Tweets are identified as containing hate or offensive content (HOF) or as
non-hate-offensive (NOT). Therefore, it is a binary classification problem.</p>
        <p>Subtask B: Tweets predicted as hate or offensive speech in the English and Hindi
corpora are further divided into three categories: hate speech, offensive, and profane. Subtask
B is therefore a multiclass classification task.</p>
        <p>The evaluation standards of the prediction results of Subtask A and Subtask B are both Macro
F1 and Macro Precision.</p>
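        <p>The macro-averaged metrics above can be computed with scikit-learn; the toy labels below are an illustration, not the official evaluation script:</p>
        <preformat>
```python
# Macro F1 and macro precision on toy binary labels. Macro averaging takes
# the unweighted mean of the per-class scores, as in the HASOC evaluation.
from sklearn.metrics import f1_score, precision_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

macro_f1 = f1_score(y_true, y_pred, average="macro")
macro_p = precision_score(y_true, y_pred, average="macro")
print(macro_f1)  # 0.8 (mean of per-class F1: 0.8 and 0.8)
```
        </preformat>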
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Dataset</title>
        <p>
          The data for this challenge comes from comments on the Twitter platform; the corpus involves
Marathi data [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and English and Hindi data. The Subtask A and Subtask B tasks of the HASOC
(2021) Hate Speech and Offensive Content Identification Challenge [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] provide training datasets
and testing datasets.
        </p>
        <p>In order to obtain more data to train the model better, give it better generalization, and
reduce the risk of overfitting, I collected English and Hindi data from the HASOC (2019) and
HASOC (2020) challenges. After integrating the collected data, the amounts of training data
for the English and Hindi corpora are 12035 and 10215, respectively. Next, we used the shuffle
function in the scikit-learn package to shuffle the order of the data and finally divided the
integrated data into a training dataset and a validation dataset at a ratio of 4:1. Table
1 lists the data volume of the three languages in Subtask A and Subtask B.</p>
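        <p>The shuffle-and-split step can be sketched as follows; the placeholder lists stand in for the merged tweets and labels, and the random seed is an assumed choice:</p>
        <preformat>
```python
# Shuffle the merged corpus with scikit-learn, then hold out 1/5 of it
# as the validation set (a 4:1 split). Data here is a placeholder.
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

texts = [f"tweet {i}" for i in range(100)]   # placeholder tweets
labels = [i % 2 for i in range(100)]         # placeholder binary labels

texts, labels = shuffle(texts, labels, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.2, random_state=42)

print(len(x_train), len(x_val))  # 80 20
```
        </preformat>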
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Data pre-process</title>
        <p>The datasets of Subtask A and Subtask B are sampled from Twitter, and the format of the
data is informal. Tweets can be long or short, the text contains many emojis or URL links,
and some words are even misspelled.</p>
        <p>A pre-processing step is applied to the data to help the model better extract the information
carried by the text and to enhance the accuracy of the classifier. In this challenge, the
methods we used include deleting URL links, emoticons, and punctuation marks from the
text.</p>
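        <p>A rough illustration of these cleaning rules (URL, emoji, and punctuation removal); the author's exact rules are not given, so the regular expressions below are assumptions:</p>
        <preformat>
```python
# Assumed cleaning rules: strip URLs, common emoji ranges, and punctuation,
# then normalize whitespace. Real tweet cleaning may need more emoji ranges.
import re
import string

def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", "", text)                 # delete URL links
    text = re.sub("[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)   # delete common emojis
    text = text.translate(str.maketrans("", "", string.punctuation))  # delete punctuation
    return " ".join(text.split())                                     # normalize whitespace

print(clean_tweet("Check this!! https://t.co/abc \U0001F600 ok"))  # Check this ok
```
        </preformat>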
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. System Description</title>
      <sec id="sec-3-1">
        <title>3.1. Pre-trained Model</title>
        <p>In this hate speech and offensive content detection challenge, I tried six pre-trained
models: BERT, ALBERT, multilingual BERT (mBERT), DeBERTa, XLNet, and SqueezeBERT.
Here is a brief introduction to each pre-trained model.</p>
        <p>
          The BERT [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] model is a deep bidirectional model built on the Transformer encoder
structure. Its training process is divided into a pre-training stage and a fine-tuning stage, and
its pre-training tasks include Masked LM (Language Model) and Next Sentence Prediction.
        </p>
        <p>
          ALBERT [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] uses word embedding parameter factorization and cross-layer parameter
sharing to reduce the number of model parameters, and replaces the Next Sentence Prediction
loss with a Sentence Order Prediction loss. Therefore, compared with the BERT model, it
significantly reduces the number of model parameters, while the performance loss is tiny.
        </p>
        <p>mBERT is a cross-lingual [14] model. It is pre-trained on data in 104 languages,
including English, Hindi, and Marathi. Texts in different languages share some common word
pieces or vocabulary (such as numbers, links, etc.).</p>
        <p>The DeBERTa [15] model uses two methods to enhance BERT. The first is disentangled
attention: each word is represented by two vectors that encode its content and its position,
respectively, and the attention weights between words are computed from separate matrices
over contents and relative positions. The second is to incorporate absolute positions in the
decoding layer to predict the masked tokens.</p>
        <p>Compared with the masked language modeling of the BERT model, XLNet [16] introduces
permutation language modeling as a new pre-training objective. In addition, XLNet incorporates
the Transformer-XL mechanism, so it has an advantage over the BERT model on tasks where
the input is a long text.</p>
        <p>The SqueezeBERT [17] model applies experience from the computer vision field to natural
language processing tasks, replacing several operations in the self-attention layers with
grouped convolutions. This model achieves high accuracy on the GLUE benchmark.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Common neural network</title>
        <p>In this part, I will give a brief overview of several common neural networks used in the paper.</p>
        <p>A Recurrent Neural Network (RNN) [18] is a neural network for processing sequence
data. The RNN model has a memory function: it can retain information about important words
in the text.</p>
        <p>Long Short-Term Memory (LSTM) [19] has the same memory function as the RNN. LSTM uses a
gating mechanism, so it can alleviate the vanishing-gradient problem to a certain extent. In
this paper, the Bidirectional LSTM (BiLSTM) model is used, which can extract the contextual
information of the text in both directions.</p>
        <p>TextCNN [20] is a text classification model using convolutional networks. It passes word
vectors through convolution and pooling operations, and finally sends the output to the softmax
function to perform classification. The structure of the TextCNN model is relatively simple, with
few parameters, and good results can be achieved by introducing pre-trained word vectors.</p>
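        <p>The TextCNN pipeline described above (convolution over word vectors, pooling, then softmax) can be sketched in PyTorch; the layer sizes and kernel widths here are illustrative, not the paper's settings:</p>
        <preformat>
```python
# A compact TextCNN sketch: 1-D convolutions over word vectors with several
# kernel widths, max pooling over time, then a softmax classifier.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, embed_dim=128, num_filters=64, kernel_sizes=(3, 4, 5), num_labels=2):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, x):              # x: (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)          # Conv1d expects (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return torch.softmax(self.fc(torch.cat(pooled, dim=1)), dim=1)

probs = TextCNN()(torch.randn(8, 50, 128))
print(probs.shape)  # torch.Size([8, 2])
```
        </preformat>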
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Integrated</title>
        <p>In the integration process, we did not freeze the weights initialized from the pre-trained
model, but trained the pre-trained model jointly with a smaller-scale model (RNN, LSTM, or
TextCNN).</p>
        <p>Because the pre-trained model is trained on a large amount of data, it produces high-quality
word and sentence embedding vectors. Therefore, we add models such as RNN, BiLSTM, or TextCNN
on top of the output layer of the pre-trained model to further extract high-dimensional features.
The outputs of these relatively small-scale neural networks are then sent to a fully connected
layer for classification. Our subsequent experimental results show that the accuracy of this
method is similar to that of the pre-trained model alone, but the integration strategy increases
the Macro F1 value in the official evaluation score by 0.3% to 1%. In fact, many downstream
tasks are handled in this way.</p>
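        <p>A minimal PyTorch sketch of this integration strategy: the pre-trained encoder's token-level outputs feed a 2-layer BiLSTM whose output goes to a fully connected classifier. The encoder is stubbed with random features here; in the real system it is DeBERTa or mBERT, trained jointly (not frozen):</p>
        <preformat>
```python
# Encoder outputs -> BiLSTM -> fully connected head. The random tensor below
# stands in for the (batch, seq_len, hidden) states of a pre-trained model.
import torch
import torch.nn as nn

class EncoderBiLSTMClassifier(nn.Module):
    def __init__(self, hidden=768, lstm_hidden=256, num_labels=2):
        super().__init__()
        self.bilstm = nn.LSTM(hidden, lstm_hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, encoder_states):        # (batch, seq_len, hidden)
        out, _ = self.bilstm(encoder_states)  # (batch, seq_len, 2 * lstm_hidden)
        return self.fc(out[:, -1, :])         # classify from the last position

states = torch.randn(4, 96, 768)  # stand-in for pre-trained model outputs
logits = EncoderBiLSTMClassifier()(states)
print(logits.shape)  # torch.Size([4, 2])
```
        </preformat>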
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Subtask A Parameters Setting</title>
        <p>In Subtask A, AdamW is selected as the optimizer; the loss function is cross-entropy; the
epoch, max length, and batch size parameters are set to 10, 96, and 32, respectively; the
drop_out, learning rate, and weight_decay parameters are set to 0.4, 1e-5, and 1e-2, respectively.</p>
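        <p>Expressed with PyTorch (an assumed framework; the placeholder linear model only stands in for the real network), the configuration above looks like:</p>
        <preformat>
```python
# The hyperparameters listed above, wired into a PyTorch training setup.
# nn.Sequential here is only a placeholder for the actual integrated model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Dropout(0.4), nn.Linear(768, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss()

EPOCHS, MAX_LENGTH, BATCH_SIZE = 10, 96, 32
logits = model(torch.randn(BATCH_SIZE, 768))
loss = loss_fn(logits, torch.randint(0, 2, (BATCH_SIZE,)))
```
        </preformat>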
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Subtask A</title>
        <p>First, I use the six pre-trained models to conduct a binary text classification experiment
under the parameters set above. These pre-trained models are BERT, ALBERT, mBERT, DeBERTa,
XLNet, and SqueezeBERT. Table 2 lists the specific performance of each pre-trained model on the
respective validation datasets of the three language corpora.</p>
        <p>The experimental results show that among the six pre-trained models, the DeBERTa model
performs best on the English corpus, and the mBERT model achieves the highest accuracy on the
binary classification task for the Hindi and Marathi corpora. In the classification experiments
of Subtask A, all six models use a learning rate of 1e-5 in the training phase. Next, I trained
the DeBERTa+BiLSTM model separately with the common learning rates of 1e-6, 5e-6, 1e-5, 3e-5,
and 5e-5, keeping the remaining parameters unchanged. Table 3 lists the scores of the models
trained with these five learning rates on the validation dataset.</p>
        <p>The results show that the DeBERTa+BiLSTM model obtains the best performance on this
challenge's dataset with a learning rate of 5e-5. So, in the subsequent experiments of Subtask
A and Subtask B, I chose the 5e-5 learning rate.</p>
        <p>Next, I appended 4-layer RNN, 2-layer BiLSTM, and TextCNN models to the pre-trained
model with the highest score on each of the three languages' validation datasets. During
this experiment, the learning rate is set to 5e-5, and the other parameters are unchanged. Finally,
the output of the RNN, BiLSTM, or TextCNN model is sent to the fully connected layer
for classification. The final models are obtained after DeBERTa and mBERT are integrated with
the RNN, BiLSTM, or TextCNN models. Tables 4 and 5 list the scores of the final models on
their respective validation datasets.</p>
        <p>From the experimental results listed in Table 4, it can be seen that the DeBERTa+BiLSTM
model performs best on the validation dataset of the English corpus, and its accuracy and
Macro F1 score are improved compared to the DeBERTa model alone. Therefore, I use the
predictions of the DeBERTa+BiLSTM model as the final submission for the English Subtask A
task. Similarly, Table 5 shows that the mBERT+TextCNN model performs best on the validation
datasets of the Hindi and Marathi corpora. So, I use the predictions of the mBERT+TextCNN
model on the Hindi and Marathi test datasets as the final answers for the Hindi Subtask A and
Marathi Subtask A tasks.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Subtask B</title>
        <p>The experimental results of Subtask A show that the DeBERTa+BiLSTM and mBERT+TextCNN
models perform best on the validation datasets of the English and Hindi corpora, respectively.
Therefore, I still use these two models on the English and Hindi corpora in Subtask B.</p>
        <p>The difference from Subtask A is that the fully connected layer of Subtask B outputs a matrix
of batch_size * 4, while that of Subtask A outputs a matrix of batch_size * 2. Tables 6 and 7
respectively list the scores of the DeBERTa+BiLSTM and mBERT+TextCNN models on the validation
datasets of English Subtask B and Hindi Subtask B.</p>
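        <p>The only architectural change from Subtask A is thus the width of the final fully connected layer; the input feature size of 512 below is an assumption for illustration:</p>
        <preformat>
```python
# Subtask A head outputs batch_size * 2; Subtask B head outputs batch_size * 4.
import torch
import torch.nn as nn

head_a = nn.Linear(512, 2)  # binary head (Subtask A)
head_b = nn.Linear(512, 4)  # fine-grained head (Subtask B)

features = torch.randn(32, 512)  # stand-in batch of extracted features
print(head_a(features).shape, head_b(features).shape)
```
        </preformat>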
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>On Subtask A and Subtask B, among all teams, the solutions I put forward in this paper are
ranked fourteenth and fifteenth on the English corpus, sixth and fifth on the Hindi
corpus, and eleventh on the Marathi corpus. Table 8 lists the best scores on the corpus
of each language in Subtask A and Subtask B of the HASOC (2021) Challenge, as well as the
official evaluation results of the answers I finally submitted.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, I describe the solution I proposed for the HASOC (2021) challenge, including the
pre-processing of the data before training the model, the selection of the learning rate, and
the construction of the final models. This challenge mainly includes text classification tasks for
English, Hindi, and Marathi. I solved the classification problem for the English corpus by
integrating the DeBERTa and BiLSTM models, and the classification problem for the Hindi and
Marathi corpora by integrating the mBERT and TextCNN models.</p>
      <p>The difficulties of Subtask A and Subtask B in this challenge are as follows. First, the
text content is informal, the text length is generally short, and it lacks context, so it is
difficult to obtain very high accuracy. Second, during the experiments, I found that the data
distribution of the categories in Subtask A and Subtask B is not uniform, which is one reason
why the model is biased toward predicting the more common categories. Finally, the amount of
data in this challenge is not sufficient for large models like BERT, which leads to over-fitting
and makes the model perform poorly on the testing dataset. In future research work, I will try
to use a variety of fine-tuning strategies [21], or the idea of transfer learning [22], to
continue to improve my solution.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>A survey on automatic detection of hate speech in text</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 51</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mandlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Overview of the hasoc track at fire 2019: Hate speech and offensive content identification in indo-european languages</article-title>
          ,
          <source>in: Proceedings of the 11th forum for information retrieval evaluation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2020: Hate speech and offensive content identification in indo-european languages</article-title>
          ,
          <source>CoRR abs/2108.05927</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2108.05927.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Semeval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</article-title>
          , arXiv preprint arXiv:1903.08983 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          , G. Karadzhov,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitenis</surname>
          </string-name>
          , Ç. Çöltekin,
          <article-title>Semeval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)</article-title>
          , arXiv preprint arXiv:2006.07235 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Risch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krestel</surname>
          </string-name>
          ,
          <article-title>Bagging bert models for robust aggression identification</article-title>
          ,
          <source>in: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <article-title>3Idiots at HASOC 2019: Fine-tuning transformer neural networks for hate speech identification in indo-european languages</article-title>
          ,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>208</fpage>
          -
          <lpage>213</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Chowdhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdelali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Darwish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Soon-Gyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Salminen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <article-title>Improving arabic text categorization using transformer training diversification</article-title>
          ,
          <source>in: Proceedings of the Fifth Arabic Natural Language Processing Workshop</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages</article-title>
          ,
          <source>in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>
          , CEUR,
          <year>2021</year>
          . URL: http://ceur-ws.org/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaikwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Homan</surname>
          </string-name>
          ,
          <article-title>Cross-lingual offensive language identification for low resource languages: The case of marathi</article-title>
          ,
          <source>in: Proceedings of RANLP</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech</article-title>
          ,
          <source>in: FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, 13th-17th December 2021</source>
          , ACM,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Soricut</surname>
          </string-name>
          ,
          <article-title>Albert: A lite bert for self-supervised learning of language representations</article-title>
          , arXiv preprint arXiv:1909.11942 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Z. Chi, L. Dong, F. Wei, N. Yang, S. Singhal, W. Wang, X. Song, X.-L. Mao, H. Huang, M. Zhou, Infoxlm: An information-theoretic framework for cross-lingual language model pre-training, arXiv preprint arXiv:2007.07834 (2020).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] P. He, X. Liu, J. Gao, W. Chen, Deberta: Decoding-enhanced bert with disentangled attention, arXiv preprint arXiv:2006.03654 (2020).</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, Q. V. Le, Xlnet: Generalized autoregressive pretraining for language understanding, Advances in neural information processing systems 32 (2019).</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] F. N. Iandola, A. E. Shaw, R. Krishna, K. W. Keutzer, Squeezebert: What can computer vision teach nlp about efficient neural networks?, arXiv preprint arXiv:2006.11316 (2020).</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using rnn encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, J. Schmidhuber, Lstm: A search space odyssey, IEEE transactions on neural networks and learning systems 28 (2016) 2222–2232.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] T. Lei, R. Barzilay, T. Jaakkola, Molding cnns for text: non-linear, non-consecutive convolutions, arXiv preprint arXiv:1508.04112 (2015).</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] S. Li, J. Wang, W. Xiang, K. Yang, Z. Li, W. Wang, An autoregulated fine-tuning strategy for titer improvement of secondary metabolites using native promoters in streptomyces, ACS synthetic biology 7 (2018) 522–530.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] N. Barhate, S. Bhave, R. Bhise, R. G. Sutar, D. C. Karia, Reducing overfitting in diabetic retinopathy detection using transfer learning, in: 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA), IEEE, 2020, pp. 298–301.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>