<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Conversational Hate Speech and Offensive Content Identification in Code-Mixed Languages using Fine-Tuned Multilingual Embedding</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Supriya Chanda</string-name>
          <email>supriyachanda.rs.cse18@itbhu.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sacchit D Sheth</string-name>
          <email>sacchit.dsheth.cd.cse19@itbhu.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sukomal Pal</string-name>
          <email>spal.cse@itbhu.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science and Engineering, Indian Institute of Technology (BHU)</institution>
          ,
          <addr-line>Varanasi, INDIA, 221005</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>9</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>We are seeing an increase in hateful and offensive tweets and comments on social media platforms like Facebook and Twitter, impacting our social lives. Because of this, there is an increasing need to identify online postings that violate accepted norms. For resource-rich languages like English, the challenge of identifying hateful and offensive posts has been well investigated. However, it remains largely unexplored for languages with limited resources like Marathi. Code-mixing frequently occurs in the social media sphere; therefore, identification of conversational hate and offensive posts and comments in code-mixed languages is also challenging and unexplored. For three different objectives of the HASOC 2022 shared task, we proposed approaches for recognizing offensive language on Twitter in Marathi and in two code-mixed settings (i.e., Hinglish and German). Some tasks can be expressed as binary classification (also known as coarse-grained, which entails categorizing hate and offensive tweets as either present or absent). Others can be expressed as multi-class classification (also known as fine-grained, where we must further categorize hate and offensive tweets as Standalone Hate or Contextual Hate). We concatenate the parent-comment-reply data to create a dataset with additional context. We use multilingual Bidirectional Encoder Representations from Transformers (mBERT), which has been pre-trained to acquire contextual representations of tweets. We have carried out several trials using various pre-processing methods and pre-trained models. Finally, the highest-scoring models were used for our submissions in the competition, which ranked our team (irlab@iitbhu) second out of 14, seventh out of 11, sixth out of 10, fourth out of 7, and fifth out of six for ICHCL task 1, ICHCL task 2, Marathi subtask 3A, subtask 3B, and subtask 3C, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate Speech</kwd>
        <kwd>Offensive Language</kwd>
        <kwd>Social Media</kwd>
        <kwd>Marathi</kwd>
        <kwd>Code-Mixed</kwd>
        <kwd>Multilingual BERT</kwd>
        <kwd>GermanBERT</kwd>
        <kwd>Multilingual Embedding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Over the last several years, the number of people using social media platforms and online
forums has skyrocketed. Every day, around 500 million tweets are sent [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Unfortunately, the
boom in social media usage has also resulted in an increase in hate speech and cyberbullying.
Despite social media’s various applications, there is one drawback: those with malicious intent
view it as a chance to spread harsh thoughts to a larger audience. As a result, we must take
large-scale measures to combat this harmful content.
      </p>
      <p>People today use social media to express their thoughts on a variety of issues. It allows people
to express themselves and see the world through new eyes. This, however, empowers people to
communicate whatever they choose, even if it is disrespectful or damaging to others. Social
media’s rapid expansion has transformed communication and content creation. Most young
people use it for news consumption and social connection.</p>
      <p>
        Hate speech is a menace to social culture and peace. Similarly, offensive speech may result
in communicative radicalization. Automatic moderation measures are required to safeguard
people of all ages from being exposed to hate speech. Manual moderation is not always
accurate, and there is a steady flow of material entering social media. Because hate speech has a
detrimental influence on public opinion, several platforms, like YouTube, Facebook, and Twitter,
have rules and procedures to filter hate speech and other harmful behavior. This is an
attempt to mitigate the negative impact that hate speech may have on society [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>As computing capabilities have advanced, machine learning and deep learning algorithms
have grown significantly for applications involving natural language understanding. These tools
have vast potential for detecting and removing malicious content from social media.</p>
      <p>In this work, we are interested in detecting hate speech in tweets. This paper primarily
reports on experiments with HASOC 2022 data. We assess several deep learning approaches,
particularly multilingual models, and we experimented with numerous fine-tuning strategies to see
how they help the model categorize tweets at both coarse and fine granularity.</p>
      <sec id="sec-2-1">
        <title>1.1. HASOC Tasks</title>
        <p>
          The goal of HASOC 2022 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]1 was to establish a testbed for the automated detection of hate
speech and objectionable material in social media posts. Some tasks can be represented as binary
classification: categorising hate and offensive tweets as either present or absent (also known as
coarse-grained). Others can be expressed as multi-class classification: further classification of
hate and offensive tweets into Standalone and Contextual hate (also known as fine-grained).
        </p>
        <p>
          Task 1: ICHCL HINGLISH and GERMAN Codemix Binary Classification [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
• (NOT) Non Hate-Offensive - No form of hate speech, profane, or offensive
content is present in the post.
• (HOF) Hate and Offensive - Contains hate, offensive, and profane words.
        </p>
        <p>Task 2: Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL)
Multiclass Classification.</p>
        <p>
• (SHOF) Standalone Hate - Contains hate, offensive, and profane words in
itself.
• (CHOF) Contextual Hate - A comment or reply supports the hate,
offense, and profanity expressed in its parent tweet. This includes expressing
apparent hatred and endorsing the hatred with positive sentiment.
• (NONE) Non-Hate - No form of hate speech, profane, or offensive content is
present.
        </p>
        <p>1: https://hasocfire.github.io/hasoc/2022/call_for_participation.html</p>
        <p>
          Task 3: Offensive Language Identification in Marathi - a task on hate speech and offensive
language identification offered for Marathi [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ].
        </p>
        <p>A) Offensive Language Detection:
• Offensive (OFF) - Contains any form of non-acceptable language.
• Non Hate-Offensive (NOT) - No offense or profanity is present.</p>
        <p>B) Categorisation of Offensive Language:
• Targeted Insult (TIN) - An insult or threat to an individual, a group, or others.
• Untargeted (UNT) - Contains untargeted profanity.</p>
        <p>C) Offense Target Identification:
• Individual (IND) - Posts targeting an individual.
• Group (GRP) - Posts targeting a group of people.
• Other (OTH) - The target is neither an individual nor a group of people.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>Automated hate and offensive speech detection on social media platforms is a difficult task, for
which many different machine learning and deep learning approaches have been tested. These
include models trained on curated datasets, as well as models trained on corpora of
malicious content.</p>
      <p>
        Several machine learning approaches have been applied to hateful text.
Feature extraction is a common technique: Linear Support Vector Machines trained on TF-IDF
features have been used [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Such features may incorporate a collection of words, n-grams,
lexical characteristics, and linguistic features. Word embedding algorithms have recently been
proposed for similar purposes. Using the bag-of-words approach may result in a large number
of false positives, since objectionable terms in a non-hate tweet may be misclassified as hate
speech [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed a neural-network-based hate speech categorization solution using pre-trained word embeddings and max/mean pooling with simple,
fully connected embedding transformations. However, these techniques fall short of capturing the whole
context of the speech.
      </p>
      <p>
        Deep learning algorithms are becoming increasingly popular in text categorization, sentiment
analysis, language modelling, machine translation, and other fields. Some of these methods
are Convolutional Neural Networks (CNNs) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Recurrent Neural Networks (RNNs) [12]
[13], Long Short-Term Memory networks (LSTMs) [14], Bidirectional LSTMs (BiLSTMs) [15], and, most
recently, transformer-based architectures, namely Bidirectional Encoder Representations from Transformers (BERT)
[16][17] and XLM-RoBERTa [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Table: Sample tweets from the classes, with their language and labels.</p>
      <p>Because of the scarcity of relevant corpora, the vast majority of studies on abusive language
have focused on English data. To address this, there have been studies on other languages such as
Spanish [18], French [19], Italian [20], and German [21], among others.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Dataset</title>
      <p>The dataset is in the form of conversational threads, which can contain hate and offensive content
that is not apparent from a single comment or reply alone but can be
identified given the context of the parent content. Figure 1 shows the structure of the data.
The corpus collection and class distribution are shown in Table 2.</p>
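      <p>One way to picture this parent-comment-reply structure is as a nested record. The field names below are illustrative assumptions, not the organizers' actual schema:</p>

```python
# Hypothetical representation of one conversational thread from the dataset;
# "tweet"/"comment"/"reply"/"label" are illustrative field names.
thread = {
    "tweet": "parent post text",
    "label": "HOF",
    "comments": [
        {
            "comment": "first-level comment text",
            "label": "CHOF",
            "replies": [
                {"reply": "reply text", "label": "NONE"},
            ],
        },
    ],
}

def flatten(node, keys=("tweet", "comment", "reply")):
    """Walk a thread and yield (text, label) pairs for every node."""
    text = next(node[k] for k in keys if k in node)
    yield text, node["label"]
    for child in node.get("comments", []) + node.get("replies", []):
        yield from flatten(child)
```

Flattening a thread this way makes it easy to recover every labelled unit together with its position in the conversation.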
    </sec>
    <sec id="sec-5">
      <title>4. Methodology</title>
      <sec id="sec-5-1">
        <title>4.1. Preprocessing</title>
        <p>Twitter data are very unstructured and include a lot of noise due to the colloquial character
of Twitter conversations, which might compromise the accuracy of processing techniques. As a
result, we preprocessed all tweets to eliminate less predictive text
elements. To produce the final text sequence, we concatenate the tweet with its comments and
replies, if any are present. Our assumption is that this concatenation will help the model
better comprehend the context, particularly in circumstances where the comment or reply is not
hateful by itself but demonstrates support for a hateful parent tweet.</p>
        <p>• We perform cleaning by removing usernames, punctuation, and URLs.
• We use ekphrasis, a text processing tool geared towards text from social networks
such as Twitter or Facebook. ekphrasis performs tokenization, word normalization, word
segmentation (for splitting hashtags), and spell correction, using word statistics from two
big corpora [22].
• We use demoji to accurately remove and replace emojis in text strings.</p>
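      <p>The cleaning bullet above can be sketched with simple regular expressions. This is a minimal illustration of the username/URL/punctuation removal only; ekphrasis and demoji handle the richer normalization and emoji steps:</p>

```python
import re
import string

def clean_tweet(text: str) -> str:
    """Strip @usernames, URLs, and punctuation, then collapse whitespace.
    A simplified stand-in for the actual cleaning pipeline."""
    text = re.sub(r"@\w+", "", text)                   # remove usernames
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URLs
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()           # collapse whitespace
```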
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Implementation</title>
        <p>Text classification, in which we must label the dataset, is one of the problems in this data
challenge. Using the given dataset, the models were developed by fine-tuning pre-trained
language models. We chose German BERT as a pretrained language model due to its recent
success, as it makes working with German text data more efficient for natural
language processing (NLP) tasks, and XLM-RoBERTa, which is capable of processing text from
100 separate languages and is trained on significantly more training data than BERT [23].</p>
        <p>In order to classify text more accurately, we concatenate the data in a way that provides more
context to the transformer. Figure 2 shows the concatenation process used.</p>
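      <p>A minimal sketch of this concatenation, assuming a simple separator marker between thread levels (the paper does not specify which separator was used):</p>

```python
def build_input(parent: str, comment: str = "", reply: str = "") -> str:
    """Join parent tweet, comment, and reply (when present) into a single
    sequence so the transformer sees the conversational context.
    The "[SEP]" marker is an assumed separator, not confirmed by the text."""
    parts = [t.strip() for t in (parent, comment, reply) if t]
    return " [SEP] ".join(parts)
```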
        <p>We have used pre-trained transformer models from HuggingFace 2 in the implementation.
The Framework for Adapting Representation Models, or FARM, is based on transformers and
includes extra capabilities to make developers’ lives easier. Parallelized preprocessing, a highly
modular architecture, multi-task learning, experiment tracking, simple debugging, and close
integration with AWS SageMaker are among its features.</p>
        <p>We use the GermanBERT 3 model, fine-tuned to our requirements. Although
there are numerous models trained on German data accessible from
HuggingFace, we believe that the original model will struggle to grasp smaller chunks
of words when they are broken up, and we have thus employed German BERT. We utilised the
PyTorch package for our implementation environment, which supports GPU processing. Through experiments, we
discovered that training our classifier with a batch size of 16 for 5 to 10 epochs and the AdamW
optimizer with a learning rate of 2e-5 worked well. In our work, we have used
the transformer-based XLM-RoBERTa model.</p>
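      <p>The hyperparameters above, collected in one place. The HuggingFace checkpoint names are our assumption of which "German BERT" and "XLM-RoBERTa" variants are meant:</p>

```python
# Values are the ones reported in this section; the model ids are assumed
# HuggingFace checkpoint names, not confirmed by the text.
TRAINING_CONFIG = {
    "models": ["bert-base-german-cased", "xlm-roberta-base"],
    "batch_size": 16,
    "epochs": (5, 10),        # training for 5 to 10 epochs worked well
    "optimizer": "AdamW",
    "learning_rate": 2e-5,
}
```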
        <p>• Perform preprocessing on the concatenated text according to the steps mentioned in
Section 4.1.
• Tokenize the text with the XLM-RoBERTa pre-trained SentencePiece
tokenizer [24].
• Pad and truncate the text to the maximum sequence length (in tokens).
• Fine-tune the model for hate speech detection with various batch sizes and the AdamW
optimizer.</p>
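      <p>The padding and truncation step above can be sketched as follows. Using pad id 1 here is our assumption, matching XLM-RoBERTa's usual pad token id:</p>

```python
def pad_and_truncate(token_ids, max_len, pad_id=1):
    """Truncate a token-id list to max_len, or right-pad it with pad_id,
    returning the ids plus an attention mask (1 = real token, 0 = padding)."""
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids += [pad_id] * (max_len - len(ids))
    return ids, mask
```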
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Results and Discussion</title>
      <p>Task 1 of HASOC 2022 was an ICHCL binary classification problem. We provided two
submissions, of which the best F1 score we got is 0.627. We finished sixth (previously second
with an F1 score of 0.702) out of fourteen teams, and our submission was second out of all the runs (42)
submitted by the teams. GermanBERT scores well in Task 1 for monolingual text categorization
(for German), with an F1 score of 0.702. Although several multilingual models are
trained on the same dataset, a specific monolingual model can better comprehend the context
of the given language without clutter.</p>
      <p>Task 2 was an ICHCL multiclass classification problem. We provided five submissions, of
which the best F1 score we got was 0.439. We finished seventh out of eleven teams, and our
submission was tenth out of all the runs (26) submitted by the teams. mBERT scores well in
Task 2, which was fine-grained conversational hate speech and offensive content identification.</p>
      <p>Task 3, which was offensive language identification in Marathi, had three subtasks. For
subtask A, we provided five submissions, out of which the best F1 score we got is 0.935. We
finished sixth out of the ten teams, and our submission was ninth out of all the runs (28)
submitted by the teams. We submitted one run each for subtask B and subtask C, resulting in
F1 scores of 0.535 and 0.289, respectively. In subtask B, we finished fourth out of seven teams,
and our submission was eleventh out of all the runs (16) submitted by the teams. In subtask C,
we finished fifth out of six teams, and our submission was fourteenth out of all the runs (15)
submitted by the teams. In Task 3, we can observe that the first submission for subtask A, which
employed fast.ai, had a higher F1 score than the other entries, which used multilingual models.
The model was fine-tuned, resulting in modest variations ranging from 0.739 to 0.907. Our
model underperformed in the other subtasks, which might be owing to overfitting
caused by the unbalanced dataset, albeit we did apply data augmentation.</p>
      <p>For each subtask, we submitted several runs. Descriptions of each run follow.</p>
      <p>1. submission-task1-1: We have used mBERT for German-English code-mixed data. The
maximum sequence length was 256 tokens and the batch size 32. For Hindi-English
code-mixed data, we have used XLM-RoBERTa. The maximum sequence length was 512
tokens and the batch size 16. (Macro F1: 0.6270)
2. submission-task1-2_t: We have used GermanBERT for German-English code-mixed
data. The maximum sequence length was 128 tokens and the batch size 32. For
Hindi-English code-mixed data, we have used XLM-RoBERTa. The maximum sequence
length was 512 tokens and the batch size 16. (Macro F1: 0.6192)
3. submission-task2-1_t: We have used XLM-RoBERTa for this task. The maximum
sequence length was 512 tokens, the batch size 16, and the number of epochs 10. (Macro F1:
0.439)
4. submission-task2-2: We have used mBERT for this task. The maximum sequence
length was 512 tokens and the batch size 16. We froze all the parameters of the
pre-trained model and then used early stopping criteria. To counter class imbalance we
used "class_weight=balanced". (Macro F1: 0.307)
5. submission-task2-3: We have used mBERT for this task. The maximum sequence length
was 512 tokens and the batch size 16. We used a focal loss function here. (Macro F1:
0.387)
6. submission-task2-4: We have used XLM-RoBERTa for this task. The maximum sequence
length was 512 tokens and the batch size 16. (Macro F1: 0.392)
7. submission-task2-5: We have used XLM-RoBERTa for this task. The maximum sequence
length was 512 tokens, the batch size 32, and the number of epochs 2. (Macro F1: 0.6093)
8. submission-task3a-1: We have used XLM-RoBERTa for this task. The maximum sequence
length was 128 tokens, the batch size 16, and the number of epochs 4. (Macro F1: 0.935)
9. submission-task3b-1: We have used XLM-RoBERTa for this task. The maximum sequence
length was 128 tokens, the batch size 16, and the number of epochs 8. (Macro F1: 0.535)
10. submission-task3c-1: We have used XLM-RoBERTa for this task. The maximum sequence
length was 128 tokens, the batch size 16, and the number of epochs 5. (Macro F1: 0.289)</p>
      <p>Table: Evaluation results on test data and rank list for the ICHCL and Marathi tasks (submission number in bracket).</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>In this paper, we have presented the system submitted by the IRLab@IITBHU team to the
HASOC 2022 - Hate Speech and Offensive Content Identification in English and Indo-Aryan
Languages shared task at FIRE 2022. Our system is based on fine-tuning state-of-the-art
transformer models like XLM-RoBERTa and German BERT to categorize tweets in the Marathi,
Hinglish code-mixed, and German languages. Pre-trained bidirectional encoder representations
from transformers outperform traditional machine learning models.</p>
    </sec>
    <sec id="sec-8">
      <title>7. Acknowledgements</title>
      <p>We are thankful to the organizers of HASOC 2022 for providing the opportunity to work on
this interesting and important task.</p>
      <p>
[12] G. K. Pitsilis, H. Ramampiaro, H. Langseth, Effective hate-speech detection in twitter data
using recurrent neural networks, Applied Intelligence 48 (2018) 4730-4742.
[13] A. S. Saksesi, M. Nasrun, C. Setianingsih, Analysis text of hate speech detection using
recurrent neural network, in: 2018 International Conference on Control, Electronics,
Renewable Energy and Communications (ICCEREC), IEEE, 2018, pp. 242–248.
[14] S. S. Syam, B. Irawan, C. Setianingsih, Hate speech detection on twitter using long
short-term memory (lstm) method, in: 2019 4th International Conference on Information
Technology, Information Systems and Electrical Engineering (ICITISEE), IEEE, 2019, pp.
305–310.
[15] A. R. Isnain, A. Sihabuddin, Y. Suyanto, Bidirectional long short term memory method
and word2vec extraction approach for hate speech detection, IJCCS (Indonesian Journal
of Computing and Cybernetics Systems) 14 (2020) 169–178.
[16] S. Chanda, S. Ujjwal, S. Das, S. Pal, Fine-tuning pre-trained transformer based model for
hate speech and offensive content identification in english, indo-aryan and code-mixed
(english-hindi) languages, in: Forum for Information Retrieval Evaluation (Working
Notes)(FIRE), CEUR-WS.org, 2021.
[17] M. Mozafari, R. Farahbakhsh, N. Crespi, A bert-based transfer learning approach for
hate speech detection in online social media, in: International Conference on Complex
Networks and Their Applications, Springer, 2019, pp. 928–940.
[18] F. M. Plaza-del Arco, M. D. Molina-González, L. A. Urena-López, M. T. Martín-Valdivia,
Comparing pre-trained language models for spanish hate speech detection, Expert Systems
with Applications 166 (2021) 114120.
[19] P. Chiril, F. Benamara, V. Moriceau, M. Coulomb-Gully, A. Kumar, Multilingual and
multitarget hate speech detection in tweets, in: Conférence sur le Traitement Automatique
des Langues Naturelles (TALN-PFIA 2019), ATALA, 2019, pp. 351–360.
[20] M. Sanguinetti, F. Poletto, C. Bosco, V. Patti, M. Stranisci, An italian twitter corpus of hate
speech against immigrants, in: Proceedings of the eleventh international conference on
language resources and evaluation (LREC 2018), 2018.
[21] S. Jaki, T. De Smedt, Right-wing german hate speech on twitter: Analysis and automatic
detection, arXiv preprint arXiv:1910.07518 (2019).
[22] C. Baziotis, N. Pelekis, C. Doulkeridis, Datastories at semeval-2017 task 4: Deep lstm
with attention for message-level and topic-based sentiment analysis, in: Proceedings of
the 11th International Workshop on Semantic Evaluation (SemEval-2017), Association for
Computational Linguistics, Vancouver, Canada, 2017, pp. 747–754.
[23] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave,
M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at
scale, arXiv preprint arXiv:1911.02116 (2019).
[24] T. Kudo, J. Richardson, Sentencepiece: A simple and language independent subword
tokenizer and detokenizer for neural text processing, arXiv preprint arXiv:1808.06226
(2018).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sarddar</surname>
          </string-name>
          ,
          <article-title>Analyzing political sentiment using twitter data, in: Information and communication technology for intelligent systems</article-title>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>436</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ali</surname>
          </string-name>
          , U. Farooq,
          <string-name>
            <given-names>U.</given-names>
            <surname>Arshad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shahzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Beg</surname>
          </string-name>
          ,
          <article-title>Hate speech detection on twitter using transfer learning</article-title>
          ,
          <source>Computer Speech &amp; Language</source>
          <volume>74</volume>
          (
          <year>2022</year>
          )
          <fpage>101365</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , K. North,
          <string-name>
            <given-names>D.</given-names>
            <surname>Premasiri</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Subtrack at FIRE 2022: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages</article-title>
          , in: FIRE 2022:
          <article-title>Forum for Information Retrieval Evaluation, Virtual Event</article-title>
          ,
          <fpage>9th</fpage>
          -13th
          <source>December</source>
          <year>2022</year>
          , ACM,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Subtrack at FIRE 2022: Identification of Conversational Hate-Speech in HindiEnglish Code-Mixed and German Language</article-title>
          , in: Working Notes of FIRE 2022 -
          <article-title>Forum for Information Retrieval Evaluation</article-title>
          ,
          <string-name>
            <surname>CEUR</surname>
          </string-name>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chaudhari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaikwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paygude</surname>
          </string-name>
          ,
          <article-title>Predicting the type and target of offensive social media posts in marathi</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <article-title>77</article-title>
          . URL: https://doi.org/10.1007/s13278-022-00906-8. doi:
          <volume>10</volume>
          . 1007/s13278- 022- 00906- 8.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>North</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Premasiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC subtrack at FIRE 2022: Offensive Language Identification in Marathi</article-title>
          , in:
          <source>Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation</source>
          ,
          <publisher-name>CEUR</publisher-name>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Saroj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>IRlab@IITV at SemEval-2020 task 12: Multilingual offensive language identification in social media using SVM</article-title>
          ,
          in:
          <source>Proceedings of the Fourteenth Workshop on Semantic Evaluation</source>
          , International Committee for Computational Linguistics, Barcelona (online)
          ,
          <year>2020</year>
          , pp.
          <fpage>2012</fpage>
          -
          <lpage>2016</lpage>
          . URL: https://aclanthology.org/2020.semeval-1.265.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Burnap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Cyber hate speech on Twitter: An application of machine classification and statistical modeling for policy and decision making</article-title>
          ,
          <source>Policy &amp; Internet</source>
          <volume>7</volume>
          (
          <year>2015</year>
          )
          <fpage>223</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kshirsagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cukuvac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McKeown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McGregor</surname>
          </string-name>
          ,
          <article-title>Predictive embeddings for hate speech detection on Twitter</article-title>
          , arXiv preprint arXiv:1809.10644 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Gambäck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Sikdar</surname>
          </string-name>
          ,
          <article-title>Using convolutional neural networks to classify hate-speech</article-title>
          ,
          in:
          <source>Proceedings of the First Workshop on Abusive Language Online</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tepper</surname>
          </string-name>
          ,
          <article-title>Detecting hate speech on Twitter using a convolution-GRU based deep neural network</article-title>
          ,
          in:
          <source>European Semantic Web Conference</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>745</fpage>
          -
          <lpage>760</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>