                                Breaking Barriers: Multilingual Toxicity Analysis for
                                Hate Speech and Offensive Language in
                                Low-Resource Indo-Aryan Languages
                                Ch Muhammad Awais1 , Jayveersinh Raj2
                                1
                                    University of Pisa, Pisa, Italy
                                2
                                    Innopolis University, Innopolis, Russia


                                                                         Abstract
                                                                         In our interconnected digital landscape, tackling the proliferation of hate speech and offensive content
                                                                         across languages is a pressing challenge. This paper presents a pioneering approach, fine-tuning the
                                                                         XLM-RoBERTa model for hate speech detection in low-resource languages, transcending linguistic
                                                                         boundaries. Our work is motivated by our participation in the Hate Speech and Offensive Content
                                                                         Identification (HASOC) competition, where our team, AI Alchemists, participated in Tasks 1
                                                                         and 4 with the provided datasets and achieved 3rd place in Sinhala. Our average rank across
                                                                         tasks was 4th, and we placed no lower than 6th in any task.
                                                                             Our experiments show that the model can be used to detect hate speech in many different languages,
                                                                         including Sinhala and Bodo, with high accuracy (F1 scores of 0.8345 and 0.844, respectively). It also
                                                                         achieved promising results on Gujarati, Bengali, and Assamese (F1 scores of 0.793, 0.726, and 0.707,
                                                                         respectively).
                                                                             We emphasize that the quality and size of the training dataset are important factors in the performance
                                                                         of the model. With further research and access to more balanced datasets, we believe that our model
                                                                         has the potential to outperform state-of-the-art models and curb hate speech across diverse linguistic
                                                                         landscapes, fostering more inclusive online spaces.

                                                                         Keywords
                                                                         Multilingual Toxicity Analysis, Hate Speech Detection, Low-Resource Languages, Natural Language
                                                                         Processing, XLM-RoBERTa Model, Transfer Learning, Zero-Shot Transfer, Offensive Content, Cross-
                                                                         Lingual Adaptability, Linguistic Barriers




                                1. Introduction
                                In today’s digital age, the widespread proliferation of offensive content has become an urgent
                                societal concern. Hate speech, offensive language, and profanity have found fertile ground
                                on various online platforms, perpetuating hostility and division among users [1]. Effectively
                                addressing this issue requires advanced tools and methodologies capable of identifying and miti-
                                gating offensive content with precision. This is difficult partly because it is challenging to
                                categorize tweets or comments without obvious hateful keywords [2], and partly because there
                                is no consensus definition of hate speech, examination of its demographic influences, or
                                investigation of its most potent characteristics [3].

                                Forum for Information Retrieval Evaluation, December 9-13, 2023, India
                                c.awais@studenti.unipi.it (C. M. Awais); j.raj@innopolis.university (J. Raj)
                                https://cm-awais.github.io/ (C. M. Awais); https://jayveersinh-raj.github.io/ (J. Raj)
                                © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Natural Language Processing (NLP) has emerged as a powerful approach in
this endeavor, harnessing the fusion of machine learning and language analysis to discern and
counteract toxic language [4].

   NLP techniques have proven instrumental in analyzing vast volumes of textual data, enabling
the detection of offensive content across diverse linguistic landscapes. By leveraging computa-
tional linguistics and statistical models, NLP algorithms can automatically classify texts into
offensive and non-offensive categories [5]. The ability to aggregate feedback without human
intervention makes these methods especially valuable [6]. These algorithms excel at recognizing patterns,
contextual cues, and linguistic features associated with offensive language, thereby aiding in
the identification and moderation of problematic content [7, 8].
   However, while NLP has demonstrated remarkable promise in addressing offensive content,
its application to Low-Resource Indo-Aryan Languages presents a unique set of challenges. Low-
resource languages, such as Sinhala and Gujarati, often lack the extensive datasets and linguistic
resources available for major languages like English. Consequently, developing effective NLP
models for these languages becomes a complex endeavor, hindered by limited training data
and insufficient linguistic representation [9]. Low-resource languages often lack both the
data-processing infrastructure and the ground-truth data that machine learning systems require [10].
   The urgency of mitigating offensive content in Low-Resource Indo-Aryan Languages cannot
be overstated. With millions of users communicating in these languages on various platforms,
there is a pressing need to safeguard them from the harmful impact of offensive language. To
address this need, we propose a solution based on Multilingual Toxicity Analysis, a cutting-edge
approach that transcends language barriers and facilitates the detection of offensive content
across multiple languages [11].
   In this research, we embark on an experimental journey, aiming to harness the power of
transfer learning and zero-shot transfer techniques. Our goal is to create a model capable
of identifying hate speech and offensive content in Hindi, Sinhala, Gujarati, Bengali, Bodo,
and Assamese, despite being primarily trained on English data, through few-shot learning
with relatively small datasets. By strategically transferring knowledge from high-resource
languages like English to low-resource languages, we aim to enhance the capabilities of NLP
models in identifying toxic language across linguistic boundaries.
   The subsequent sections of this paper will delve into the methodology employed, including
data collection and preprocessing techniques, the selection and fine-tuning of NLP models, as
well as the evaluation metrics used to assess model performance. We will discuss the insights
drawn from our experiments, shedding light on the model’s adaptability in diverse linguistic
contexts. Additionally, we will explore the implications and limitations of our approach, paving
the way for further advancements in the realm of Multilingual Toxicity Analysis for low-
resource languages. Ultimately, our research strives to contribute to a safer and more inclusive
digital ecosystem by demonstrating that, with strategic fine-tuning, NLP models can transcend
language barriers and effectively detect offensive content. By doing so, we aspire to create online
spaces where offensive content is promptly identified and addressed, fostering an atmosphere
of respect and tolerance among users.
2. Literature Review
The literature surrounding hate speech and offensive content detection has witnessed substantial
growth due to the increasing concern over the proliferation of harmful online behavior [1, 3].
This literature review aims to explore existing research in the field and shed light on innovative
approaches for addressing this pressing issue. By conducting a comprehensive analysis of prior
studies, we seek to identify new perspectives, reveal gaps in the literature, resolve conflicting
findings, and prevent duplication of effort.
   Several foundational studies have contributed significantly to the development of Natural
Language Processing (NLP) models as central characters in the fight against offensive content
[4, 12]. DistilBERT [13] and XLM-RoBERTa [14] are two prominent BERT-family [15] models that
have shown promise in detecting hate speech and offensive language across diverse languages
and contexts. Researchers have devoted considerable effort to honing and refining these
characters, exploring their strengths and limitations in identifying toxic language patterns.
   The setting of this literature review encompasses the digital landscape, where social media
platforms and online communities serve as primary channels for communication. In this
virtual environment, offensive content often thrives, necessitating effective content moderation
strategies [16, 4]. Understanding this setting is crucial for contextualizing the challenges and
opportunities for NLP models in detecting and mitigating hate speech and offensive language
effectively.
   The plot of this literature review revolves around the development and evaluation of NLP
models tailored for hate speech and offensive content detection. Researchers have embarked
on a journey to explore various embedding-classifier pairs, employing techniques such as
DistilBERT with Decision Tree, Gaussian Naive Bayes, Neural Network, and XLM-RoBERTa
embedder with its corresponding classifier [14, 17]. The plot thickens as researchers delve
into the intricacies of these models, seeking to improve performance and generalization across
diverse languages and contexts.
   The overarching theme that unifies the literature is the pursuit of fostering a safer online
environment and promoting linguistic inclusivity. By advancing multilingual toxicity analy-
sis, researchers contribute to creating digital ecosystems where users from diverse linguistic
backgrounds can engage in respectful and tolerant discourse [18, 2]. The theme emphasizes the
significance of breaking barriers, both linguistic and cultural, using advanced NLP techniques
and cross-lingual representation learning to build models that transcend language-specific
limitations.
   This literature review is framed within the context of a broader research landscape that
spans computational linguistics, NLP, and machine learning. It situates itself at the forefront of
addressing the challenges posed by offensive content in Low-Resource Indo-Aryan Languages
like Sinhala and Gujarati [9, 19, 20]. By contextualizing our review within this frame, we
underscore the significance of our research in bridging gaps and offering insights to the broader
academic community.
   The exposition in this literature review provides a comprehensive overview of the current
state of research in hate speech and offensive content detection. We delve into existing studies,
methodologies, and NLP models, outlining the strengths and limitations of various approaches.
Through this exposition, we aim to provide a solid foundation for understanding the challenges
and potential solutions in multilingual toxicity analysis.
  In conclusion, this literature review serves as a critical link between the introduction and
methodology sections of our research. By exploring the foundational elements of NLP models
as characters, the digital setting, the plot of NLP techniques for offensive content detection,
the overarching theme of fostering a safer online environment, and the framing within the
broader research landscape, we gain a deeper understanding of the landscape of multilingual
toxicity analysis. Our analysis aims to identify new avenues for research, resolve conflicts in
existing studies, and contribute to the collective effort of creating a safer and more inclusive
digital ecosystem.


3. Methodology
3.1. Data processing
3.1.1. English
The dataset utilized was from the Kaggle jigsaw-toxic-comment-classification competition. It
had the following toxicity types:

   1. toxic
   2. severe_toxic
   3. obscene
   4. threat
   5. insult
   6. identity_hate

  Since all of these categories are regarded as harmful, we treated a comment as toxic if it
carried at least one of the above labels, collapsing them into a single binary column. After
dropping null values, the dataset had approximately 150k labelled samples. The data was further
cleaned by removing unnecessary symbols, tags, etc. 80% of the samples were used for training,
while 20% were held out for validation and evaluation.
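
A minimal sketch of this preprocessing, assuming the standard column names of the Kaggle
competition CSV (’comment_text’ plus the six label columns listed above):

    import re
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("train.csv").dropna()  # ~150k labelled comments

    # Any of the six fine-grained labels marks a comment as toxic (1), else 0.
    toxicity = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
    df["label"] = (df[toxicity].sum(axis=1) > 0).astype(int)

    # Light cleaning: drop tags and stray symbols, collapse whitespace.
    def clean(text: str) -> str:
        text = re.sub(r"<[^>]+>", " ", text)
        text = re.sub(r"[^\w\s]", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    df["comment_text"] = df["comment_text"].apply(clean)

    # 80% for training, 20% held out for validation and evaluation.
    train_df, eval_df = train_test_split(df, test_size=0.2, random_state=42)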

3.1.2. Hindi
The Hindi dataset from [21] was used for Hindi fine-tuning. It contained 1603 training
samples in text format with labels. The following unique labels were found after separating
them from the text:

   1. __label__abusive
   2. __label__abusive__label__hatespeech
   3. __label__abusive__label__hatespeech__label__humor
   4. __label__abusive__label__humor
   5. __label__benign
   6. __label__benign__label__abusive
   7. __label__benign__label__humor
   8. __label__hatespeech
   9. __label__hatespeech__label__abusive
  10. __label__hatespeech__label__abusive__label__abusive
  11. __label__hatespeech__label__benign
  12. __label__humor__label__abusive
  13. __label__humor__label__benign
  14. __label__humor__label__hatespeech__label__abusive
   We labelled those containing ’label__abusive’ or ’label__hatespeech’ as ’HOF’ and the rest
as ’NOT’, where ’HOF’ was further encoded as 1 and ’NOT’ as 0. The train set had 1603 samples,
while the test set had 397.
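
A one-line sketch of this flattening rule (the assertions mirror labels from the list above):

    # Any fastText-style label string containing "abusive" or "hatespeech"
    # becomes HOF (1); everything else becomes NOT (0).
    def to_binary(raw_label: str) -> int:
        return int("abusive" in raw_label or "hatespeech" in raw_label)

    assert to_binary("__label__benign__label__humor") == 0       # NOT
    assert to_binary("__label__hatespeech__label__benign") == 1  # HOF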

3.1.3. Gujarati
The Gujarati dataset [22, 23] contains 200 tweets. First, the tweets were cleaned by removing
unnecessary symbols, tags, and mentions. The labels ’HOF’ and ’NOT’ in the dataset were then
encoded to 1 and 0, respectively. The same procedure was applied to the test set, which
contained 1196 samples.
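
A hedged sketch of this cleaning step (the exact regular expressions are assumptions; \w is
Unicode-aware in Python 3, so Gujarati script survives the filtering):

    import re

    def clean_tweet(text: str) -> str:
        text = re.sub(r"@\w+", " ", text)          # mentions
        text = re.sub(r"https?://\S+", " ", text)  # links
        text = re.sub(r"[^\w\s]", " ", text)       # symbols and tags
        return re.sub(r"\s+", " ", text).strip()

    label_map = {"HOF": 1, "NOT": 0}  # label encoding used throughout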

3.1.4. Bengali, Assamese, Sinhala and Bodo
The Bengali [24, 23], Assamese [25, 24, 23], Sinhala [26, 23], and Bodo [24, 23] datasets
were primarily collected from Twitter, Facebook, or YouTube comments; all were cleaned with
the same process as Gujarati and followed the same label convention.

3.1.5. Findings from datasets
After analyzing each dataset, we found that each sample had, on average, fewer than 100
tokens. Based on this finding, we chose the model’s ’max_length’. For tokenization,
’word_tokenize’ from ’nltk’ was used, and the average token count (avg_tokens) was calculated
as:
\[
\mathrm{avg\_tokens} = \frac{\sum_{i=1}^{n} \sum_{w=1}^{m_i} T_{i,w}}{n}
\]

where:

   1. $n$: total number of samples
   2. $m_i$: total number of words in sample $i$
   3. $T_{i,w}$: the count (one) contributed by token $w$ of sample $i$, so the numerator is
      the total token count over all samples
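
In code, the computation reads as follows (a sketch using nltk’s word_tokenize, as stated
above):

    import nltk
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)

    def avg_tokens(samples: list[str]) -> float:
        # Total token count over all samples, divided by the number of samples.
        return sum(len(word_tokenize(s)) for s in samples) / len(samples)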

3.2. Model and architecture
We employ the XLM-RoBERTa model, a powerful pre-trained language representation model, for
the task of hate speech and offensive content detection. XLM-RoBERTa, short for Cross-lingual
Language Model - RoBERTa, is an extension of the RoBERTa model that is specifically designed
to handle multilingual text, and performs particularly well on low-resource languages [14].
   The architecture of XLM-RoBERTa consists of an embeddings layer, followed by an encoder
layer. The embeddings layer includes word embeddings, position embeddings, and token type
embeddings, which together form the input representation for the model. The encoder layer
comprises a series of XLM-RoBERTa layers, each consisting of attention mechanisms and
feed-forward neural networks [14].
   The attention mechanism in XLM-RoBERTa enables the model to focus on important parts
of the input sequence during processing. This mechanism includes self-attention, where each
word in the input is associated with weights that determine its relevance to other words in the
sequence [27].
   Furthermore, XLM-RoBERTa incorporates LayerNorm and dropout layers to improve model
stability and prevent overfitting. LayerNorm normalizes the hidden state of each layer, and
dropout randomly deactivates certain neurons during training to prevent the model from relying
too heavily on specific features [28, 29].
   The classification head of XLM-RoBERTa consists of a dense layer, followed by a dropout layer,
and an output projection layer. The dense layer helps in reducing the feature dimensionality,
while dropout aids in regularization to avoid overfitting. The output projection layer maps the
reduced features to the final output, which is a binary classification for hate speech and offensive
content detection [14]. The model has 278M parameters in total, of which the trainable
classification head accounts for 592,130. Additionally, the maximum sequence length was set
to 256 for code-mixed languages and 128 for the rest, following the dataset analysis showing
that the average token count was below 100. Figure 1 below illustrates the model pipeline,
which consists of a multilingual embedder followed by a classifier:




                                    Figure 1: Model pipeline

   In summary, the XLM-RoBERTa model, with its multilingual representation learning capa-
bilities and attention-based architecture, is a robust choice for our hate speech and offensive
content detection task. By leveraging its pre-trained language understanding, we aim to achieve
accurate and effective results across diverse languages and contexts.
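
A minimal sketch of this setup via the Hugging Face transformers API (xlm-roberta-base is an
assumption for the exact base checkpoint; its head of a 768x768 dense layer plus a 768x2
output projection yields the 592,130 trainable parameters quoted above):

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlm-roberta-base", num_labels=2)   # binary head: HOF vs NOT

    # max_length follows the dataset analysis: 128 covers the <100-token average.
    batch = tokenizer(["a sample comment"], truncation=True,
                      padding="max_length", max_length=128, return_tensors="pt")
    logits = model(**batch).logits          # shape (1, 2)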

3.3. Model Implementation
We propose a model (the implementation can be viewed at
https://github.com/Jayveersinh-Raj/indo_aryan_clf_HASOC_2023) which is initially trained on
an English dataset and subsequently fine-tuned on different languages. By following this
approach, the model can be tailored to any low-resource language with a relatively small
dataset. The naming conventions for the models are as follows:

    • XLM-Classifier: The randomly initialized classifier, not fine-tuned for any downstream task.
    • PolyGuard: The English only trained model (XLM-Classifier) [30]
    • Indo-Aryan-Extension: The PolyGuard finetuned on Hindi [31]
    • xlm-sinhala: The Sinhala only trained model (XLM-Classifier)
    • xlm-bengali: The Bengali only trained model (XLM-Classifier)
    • xlm-bodo: The Bodo only trained model (XLM-Classifier)
    • xlm-assamese: The Assamese only trained model (XLM-Classifier)
It is important to note that all of the models above, whenever fine-tuned, used the AdamW
optimizer, an extension of the Adam optimizer that applies weight decay, a regularization
term added to the loss function to curb overfitting. By decoupling weight decay from the
optimization step, AdamW fixes the weight-decay behavior of the original Adam optimizer,
providing more reliable and efficient training. The
common hyperparameters used are as follows:
    • learning rate: $2 \times 10^{-5}$
    • epsilon: $1 \times 10^{-8}$
    • Trainable Layers: All
These hyperparameters were chosen to balance precision in weight updates with numerical
stability during optimization, ensuring effective training and performance. Another important
hyperparameter was the number of epochs, reported for each sub-experiment in the results
section. We tested different epoch counts on each sub-dataset and chose the ones that
performed well on the unseen test data, which helped mitigate both overfitting and
underfitting.
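
A hedged sketch of the fine-tuning loop with these hyperparameters, continuing the model
sketch from Section 3.2; train_loader is an assumed DataLoader yielding tokenized batches
that include a labels tensor:

    import torch
    from torch.optim import AdamW

    optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)  # all layers trainable
    num_epochs = 10  # varied per dataset; see the results section

    model.train()
    for epoch in range(num_epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            out = model(**batch)   # a "labels" key yields a cross-entropy loss
            out.loss.backward()
            optimizer.step()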


4. Results
In this section, we present the results of our experiments in fine-tuning the XLM-RoBERTa
model for hate speech and offensive content detection in various Indo-Aryan languages. The
experiments fix the model while testing its performance on the various datasets, as well as
its few-shot capabilities. Few-shot inference is the primary focus for languages such as
Gujarati, whose training set is too small to train a raw architecture from scratch.

   We report the dataset details and performance metrics for each language below. It is worth
noting that, to guard against over-fitting, we verified the outcomes by also training the
models for slightly fewer epochs than stated below for each language.
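
For reference, the reported scores can be reproduced with scikit-learn on the held-out test
split; macro averaging is an assumption here, following the usual HASOC convention:

    from sklearn.metrics import f1_score

    # Hypothetical gold and predicted 0/1 labels for the held-out test split.
    y_true = [1, 0, 1, 0]
    y_pred = [1, 0, 0, 0]
    print(f1_score(y_true, y_pred, average="macro"))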

4.1. English
Dataset Details:
    • Training Dataset: Approximately 150k samples encoded as 1 for Hatespeech, and 0 for
      neutral
    • Total Training Samples: Approximately 120k
    • Test Dataset: Approximately 30k.
                              Model performances
    Model                                                   epochs F1-score
    XLM-Classifier                                          3      0.960
4.2. Hindi
Dataset Details:

   • Training Dataset: Cleaned with sentences split from labels. ’Hatespeech’ labeled as ’HOF’
     (1), the rest as ’NOT’ (0).
   • Total Training Samples: 1603
   • Test Dataset: Similar preprocessing applied, containing 397 samples.

                           Model performances
   Model                                                 epochs F1-score
   Direct inference (PolyGuard)                          -      0.458
   Fine-tuned (PolyGuard)                                10     0.889
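
As a usage note, direct (zero-shot) inference with the released PolyGuard checkpoint [30] can
be run through the transformers pipeline; a sketch, assuming the published checkpoint exposes
a standard sequence-classification head (the Hindi sentence is a hypothetical example):

    from transformers import pipeline

    clf = pipeline("text-classification", model="Jayveersinh-Raj/PolyGuard")
    print(clf("यह एक उदाहरण वाक्य है"))   # -> [{"label": ..., "score": ...}]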

4.3. Gujarati
Dataset Details:

   • Training Dataset: Contains 200 crawled tweets in Gujarati labeled ’HOF’ (1) for hate
     speech and ’NOT’ (0) for neutral.
   • Unique local slangs observed in tweets, cleaned by removing mentions and symbols.
   • Test set contained 1196 tweets
                                    Model performances
   Model                                      epochs   F1-score   Leaderboard Rank
   Direct inference (PolyGuard)               -        0.165      -
   Direct inference (Indo-Aryan-Extension)    -        0.547      -
   Fine-tuned (PolyGuard)                     20       0.791      -
   Fine-tuned (Indo-Aryan-Extension)          20       0.793      4

4.4. Sinhala
Dataset Details:

   • Training Dataset: Contains 7,500 samples in Sinhala labeled ’HOF’ (1) for hate speech
     and ’NOT’ (0) for neutral.
   • Test Samples: 2,500.

                                    Model performances
   Model                                      epochs   F1-score   Leaderboard Rank
   xlm-sinhala                                10       0.835      3
   Fine-tuned (Indo-Aryan-Extension)          10       0.827      -
   Fine-tuned (PolyGuard)                     10       0.820      -
4.5. Bengali
Dataset Details:

   • Training Dataset: Contains 1281 samples in Bengali labeled ’HOF’ (1) for hate speech and
     ’NOT’ (0) for neutral.
   • Test samples: 320

                                    Model performances
   Model                                      epochs   F1-score   Leaderboard Rank
   Fine-tuned (Indo-Aryan-Extension)          30       0.726      6
   Fine-tuned (PolyGuard)                     30       0.726      6
   xlm-bengali                                30       0.634      -

4.6. Bodo
Dataset Details:

   • Training Dataset: Contains 1679 samples in Bodo labeled ’HOF’ (1) for hate speech and
     ’NOT’ (0) for neutral.
   • Test samples: 420

                                    Model performances
   Model                                      epochs   F1-score   Leaderboard Rank
   Fine-tuned (Indo-Aryan-Extension)          20       0.829      -
   Fine-tuned (PolyGuard)                     20       0.844      6
   xlm-bodo                                   20       0.831      -

4.7. Assamese
Dataset Details:

   • Training Dataset: Contains 4036 samples in Assamese labeled ’HOF’ (1) for hate speech
     and ’NOT’ (0) for neutral.
   • Test samples: 1009

                                    Model performances
   Model                                      epochs   F1-score   Leaderboard Rank
   Fine-tuned (Indo-Aryan-Extension)          30       0.707      5
   Fine-tuned (PolyGuard)                     30       0.706      -
   xlm-assamese                               30       0.681      -
4.8. General Analysis
    • The fine-tuning process demonstrates the model’s adaptability to different languages
      with very few fine-tuning samples of the target language, with F1 scores ranging
      from 0.70 to 0.89.
    • Notably, the model performs exceptionally well in languages like Sinhala and Bodo, where
      F1 scores exceed 0.8.
    • The performance in Assamese and Bengali datasets is moderate, suggesting room for
      further improvement.

   These experiments demonstrate the XLM-RoBERTa model’s potential for cross-lingual hate
speech detection, as well as the potential of our ’PolyGuard’ model, which, after being trained
on 100k+ samples of English, can produce significant results after being fine-tuned with a very
small dataset of a target language. The results vary across languages, emphasizing the need for
language-specific adaptations and larger, diverse datasets to enhance model performance further.
Future work should focus on addressing the challenges posed by low-resource languages and
exploring techniques to improve classification accuracy.
The following figure demonstrates that the fine-tuned ’PolyGuard’ yields better results across
multiple languages:

[Figure: F1 scores of the fine-tuned ’PolyGuard’ across languages]

Moreover, the results of the fine-tuned ’PolyGuard’ model for varying dataset sizes can be
seen in the figure below:

[Figure: performance of the fine-tuned ’PolyGuard’ for varying dataset sizes]
5. Discussion
In this section, we delve into the insights drawn from the results obtained in our experiments.
The varying performance across languages and datasets provides valuable insights into the
challenges and opportunities in multilingual hate speech detection.

5.1. Cross-Lingual Adaptability
Our experiments have showcased the remarkable cross-lingual adaptability of the XLM-RoBERTa
model. It excelled in languages like Sinhala and Bodo, where the F1 scores exceeded 0.8. This
demonstrates the model’s potential for effectively identifying hate speech in languages with
relatively larger datasets and well-defined linguistic patterns. It aligns with prior research
indicating that multilingual models can leverage their pre-trained knowledge effectively across
languages [32].

5.2. Data Quality and Size
The performance in Gujarati is notable, with an F1 score of 0.7926. It is essential to highlight
that this dataset contained local slang and expressions that might only be discernible to native
speakers. This raises questions about data quality and the importance of domain-specific
knowledge in hate speech detection tasks. Additionally, the dataset sizes played a crucial role in
model performance. Languages with larger datasets, such as Sinhala and Bodo, yielded higher
F1 scores. Hence, expanding and diversifying datasets for low-resource languages is essential
for improved model performance.
5.3. Room for Improvement
While our model achieved competitive results, there is room for improvement. For instance, in
Bengali and Assamese datasets, where F1 scores were moderate (0.726 and 0.707 respectively),
further fine-tuning, data augmentation, or specialized techniques for handling local dialects may
enhance performance. Additionally, exploring more advanced model compression techniques
could optimize efficiency while retaining performance [33].

5.4. Future Directions
Future work in cross-lingual hate speech detection should focus on:
   1. Investigating techniques for better handling dialects.
   2. Expanding and diversifying datasets for low-resource languages.
   3. Exploring advanced fine-tuning strategies to boost model performance.
   4. Addressing issues related to model efficiency and interpretability.
   Our experiments underscore the adaptability and potential of the XLM-RoBERTa model in
multilingual hate speech detection. However, they also shed light on the complexities and the
critical role of dataset size and quality. These findings provide a foundation for further research
aimed at bridging the gap in hate speech detection across diverse languages and cultures.


6. Conclusion
In this study, we embarked on a journey to break barriers in multilingual toxicity analysis,
focusing on hate speech and offensive content detection in low-resource languages. Leveraging
the formidable capabilities of the XLM-RoBERTa model, we conducted a series of experiments
fine-tuning the model for various languages. The insights drawn from our endeavors provide
valuable contributions to the field of cross-lingual hate speech detection.
   Our results are promising, indicating the remarkable cross-lingual adaptability of the XLM-
RoBERTa model. It demonstrated exceptional performance in languages such as Sinhala and
Bodo, achieving F1 scores exceeding 0.8. These outcomes underscore the potential of multi-
lingual models to effectively identify hate speech in languages with relatively larger datasets
and well-defined linguistic patterns. This aligns with the vision of creating models that can
transcend language barriers.
   Our experiments also emphasized the critical importance of data quality and dataset size. In
Gujarati, the model achieved commendable results (F1 score of 0.7926), but the dataset contained
local slang and expressions that might be understood only by native speakers. This raises
questions about the significance of domain-specific knowledge in hate speech detection tasks.
Additionally, languages with larger datasets, like Sinhala and Bodo, yielded higher F1 scores,
highlighting the importance of dataset expansion and diversification for low-resource languages.
   Despite these challenges, our model has exhibited its potential to deliver competitive per-
formance in multilingual hate speech detection. With access to balanced datasets and further
fine-tuning efforts, we believe our model can surpass competitors in the field, making significant
strides in mitigating hate speech and offensive content in diverse linguistic landscapes. This
study has also demonstrated the generalizability of our model to different datasets, paving the
way for its broader application.
   In conclusion, our journey in breaking barriers through multilingual toxicity analysis has
yielded promising results. The XLM-RoBERTa model’s adaptability, coupled with the insights
gained from our experiments, sets the stage for future research and advancements in hate
speech detection across languages and cultures. As we continue to refine and expand our model,
we are one step closer to fostering more inclusive and respectful online spaces for all.


References
 [1] X. Zhang, F. Luo, X. Hu, Detecting hate speech on twitter using a convolution-gru based
     deep neural network, Proceedings of the 27th international conference on computational
     linguistics (2018) 2561–2571.
 [2] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the
     problem of offensive language, Proceedings of the 11th international conference on web
     and social media (2017) 512–515.
 [3] Z. Waseem, D. Hovy, Hateful symbols or hateful people? predictive features for hate
     speech detection on twitter, NAACL-HLT (2016) 88–93.
 [4] A. Schmidt, M. Wiegand, A survey of hate speech detection using natural language
     processing, Proceedings of the fifth international workshop on natural language processing
     for social media (2017) 1–10.
 [5] C. J. Hutto, E. Gilbert, Vader: A parsimonious rule-based model for sentiment analysis
     of social media text, Eighth international conference on weblogs and social media (2014)
     216–225.
 [6] A. Go, R. Bhayani, L. Huang, Twitter sentiment classification using distant supervision,
     CS224N Project Report, Stanford 1 (2009) 2009.
 [7] Z. Waseem, D. Hovy, Understanding abuse: A typology of abusive language detection
     subtasks, Proceedings of the first workshop on abusive language online (2017) 1–10.
 [8] K. Ghosh, D. A. Senapati, Hate speech detection: a comparison of mono and multilingual
     transformer model with cross-language evaluation, in: Proceedings of the 36th Pacific Asia
     Conference on Language, Information and Computation, De La Salle University, Manila,
     Philippines, 2022, pp. 853–865. URL: https://aclanthology.org/2022.paclic-1.94.
 [9] M. Geva, Y. Goldberg, Contextualizing hate speech classifiers with post-hoc explanations,
     Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing
     and the 9th International Joint Conference on Natural Language Processing (EMNLP-
     IJCNLP) (2019) 2736–2741.
[10] D. Nkemelu, H. Shah, I. Essa, M. L. Best, Tackling hate speech in low-resource languages
     with context experts, 2023. arXiv:2303.16828 .
[11] T. Schuster, J. Bastings, Y. Belinkov, S. Goldwater, Cross-lingual alignment of contextual
     word embeddings, with applications to zero-shot dependency parsing, in: Proceedings
     of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp.
     3500–3508.
[12] W. Liu, S. Singh, M. Gardner, P. Kohli, Deep learning for extreme multi-label text classifi-
     cation, arXiv preprint arXiv:1905.07342 (2019).
[13] V. Sanh, L. Debut, J. Chaumond, T. Wolf, Distilbert, a distilled version of bert: smaller,
     faster, cheaper and lighter, 2020. arXiv:1910.01108 .
[14] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave,
     M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning
     at scale, in: Proceedings of the 58th Annual Meeting of the Association for Computational
     Linguistics, 2020, pp. 8440–8451.
[15] K. Ghosh, A. Senapati, U. Garain, Baseline bert models for conversational hate speech detec-
     tion in code-mixed tweets utilizing data augmentation and offensive language identification
     in marathi, in: Fire, 2022. URL: https://api.semanticscholar.org/CorpusID:259123570.
[16] I. Kwok, Y. Wang, Locate the hate: Detecting tweets against blacks, Proceedings of the
     51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long
     Papers) (2013) 1250–1259.
[17] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
     transformers for language understanding, 2019. arXiv:1810.04805 .
[18] W. Yin, A. Zubiaga, Towards generalisable hate speech detection: a review on obstacles
     and solutions, 2021. arXiv:2102.08886 .
[19] A. Vadesara, P. Tanna, Corpus building for hate speech detection of gujarati language, in:
     International Conference on Soft Computing and its Engineering Applications, Springer,
     2022, pp. 382–395.
[20] H. Sandaruwan, S. Lorensuhewa, M. Kalyani, Sinhala hate speech detection in social
     media using text mining and machine learning, in: 2019 19th International Conference on
     Advances in ICT for Emerging Regions (ICTer), volume 250, IEEE, 2019, pp. 1–8.
[21] V. K. Jha, H. P, V. P. N, V. Vijayan, P. P, Dhot-repository and classification of offensive
     tweets in the hindi language, Procedia Computer Science 171 (2020) 2324–2333.
     doi:10.1016/j.procs.2020.04.252.
[22] S. Satapara, H. Madhu, T. Ranasinghe, A. E. Dmonte, M. Zampieri, P. Pandya, N. Shah,
     M. Sandip, P. Majumder, T. Mandl, Overview of the hasoc subtrack at fire 2023: Hate-
     speech identification in sinhala and gujarati, in: K. Ghosh, T. Mandl, P. Majumder, M. Mitra
     (Eds.), Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, Goa,
     India. December 15-18, 2023, CEUR Workshop Proceedings, CEUR-WS.org, 2023.
[23] T. Ranasinghe, K. Ghosh, A. S. Pal, A. Senapati, A. E. Dmonte, M. Zampieri, S. Modha,
     S. Satapara, Overview of the HASOC subtracks at FIRE 2023: Hate speech and offensive
     content identification in assamese, bengali, bodo, gujarati and sinhala, in: Proceedings of
     the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2023,
     Goa, India. December 15-18, 2023, ACM, 2023.
[24] K. Ghosh, A. Senapati, A. S. Pal, Annihilate Hates (Task 4, HASOC 2023): Hate Speech
     Detection in Assamese, Bengali, and Bodo languages, in: Working Notes of FIRE 2023 -
     Forum for Information Retrieval Evaluation, CEUR, 2023.
[25] K. Ghosh, D. Sonowal, A. Basumatary, B. Gogoi, A. Senapati, Transformer-based hate
     speech detection in assamese, in: 2023 IEEE Guwahati Subsection Conference (GCON),
     2023, pp. 1–5. doi:10.1109/GCON58516.2023.10183497 .
[26] T. Ranasinghe, I. Anuradha, D. Premasiri, K. Silva, H. Hettiarachchi, L. Uyangodage,
     M. Zampieri, Sold: Sinhala offensive language dataset, 2022. arXiv:2212.00851 .
[27] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polo-
     sukhin, Attention is all you need, Advances in Neural Information Processing Systems 30
     (2017) 5998–6008.
[28] J. L. Ba, J. R. Kiros, G. E. Hinton, Layer normalization, arXiv preprint arXiv:1607.06450
     (2016).
[29] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple
     way to prevent neural networks from overfitting, Journal of Machine Learning Research
     15 (2014) 1929–1958.
[30] J. Raj, Polyguard model, 2023. URL: https://huggingface.co/Jayveersinh-Raj/PolyGuard.
[31] J. Raj, Indo-aryan-abuse-detection, 2023. URL: https://huggingface.co/Jayveersinh-Raj/
     Indo-Aryan-abuse-detection.
[32] Y. Tang, C. Tran, X. Li, P.-J. Chen, N. Goyal, V. Chaudhary, J. Gu, A. Fan, Multilingual trans-
     lation with extensible multilingual pretraining and finetuning, 2020. arXiv:2008.00401 .
[33] A. Berthelier, T. Chateau, S. Duffner, C. Garcia, C. Blanc, Deep model compression and
     architecture optimization for embedded systems: A survey, Journal of Signal Processing
     Systems 93 (2021) 863–878.