Annihilate Hates (Task 4, HASOC 2023): Hate Speech Detection in Assamese, Bengali, and Bodo Languages

Koyel Ghosh1, Apurbalal Senapati1 and Aditya Shankar Pal2
1 Central Institute of Technology, Kokrajhar, Assam, India
2 Indian Statistical Institute, Kolkata, India

Abstract
In today's world, social media can act as a tool for spreading hate towards a person or group based on their color, caste, sex, sexual orientation, political differences, etc. As social media continues to expand, the proliferation of hate speech is surging at an alarming rate. Research on identifying hate speech in social media has recently gained significant prominence, with a specific need for investigations focused on languages other than English. The HASOC (Hate Speech and Offensive Content Identification) track has provided a platform for hate speech detection at FIRE (Forum for Information Retrieval Evaluation) since 2019. HASOC 2023 coordinates four tasks, with AH (Annihilate Hates, Task 4) being one of them. The AH task aims to develop and assess supervised machine learning systems on three datasets. The three datasets, covering hate speech in three Indian languages (Assamese, Bengali, and Bodo), are collected from YouTube and Facebook comments. Each dataset is tagged with binary classification labels (hate or non-hate). In the Assamese language, 20 teams made 180 submissions, while 21 teams submitted 214 entries in the Bengali language, and for the Bodo language, 19 teams submitted a total of 175 submissions. The best classifiers for Assamese, Bengali, and Bodo achieved macro F1 scores of 0.73, 0.77, and 0.85, respectively. This article briefly summarizes the tasks, data development, and results. Variants of the BERT architecture achieved the best performance in the task; however, other systems have also been applied successfully.

Keywords
Hate Speech Detection, Binary Classification, Assamese, Bengali, Bodo, Machine Learning, Deep Learning, Transformers, BERT

Forum for Information Retrieval Evaluation, December 15-18, 2023, Goa, India
ghosh.koyel8@gmail.com (K. Ghosh); a.senapati@cit.ac.in (A. Senapati); adityashankarpal_r@isical.ac.in (A. S. Pal)
https://github.com/BrainLearns (K. Ghosh)
ORCID: 0000-0001-5347-4961 (K. Ghosh); 0000-0001-9124-2563 (A. Senapati); 0009-0001-0345-5428 (A. S. Pal)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073)

1. Introduction
In addition to fostering friendships and facilitating information sharing, popular social media platforms such as Twitter, Facebook, and YouTube have also become platforms for cyberbullying and online harassment. These negative aspects can have severe consequences, including causing depression and inciting individuals to engage in violent actions, as evidenced in studies like [1, 2]. Instances of hate speech on these platforms have disrupted social and communal harmony on a global scale. Consequently, many countries have introduced increasingly complex regulations to address offensive online content, as discussed in [3] and [4]. This situation has created a crucial need for automated methods to detect suspicious posts. It is worth noting that most research in this area has primarily focused on English and similar languages.
Low-resource languages, on the other hand, still lack annotated datasets. Linguists have examined and characterized different manifestations of hate speech [5], while political scholars and legal authorities explore methods to govern online platforms and address problematic content while preserving the principles of free expression [6]. Detection algorithms continue to improve, and annotated datasets are being created and studied for an ever wider range of languages and phenomena. Researchers have recently built datasets for many languages [7], including English [8, 9, 10, 11], Greek [12], Portuguese [13], Danish [14], Mexican Spanish [15], and Turkish [16]. For Indian languages, hate speech datasets are available in Hindi [17, 18, 19], Marathi [19], Bengali [20], Telugu [21], Tamil [21], Malayalam [21], and Kannada [21]. The availability of all these datasets makes it possible to study how similar or different they are and how trustworthy they are.

In HASOC 2023, four tasks (Task 1 - Task 4) in the research area of hate speech detection are proposed. Task 1 [22] focuses on identifying hate speech, offensive language, and profanity in different languages using natural language processing techniques. Task 2 [23], known as the Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL), addresses the challenge of identifying hate speech and offensive content in code-mixed conversations on social media; code-mixed text includes multiple languages within a single conversation. The task is divided into two subtasks. Task 3 [24] aims to detect the various hateful spans within a sentence already considered hateful; a hate span is a set of contiguous tokens that, in tandem, communicate the explicit hatefulness of the sentence. This paper provides an overview of Task 4, i.e., Annihilate Hates (AH), which contributes task-specific (hate speech detection) low-resource datasets in three languages: Assamese, Bengali, and Bodo. The AH dataset is version 3, an updated successor of the HS (version 2) [25] and NEIHS (version 1) [26, 27] datasets.

2. Related Forum and Dataset
The main obstacle in hate speech detection is the requirement for language-specific datasets. Constructing labeled datasets for hate speech in Indian languages is a laborious and intricate endeavor. It requires extensive groundwork and preprocessing tasks such as data cleaning and ensuring agreement among annotators. This section provides a concise overview of Indian datasets in languages like Hindi, Marathi, Bengali, Telugu, Tamil, Malayalam, and Kannada. The HASOC challenge, organized by FIRE (Forum for Information Retrieval Evaluation)1, has played a significant role in providing hate speech datasets in Indian languages like Hindi, Marathi, etc. HASOC comprises four subtracks, and the dataset distribution is in a tab-separated format. In 2019, the HASOC-Hindi dataset introduced three tasks as described in [17]. Subtask A is the initial task, involving binary classification. Subtask B focuses on identifying the profanity or abuse within hate comments, a multiclass classification task. Subtask C is centered on determining whether the hate speech is targeted at a specific individual or is more general and untargeted. In the HASOC 2020 edition, two hate speech detection tasks were presented, as mentioned in [18]: Subtask A involves binary classification, and Subtask B addresses multiclass classification.

1 http://fire.irsi.res.in/fire/2022/home (Access on 30.10.2023)
These tasks are accompanied by another Hindi dataset, expanding the research scope in this area. In 2021, HASOC published a Hindi dataset [19], again with sub-tasks A and B; in total, sixty-five teams submitted 6,052 runs. HASOC-Marathi [19] introduced a Marathi hate speech dataset with a binary classification task. The authors of [28, 29] experimented on the HASOC datasets and analyzed the performance of transformer-based models in detail. BD-SHS [20] is a Bengali hate speech dataset with three levels: hate speech identification (binary classification, i.e., hate and not hate), identification of the target of hate speech (multi-label classification, i.e., individual, male, female, and group), and categorization of hate speech types (multi-label classification, i.e., slander, call to violence, gender, religion). The authors of [21] created several Indian-language datasets, i.e., Hindi, Telugu, Tamil, Malayalam, and Kannada, and later performed monolingual, unbalanced-split, zero-shot cross-lingual, few-shot, joint-training, pretraining, and cross-dataset experiments on them.

3. Task Description
In HASOC 20232, Task 4, Annihilate Hates (AH), is proposed in the research area of hate speech detection. The task covers three languages: Assamese, Bengali, and Bodo. Figure 1 shows a screenshot of the Annihilate Hates (AH) website3.

2 https://hasocfire.github.io/hasoc/2023/index.html (Access on 30.10.2023)
3 https://sites.google.com/view/hasoc-2023-annihilate-hates/home (Access on 30.10.2023)

Figure 1: Screenshot of the Annihilate Hates (AH) website

3.1. Sub-task A: Hate Speech Detection in Assamese, Bengali, and Bodo (Binary)
Task 4 aims to detect hate speech in the Assamese, Bengali, and Bodo languages. Each dataset (one per language) consists of a list of comments with their corresponding class: hate or offensive (HOF) or not hate (NOT). Data is primarily collected from Facebook and YouTube comments. It is a binary classification task in which participating systems are required to classify the comments into the two classes HOF and NOT. Figure 2 shows samples from the tagged AH-Assamese, AH-Bengali, and AH-Bodo datasets.

Figure 2: Samples from the tagged (a) AH-Assamese, (b) AH-Bengali, and (c) AH-Bodo datasets (comments with English translations and their HOF/NOT labels)
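Since each dataset is a plain list of comments with a binary label, a few lines of Python are enough to load a split and inspect the label balance. The snippet below is a minimal sketch, assuming a CSV file with the text and task_1 columns described in Section 4.2; the file name is a placeholder, not the official distribution name.

```python
# Minimal sketch: load an AH split and map the binary labels to integers.
# "ah_assamese_train.csv" is a placeholder file name; the "text" and
# "task_1" columns follow the description in Section 4.2.
import pandas as pd

df = pd.read_csv("ah_assamese_train.csv")
df["label"] = (df["task_1"] == "HOF").astype(int)  # HOF -> 1, NOT -> 0
print(df["task_1"].value_counts())                 # class balance at a glance
```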
4. Dataset Description
In this section, the dataset collection, annotation, and analysis for Task 4 are discussed.

4.1. Dataset Collection
Our primary aim in constructing this dataset is to ensure its diversity, so we intentionally selected several political, entertainment, and other Facebook pages and YouTube channels. We initiated the process by identifying contentious posts, often related to recent events or prominent figures such as politicians and actors, which had a higher likelihood of containing hate speech. Subsequently, we scrutinized the comments on these posts, seeking those written primarily in a monolingual format, typically comprising 80-90% of the comment. We then conducted a manual assessment to determine whether these comments contained hate speech and categorized them accordingly. All the comments were collected using open-source scraper tools4. Ultimately, native speakers tagged the sentences as either HOF or NOT. Sentences in the HOF category usually contained hate-related words and were considered hate-offensive statements. In contrast, sentences conveying formal information, suggestions, or questions were categorized as NOT.

4 https://github.com/kevinzg/facebook-scraper (Access on 30.10.2023)
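As an illustration of the collection step, the open-source facebook-scraper tool cited in footnote 4 can be driven roughly as follows. This is a hedged sketch, not the exact collection pipeline: the page name is a placeholder, and option and field names may differ across library versions.

```python
# Illustrative use of the facebook-scraper tool referenced in footnote 4.
# "some_public_page" is a placeholder; option and field names may vary
# between library versions, so treat this as a sketch rather than a recipe.
from facebook_scraper import get_posts

for post in get_posts("some_public_page", pages=2, options={"comments": True}):
    for comment in post.get("comments_full") or []:
        print(comment["comment_text"])  # raw comment text, later tagged HOF/NOT
```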
4.2. Dataset Annotation
For dataset annotation, we shared three separate CSV files with the three annotators. These files contain three columns: S. No. (serial number), text (the comment), and task_1 (the binary label, i.e., HOF or NOT). The data for each language was tagged manually by three native speakers, young adults between 19 and 24 years old. These annotators are students at the Central Institute of Technology Kokrajhar, Assam, India. Their task involved manually classifying comments into two categories: those containing hateful (HOF) content and those that did not (NOT). The final decision was taken in consultation with a domain expert. Identifying hate speech is a subjective task, and it requires careful consideration. Consequently, we established specific and rigorous guidelines to help define what qualifies as hate speech. These guidelines are based on the community standards of Facebook5 and YouTube6. The authors of [21] follow the rules listed below for marking comments as hate, and we follow their scheme with updates.
(a) Profanity: Comments that include profane language, curses, or vulgar words are categorized as hate speech.
(b) Sexual orientation: A person is targeted for their sexual orientation, which can be directed toward individuals of the opposite gender, the same gender, both genders, or multiple genders.
(c) Personal: Comments attacking one's fashion sense, choice of content, language selection, and related aspects.
(d) Gender chauvinism: People are targeted in the comment because of their gender.
(e) Religious: A person is criticized for their choice of religious beliefs and practices, for example, comments challenging the use of a turban or a burkha (the veil).
(f) Political: A person is harassed based on political beliefs, for instance, bullied for supporting a political party.
(g) Violent intention: The comment contains a threat or call to violence.

5 https://web.facebook.com/communitystandards/ (Access on 30.10.2023)
6 https://www.youtube.com/howyoutubeworks/policies/community-guidelines/ (Access on 30.10.2023)

The three annotators annotated the AH datasets, and the majority vote was considered; the annotation agreement, calculated using the κ (kappa) coefficient, is shown in Table 1. The problems and the level of disagreement need to be explored in the future.

Table 1
κ statistics for all three datasets (Sub-task A)

Datasets      κ statistics
AH-Assamese   0.67
AH-Bengali    0.54
AH-Bodo       0.81
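For reference, agreement scores of the kind reported in Table 1 can be computed with scikit-learn. The paper does not state which κ formulation was used for three annotators (pairwise Cohen's κ averaged, or Fleiss' κ), so the sketch below shows a pairwise Cohen's κ on toy labels only.

```python
# Pairwise Cohen's kappa on toy annotations (not the real AH data). With
# three annotators, pairwise kappas are typically averaged or Fleiss' kappa
# is used; the paper does not specify which variant produced Table 1.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["HOF", "NOT", "HOF", "HOF", "NOT", "NOT"]
annotator_b = ["HOF", "NOT", "NOT", "HOF", "NOT", "NOT"]
print(cohen_kappa_score(annotator_a, annotator_b))  # ~0.67 on this toy pair
```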
4.3. Dataset Analysis
We summarize the key statistics of the AH dataset in Table 2. In the Assamese dataset, 2,955 comments out of 5,045 are HOF. In the Bengali dataset, 641 comments out of 1,601 are HOF, so NOT is the majority class. In the Bodo dataset, 1,225 out of 2,099 comments are HOF. As a result, our Assamese and Bodo datasets are slightly skewed in favour of containing hate speech. Figure 3 shows the details of the class distribution. The training splits contain 4,036, 1,281, and 1,679 comments for the Assamese, Bengali, and Bodo datasets, respectively.

Table 2
Class-wise distribution for the AH-Assamese, AH-Bengali, and AH-Bodo datasets.

              HOF             NOT
Dataset       Train   Test    Train   Test    Total
AH-Assamese   2,347   608     1,689   401     5,045
AH-Bengali    515     126     766     194     1,601
AH-Bodo       998     227     681     193     2,099

Figure 3: Class distribution of the AH dataset with two classes (HOF and NOT): (a) AH-Assamese (train), (b) AH-Assamese (test), (c) AH-Bengali (train), (d) AH-Bengali (test), (e) AH-Bodo (train), and (f) AH-Bodo (test)

5. Result
The macro approach computes the F1 score individually for each class and averages the scores without weights. As a result, it imposes a more significant penalty when a system performs poorly on minority classes. The selection of a specific F1 variant depends on the task's objectives and the label distribution in the dataset. Hate speech classification tasks often face class imbalance, making macro F1 the suitable choice for evaluation.

For the system run submission and evaluation of participants' experiments, we depend on the Kaggle platform. Figure 4 shows a screenshot of the Annihilate Hates (AH) Kaggle website for run submission. We provide separate Kaggle competitions for Assamese7, Bengali8, and Bodo9 where participants submit experimental runs. Overall, 69 participants registered for Task 4. In the Assamese task, 20 teams made 180 submissions, while 21 teams submitted 214 runs in the Bengali task, and for the Bodo task, 19 teams submitted a total of 175 runs. The best classification systems for Assamese, Bengali, and Bodo achieved macro F1 scores of 0.73, 0.77, and 0.85, respectively. The results for the AH-Assamese, AH-Bengali, and AH-Bodo datasets are shown in Table 3, Table 4, and Table 5, respectively.

7 https://www.kaggle.com/competitions/annihilate-hates-assamese (Access on 30.10.2023)
8 https://www.kaggle.com/competitions/annihilate-hates-bengali (Access on 30.10.2023)
9 https://www.kaggle.com/competitions/annihilate-hates-bodo (Access on 30.10.2023)

Figure 4: Screenshot of the Annihilate Hates (AH) Kaggle website for run submission.

Table 3
Result of Task 4: Annihilate Hates (Assamese)

Rank  Team name                 Macro F1 score
1     Chetona [30]              0.7346
2     FiRC-NLP [31]             0.7251
3     TeamBD [32]               0.7222
4     SATLab [33]               0.7151
5     AI Alchemists [34]        0.7074
6     Sanvadita [35]            0.7064
7     Z-AGI Labs [36]           0.7052
8     Corgi                     0.7044
9     JCT/ Avigail Stekel [37]  0.6988
10    Code Fellas [38]          0.6972
11    IRLab@IITBHU              0.6967
12    Komar99                   0.6946
13    MUCS [39]                 0.6883
14    Michal Stekel             0.6862
15    Chen876                   0.6811
16    Ravens                    0.6620
17    CNLP-NITS-PP [40]         0.5948
18    Team +1                   0.4831
19    CIT TEAM                  0.4684
20    InclusiveTechies          0.3468

Table 4
Result of Task 4: Annihilate Hates (Bengali)

Rank  Team name                 Macro F1 score
1     Sanvadita [35]            0.7702
2     FiRC-NLP [31]             0.7642
3     Z-AGI Labs [36]           0.7562
4     Daniil Orel               0.7507
5     TeamBD [32]               0.7349
6     AI Alchemists [34]        0.7257
7     Code Fellas [38]          0.7195
8     Chetona [30]              0.6785
9     SATLab [33]               0.6707
10    MUCS [39]                 0.6683
11    JCT/ Avigail Stekel [37]  0.6649
12    Chen876                   0.6603
13    Michal Stekel             0.6569
14    IRLab@IITBHU              0.6527
15    Komar99                   0.6466
16    Ravens                    0.6088
17    CNLP-NITS-PP [40]         0.6010
18    CHANDAN SENAPATI [41]     0.5062
19    Team +1                   0.4709
20    CIT TEAM                  0.3754
21    InclusiveTechies          0.3583

Table 5
Result of Task 4: Annihilate Hates (Bodo)

Rank  Team name                        Macro F1 score
1     SATLab [33]                      0.8565
2     Komar99                          0.8507
3     JCT/ Avigail Stekel [37]         0.8507
4     FiRC-NLP [31]                    0.8484
5     Chetona [30]                     0.8437
6     AI Alchemists [34]               0.8437
7     Ravens                           0.8434
8     Chen876                          0.8427
9     Michal Stekel                    0.8378
10    MUCS [39]                        0.8368
11    Code Fellas [38]                 0.8351
12    Z-AGI Labs/ Nikhil Narayan [36]  0.8300
13    Corgi                            0.8186
14    TeamBD [32]                      0.7629
15    IRLab@IITBHU                     0.7427
16    CNLP-NITS-PP [40]                0.6692
17    Team +1                          0.4952
18    CIT TEAM                         0.4152
19    InclusiveTechies                 0.3148
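For completeness, the macro F1 score used in all three leaderboards can be computed directly with scikit-learn. The labels below are toy values for illustration, not actual system outputs.

```python
# Macro F1: the F1 score is computed per class (HOF and NOT) and averaged
# without class-size weights, so weak minority-class performance is punished.
# y_true/y_pred are toy values, not actual system outputs.
from sklearn.metrics import f1_score

y_true = ["HOF", "NOT", "NOT", "HOF", "NOT", "NOT"]
y_pred = ["HOF", "NOT", "HOF", "HOF", "NOT", "NOT"]
print(f1_score(y_true, y_pred, average="macro"))  # ~0.83 on this toy example
```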
6. Methodology
This section discusses the systems utilized by the participants; a minimal sketch of the fine-tuning recipe shared by most of the top systems is given at the end of the section.

6.1. AH-Assamese
• Chetona [30] propose ensembling IndicBERT and Naive Bayes, along with synthetic data up-sampling (the training examples of each language are up-sampled by translating the examples from the other two languages into the given language).
• FiRC-NLP [31] fine-tune the pre-trained XLM-RoBERTa-large model, reaching second position on the leaderboard.
• TeamBD [32] experimented with xlm-roberta-large (multilingual) along with ChatGPT3 augmentation.
• SATLab [33] uses the LIBLINEAR L2-regularized logistic regression model (dual, -s 7) [42].
• AI Alchemists [34] fine-tuned the XLM-RoBERTa model.
• Sanvadita [35] uses the monolingual assamese-bert10 and the multilingual indic-bert11 models.
• Z-AGI Labs [36] experiments with fine-tuning various multilingual transformer-based models, such as Bert-Base-Multilingual (Cased and Uncased), DistilBert-Base-Multilingual-Cased, XLM-Roberta-Base, Muril-Base, and XLMIndic-Base (UniScript12 and MultiScript13), and obtained their best result by fine-tuning the Bert-Base-Multilingual-Cased model.
• JCT/ Avigail Stekel [37] develop different models using five classical supervised machine learning methods: multinomial Naive Bayes (MNB), support vector classifier, random forest, logistic regression (LR), and multi-layer perceptron. Their models were applied to word unigram and/or character n-gram features. Their best model for the Assamese language is an MNB model with 5-gram features.
• Code Fellas [38] use approaches that broadly involve Long Short-Term Memory (LSTM) networks coupled with Convolutional Neural Networks (CNN) and pre-trained Bidirectional Encoder Representations from Transformers (BERT) based models like IndicBERT and MuRIL. Notably, their results showcase the effectiveness of these approaches, with IndicBERT achieving a remarkable F1 score for Assamese.
• MUCS [39] carried out various experiments with different combinations of features (syllable n-grams, char n-grams, and fastText word embeddings) and different approaches (ML and FSL). Their best model is an SVM trained with TF-IDF of syllable n-grams and TF-IDF of char n-grams, both in the range (1, 3).
• CNLP-NITS-PP [40] experiments with CNN+FastText, CNN-BiLSTM+FastText/GloVe, GPT-2, BERT, and logistic regression. A CNN-based binary classification model with FastText embeddings outperforms their other systems.

10 https://huggingface.co/l3cube-pune/assamese-bert (Access on 30.10.2023)
11 https://huggingface.co/ai4bharat/indic-bert (Access on 30.10.2023)
12 https://huggingface.co/ibraheemmoosa/xlmindic-base-uniscript (Access on 30.10.2023)
13 https://huggingface.co/ibraheemmoosa/xlmindic-base-multiscript (Access on 30.10.2023)

6.2. AH-Bengali
• Sanvadita [35] uses the pre-trained monolingual Bengali Sentence-BERT14 and Bengali-BERT15 models and the multilingual Indic Sentence-BERT16.
• FiRC-NLP [31] utilizes the XLM-RoBERTa-large model.
• Z-AGI Labs [36] utilizes pre-trained models for the experiments and achieves their highest score by fine-tuning the csebuetnlp/banglabert pre-trained model.
• TeamBD [32] experiments with the xlm-roberta-large (multilingual) model.
• AI Alchemists [34] fine-tuned the XLM-RoBERTa model.
• Code Fellas [38] fine-tuned MuRIL for Bengali, obtaining the best result out of all experiments done by the team.
• Chetona [30] applies the same system to the Bengali language as to Assamese.
• SATLab [33] utilizes the same system as for Assamese.
• MUCS [39] trained an SVM with TF-IDF of syllable n-grams and TF-IDF of char n-grams, both in the range (1, 3).
• JCT/ Avigail Stekel [37]: their best model for Bengali is an MNB model with 6-gram features out of all the experiments mentioned in the Assamese section.
• CNLP-NITS-PP [40]: a CNN-based binary classification model with FastText embeddings outperforms their other systems.
• CHANDAN SENAPATI [41] implements the LSTM deep learning model.

14 https://huggingface.co/l3cube-pune/bengali-sentence-bert-nli (Access on 30.10.2023)
15 https://huggingface.co/l3cube-pune/bengali-bert (Access on 30.10.2023)
16 https://huggingface.co/l3cube-pune/indic-sentence-bert-nli (Access on 30.10.2023)

6.3. AH-Bodo
• SATLab [33] utilizes the same system applied to the Assamese dataset.
• JCT/ Avigail Stekel [37]: their best submission for Bodo is an LR model with all word unigrams in the training set.
• FiRC-NLP [31] utilizes the XLM-RoBERTa-large model, their best submission among all their experiments.
• Chetona [30] applies the same system to the Bodo language as mentioned in the Assamese section.
• AI Alchemists [34] fine-tuned the XLM-RoBERTa model.
• MUCS [39] obtained their best macro F1 score with an SVM trained with TF-IDF of syllable n-grams and TF-IDF of char n-grams, both in the range (1, 3).
• Code Fellas [38] use a BiLSTM model enhanced with an additional dense layer, attaining an impressive F1 score for Bodo.
• Z-AGI Labs/ Nikhil Narayan [36] fine-tuned a pre-trained Bert-Base-Multilingual-Cased model for Bodo.
• TeamBD [32] applies the xlm-roberta-large (multilingual) model.
• CNLP-NITS-PP [40] gets the best result for the Bodo dataset with logistic regression.
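Although every team's exact configuration differs, the recipe shared by most top submissions is the same: fine-tune a multilingual pre-trained transformer with a binary classification head. The sketch below uses Hugging Face Transformers with xlm-roberta-base as an example checkpoint and illustrative hyperparameters; it is not any participant's actual setup.

```python
# Generic fine-tuning skeleton of the kind used by the top AH systems: a
# multilingual encoder (here xlm-roberta-base, as an example checkpoint)
# with a two-class head. Hyperparameters are illustrative, not tuned.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                           num_labels=2)

def tokenize(batch):
    # truncate long comments to the encoder's maximum input length
    return tokenizer(batch["text"], truncation=True, padding="max_length")

# `train_ds` is assumed to be a datasets.Dataset with "text" and "label"
# columns (HOF -> 1, NOT -> 0), e.g. built from the CSV loading sketch above:
# train_ds = train_ds.map(tokenize, batched=True)
args = TrainingArguments(output_dir="ah-xlmr", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
# Trainer(model=model, args=args, train_dataset=train_ds).train()
```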
7. Conclusion
The submissions to the AH task (Task 4, HASOC 2023) have shown transformer-based pre-trained models to be the state-of-the-art approach for hate speech detection on the Assamese and Bengali datasets. However, an L2-regularized logistic regression model gives the best result for the Bodo dataset. Other deep learning models, such as LSTMs and CNNs, also perform well on the given datasets. Reviewing the outcomes, the most suitable approach for hate speech detection depends on factors such as the language of the dataset, the level of classification detail, and the distribution of class labels. Balancing an imbalanced training dataset could impact the classification system's effectiveness. In the long run, the AH task aims to provide more low-resource data with binary and multi-label classification tasks.

8. Acknowledgement
We thank Mr. Debarshi Sonowal, Mr. Abhilash Basumatary, and Ms. Bidisha Gogoi for their help in collecting and tagging the Assamese hate dataset. Additionally, we extend our thanks to Mr. Maharaj Brahma and Mr. Mwnthai Narzary for their valuable contributions in collecting and labelling the Bodo hate dataset. We also thank the FIRE and HASOC organizers for their support in organizing the track. We thank all participants for their submissions and their valuable work.

References
[1] M. L. Williams, P. Burnap, A. Javed, H. Liu, S. Ozalp, Hate in the Machine: Anti-Black and Anti-Muslim Social Media Posts as Predictors of Offline Racially and Religiously Aggravated Crime, The British Journal of Criminology 60 (2019) 93–117. URL: https://doi.org/10.1093/bjc/azz049. doi:10.1093/bjc/azz049.
[2] Z. Laub, Hate speech on social media: Global comparisons, Council on Foreign Relations 7 (2019).
[3] A. Nicholas, C. Ezeibe, The state, hate speech regulation and sustainable democracy in Africa: a study of Nigeria and Kenya, African Identities (2020). doi:10.1080/14725843.2020.1813548.
[4] T. Quintel, C. Ullrich, Self-Regulation of Fundamental Rights? The EU Code of Conduct on Hate Speech, Related Initiatives and Beyond, Edward Elgar Publishing, 2019. Available at SSRN: https://ssrn.com/abstract=3298719.
[5] S. Jaki, T. De Smedt, M. Gwóźdź, R. Panchal, A. Rossa, G. De Pauw, Online hatred of women in the incels.me forum: Linguistic analysis and automatic detection, Journal of Language Aggression and Conflict 7 (2019) 240–268. URL: https://www.jbe-platform.com/content/journals/10.1075/jlac.00026.jak. doi:10.1075/jlac.00026.jak.
[6] G. L. Casey, Ending the incel rebellion: The tragic impacts of an online hate group, Loyola Journal of Public Interest Law 21 (2019) 71. URL: https://heinonline.org/HOL/P?h=hein.journals/loyjpubil21&i=79.
[7] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Language Resources and Evaluation 55 (2021) 1–47. doi:10.1007/s10579-020-09502-8.
[8] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type and target of offensive posts in social media, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 1415–1420. URL: https://aclanthology.org/N19-1144. doi:10.18653/v1/N19-1144.
[9] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, Proceedings of the International AAAI Conference on Web and Social Media 11 (2017) 512–515. URL: https://ojs.aaai.org/index.php/ICWSM/article/view/14955. doi:10.1609/icwsm.v11i1.14955.
[10] I. Kwok, Y. Wang, Locate the hate: Detecting tweets against blacks, Proceedings of the AAAI Conference on Artificial Intelligence 27 (2013) 1621–1622. URL: https://ojs.aaai.org/index.php/AAAI/article/view/8539. doi:10.1609/aaai.v27i1.8539.
[11] Kaggle, Toxic comment classification challenge: Identify and classify toxic online comments (2017). URL: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge.
[12] Z. Pitenis, M. Zampieri, T. Ranasinghe, Offensive language identification in Greek, CoRR abs/2003.07459 (2020). URL: https://arxiv.org/abs/2003.07459. arXiv:2003.07459.
[13] P. Fortuna, J. Rocha da Silva, J. Soler-Company, L. Wanner, S. Nunes, A hierarchically-labeled Portuguese hate speech dataset, in: Proceedings of the Third Workshop on Abusive Language Online, Association for Computational Linguistics, Florence, Italy, 2019, pp. 94–104. URL: https://aclanthology.org/W19-3510. doi:10.18653/v1/W19-3510.
[14] G. I. Sigurbergsson, L. Derczynski, Offensive language and hate speech detection for Danish, CoRR abs/1908.04531 (2019). URL: http://arxiv.org/abs/1908.04531. arXiv:1908.04531.
[15] M. Aragon, M. A. Carmona, M. Montes, H. J. Escalante, L. Villaseñor-Pineda, D. Moctezuma, Overview of MEX-A3T at IberLEF 2019: Authorship and aggressiveness analysis in Mexican Spanish tweets, 2019.
[16] Ç. Çöltekin, A corpus of Turkish offensive language on social media, in: Proceedings of the Twelfth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 6174–6184. URL: https://aclanthology.org/2020.lrec-1.758.
[17] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, A. Patel, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages, in: Proceedings of the 11th Forum for Information Retrieval Evaluation, FIRE '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 14–17. URL: https://doi.org/10.1145/3368567.3368584. doi:10.1145/3368567.3368584.
[18] T. Mandl, S. Modha, A. Kumar M, B. R. Chakravarthi, Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German, in: Forum for Information Retrieval Evaluation, FIRE 2020, Association for Computing Machinery, New York, NY, USA, 2020, pp. 29–32. URL: https://doi.org/10.1145/3441501.3441517. doi:10.1145/3441501.3441517.
[19] S. Modha, T. Mandl, G. K. Shahi, H. Madhu, S. Satapara, T. Ranasinghe, M. Zampieri, Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages and conversational hate speech, in: Forum for Information Retrieval Evaluation, FIRE 2021, Association for Computing Machinery, New York, NY, USA, 2021, pp. 1–3. URL: https://doi.org/10.1145/3503162.3503176. doi:10.1145/3503162.3503176.
[20] N. Romim, M. Ahmed, M. S. Islam, A. Sen Sharma, H. Talukder, M. R. Amin, BD-SHS: A benchmark dataset for learning to detect online Bangla hate speech in different social contexts, in: Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 5153–5162. URL: https://aclanthology.org/2022.lrec-1.552.
[21] V. Gupta, S. Roychowdhury, M. Das, S. Banerjee, P. Saha, B. Mathew, H. P. Vanchinathan, A. Mukherjee, Multilingual abusive comment detection at scale for Indic languages, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems, volume 35, Curran Associates, Inc., 2022, pp. 26176–26191. URL: https://proceedings.neurips.cc/paper_files/paper/2022/file/a7c4163b33286261b24c72fd3d1707c9-Paper-Datasets_and_Benchmarks.pdf.
[22] S. Satapara, H. Madhu, T. Ranasinghe, A. E. Dmonte, M. Zampieri, P. Pandya, N. Shah, M. Sandip, P. Majumder, T. Mandl, Overview of the HASOC subtrack at FIRE 2023: Hate-speech identification in Sinhala and Gujarati, in: K. Ghosh, T. Mandl, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, Goa, India, December 15-18, 2023, CEUR Workshop Proceedings, CEUR-WS.org, 2023.
[23] H. Madhu, S. Satapara, P. Pandya, N. Shah, T. Mandl, S. Modha, Overview of the HASOC subtrack at FIRE 2023: Identification of conversational hate-speech, in: K. Ghosh, T. Mandl, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, Goa, India, December 15-18, 2023, CEUR Workshop Proceedings, CEUR-WS.org, 2023.
[24] S. Masud, M. A. Khan, M. S. Akhtar, T. Chakraborty, Overview of the HASOC subtrack at FIRE 2023: Identification of tokens contributing to explicit hate in English by span detection, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, Goa, India, December 15-18, 2023, CEUR Workshop Proceedings, CEUR-WS.org, 2023.
[25] K. Ghosh, A. Senapati, Hate speech detection: an analysis of mono and multilingual transformer models with cross-language evaluation on Hindi, Marathi, Bangla, and Bodo languages, Natural Language Engineering (2023). Accepted on 26.10.2023.
[26] K. Ghosh, A. Senapati, M. Narzary, M. Brahma, Hate speech detection in low-resource Bodo and Assamese texts with ML-DL and BERT models, Scalable Computing: Practice and Experience 24 (2023) 941–955.
[27] K. Ghosh, D. Sonowal, A. Basumatary, B. Gogoi, A. Senapati, Transformer-based hate speech detection in Assamese, in: 2023 IEEE Guwahati Subsection Conference (GCON), 2023, pp. 1–5. doi:10.1109/GCON58516.2023.10183497.
[28] K. Ghosh, A. Senapati, Hate speech detection: a comparison of mono and multilingual transformer models with cross-language evaluation, in: Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation, Association for Computational Linguistics, Manila, Philippines, 2022, pp. 853–865. URL: https://aclanthology.org/2022.paclic-1.94.
[29] K. Ghosh, A. Senapati, U. Garain, Baseline BERT models for conversational hate speech detection in code-mixed tweets utilizing data augmentation and offensive language identification in Marathi, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE), CEUR-WS.org, 2022.
[30] S. Saha, M. Sullivan, R. Srihari, Hate Speech Detection in Low Resource Indo-Aryan Languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[31] M. S. Jahan, F. Hassan, W. Aransa, A. Bouchekif, Multilingual Hate Speech Detection Using Ensemble of Transformer Models, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[32] K. M. Jhuma, M. Oussalah, A. Singhal, Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[33] Y. Bestgen, Using Only Character Ngrams for Hate Speech and Offensive Content Identification in Five Low-Ressource Languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[34] C. Muhammad Awais, J. Raj, Breaking Barriers: Multilingual Toxicity Analysis for Hate Speech and Offensive Language in Low-Resource Indo-Aryan Languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[35] A. Joshi, R. Joshi, Harnessing Pre-Trained Sentence Transformers for Offensive Language Detection in Indian Languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[36] N. Narayan, M. Biswal, P. Goyal, A. Panigrahi, Hate Speech and Offensive Content Detection in Indo-Aryan Languages: A Battle of LSTM and Transformers, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[37] A. Stekel, A. Prives, Y. HaCohen-Kerner, Detecting Offensive Language in Bengali, Bodo, and Assamese using Word Unigrams, Char N-grams, Classical Machine Learning, and Deep Learning Methods, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[38] A. Reddy Gutha, N. Sai Adarsh, A. Alekar, D. Reddy, Multilingual Hate Speech and Offensive Language Detection of Low Resource Languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[39] P. M, R. K, A. Hegde, K. G, S. Coelho, H. L. Shashirekha, Taming Toxicity: Learning Models for Hate Speech and Offensive Language Detection in Social Media Text, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[40] G. Kalita, E. Halder, C. Taparia, A. Vetagiri, D. P. Pakray, Examining Hate Speech Detection Across Multiple Indo-Aryan Languages in Tasks 1 & 4, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[41] C. Senapati, U. Roy, Bengali Hate Speech Detection Using Deep Learning Technique, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[42] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, C.-J. Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research 9 (2008) 1871–1874.