=Paper= {{Paper |id=Vol-3159/T1-34 |storemode=property |title=Multilingual Hate Speech and Offensive Content Detection using Modified Cross-entropy Loss |pdfUrl=https://ceur-ws.org/Vol-3159/T1-34.pdf |volume=Vol-3159 |authors=Arka Mitra,Priyanshu Sankhala |dblpUrl=https://dblp.org/rec/conf/fire/MitraS21 }} ==Multilingual Hate Speech and Offensive Content Detection using Modified Cross-entropy Loss== https://ceur-ws.org/Vol-3159/T1-34.pdf
Multilingual Hate Speech and Offensive Content
Detection using Modified Cross-entropy Loss
Arka Mitra1 , Priyanshu Sankhala2
1
    Indian Institute of Technology, Kharagpur, India
2
    National Institute of Technology Raipur, India


                                         Abstract
                                         The number of increased social media users has led to a lot of people misusing these platforms to spread
                                         offensive content and use hate speech. Manual tracking the vast amount of posts is impractical so it is
                                         necessary to devise automated methods to identify them quickly. Large language models are trained on
                                         a lot of data and they also make use of contextual embeddings. We fine-tune the large language models
                                         to help in our task. The data is also quite unbalanced; so we used a modified cross-entropy loss to tackle
                                         the issue. We observed that using a model which is fine-tuned in hindi corpora performs better. Our
                                         team (HNLP) achieved the macro F1-scores of 0.808, 0.639 in English Subtask A and English Subtask B
                                         respectively. For Hindi Subtask A, Hindi Subtask B our team achieved macro F1-scores of 0.737, 0.443
                                         respectively in HASOC 2021.

                                         Keywords
                                         Hate speech detection, Text classification, Deep-learning, Transfer learning




1. Introduction
With the increased use of social media platform like Twitter, Facebook, Instagram, and YouTube
by users around the world, the platforms have had positive aspects including but not limited
to social interaction, meeting like-minded people, giving a voice to each individual to share
their opinions [1]. However, as a result, social media platforms can also be used to spread hate
comments, hate posts by certain individuals or groups; which can lead to having anxiety, mental
illness and severe stress to people who consume that hate content [2]. It becomes necessary
to be able to detect such activities at its earliest to stop it from spreading, thereby making
social media a healthy place to interact and share their views without a fear of getting hate
comments[3].
The hate posts can be insults or racist or discriminating on the bases of a particular gender,
religion, nationality, age bracket, ethnicity. Such comments can also lead to goading of violence
amongst people. With the large number of posts being shared each minute, it is not possible to
manually classify each of the posts. Thus, a pre-programmed system is required to distinguish
Hate speech activities quickly as hate content gains a lot of attention and is subject to be shared
fast as well [4]. Direct targeted abuses and profane content are not that difficult to classify.
Forum for Information Retrieval Evaluation, December 13-17, 2021, India
Envelope-Open thearkamitra@gmail.com (A. Mitra); priyanshu.nitrr.ele@gmail.com (P. Sankhala)
GLOBE https://thearkamitra.github.io/ (A. Mitra); https://priyanshusankhala.github.io/ (P. Sankhala)
Orcid 0000-0003-1071-7294 (A. Mitra); 0000-0003-2796-0039 (P. Sankhala)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
However, it is extremely hard to recognize indirect hate content often involving use of humour,
irony, sarcasm even for an human annotator when the context of the posts are not provided.
This makes the classification task additionally more difficult for most progressive frameworks.
HASOC 2021 [5] is a shared task for the identification of hateful and offensive content in English
and Indo-Aryan Languages. We participated in two sub-tasks for English and Hindi language
[6].
The sub task A refers to classifying twitter samples into:
    • H O F Hate and offensive :- contains hate speech/profane/offensive content.
    • N O T Non Hate-offensive :- which does not contain any hate speech, profane, offensive
      content.
The sub task B refers to classifying twitter samples into:
    • H A T E Hate speech :- Posts under this class contain Hate speech content.
    • O F F N Offensive :- Posts under this class contain offensive content.
    • P R F N Profane :- These posts contain profane words.
    • N O N E Non-Hate :- These posts do not contain any hate speech content.
For tasks pertaining to English language, we experimented with large language models like
fine-tuning BERT (Bidirectional Encoder Representation from Transformer) [7], RoBERTa (A
Robustly Optimized BERT Pretraining Approach) [8] and XLNet (Generalized Autoregressive
Pretraining for Language Understanding) [9] out of which RoBERTa outperformed others with
the macro F1-score of 0.8089 while BERT and XLNet had the macro F1-score of 0.8050 and
0.7757 respectively in Subtask A and for Subtask B the macro F1-score was 0.6396 with RoBERTa
model respectively. For the tasks referring to Hindi language, the authors used a model which
is fine-tuned on detecting Hinglish sentiments [10] and had the macro F1-score of 0.7379 for
Subtask A and macro F1-score of 0.4431 for Subtask B.


2. Related Work
In this section, we will discuss the previous state of the art methods proposed for detection of
hate speech. The use of BERT and other transfer learning algorithms, and deep neural models
based on LSTMs and CNNs tend to perform similar but better than traditional classifiers such
as SVM [11]. The number of papers, trying to automate Hate speech detection, that have been
published in Web of Science has been increasing exponentially [12]. Waseem et al. [13] have
classified hate speech into different categories and led to the Offensive Language Identification
Dataset (OLID) [14].
There has been work in different sub fields of abuse like in sexism [15, 16], cyberbullying [17],
trolling [18] and so on. There are hate comments in most of the social media sites like Youtube
[19], Instagram [20] which shows the importance of having a generalized Hate detection model
[13]. Work done by Yin et al. [21] gives an overall idea of the generalizability of the different
models that are present for hate speech detection. For the different models, the features from
the input that are used have a great impact on the performance. Xu et al. [22] showed that
part-of-speech tags are quite successful for improving the model; it is further improved by
considering the sentiment values [23]. The sentences in the online platforms do not always
follow the normal textual formats or correct spellings. Thus, Mehdad et al. [24] used a character
level encoding rather than using the word level encoding proposed by Meyer et al. [25]. The
type of architecture used also impacts on the performance on the model. Swamy et al. [26]
performed a comprehensive study that shows how different models perform and generalize.


3. Methodology
HASOC 2021 [6] has been going on for two years now and a lot of different ways are uncovered to
detect hate content [27, 28]. This paper covers the use of large language models for classification
of hate speech content.

3.1. Languages
The Hate speech and Offensive Content Identification in English and Indo-Aryan Languages
HASOC 2021 [5, 6] purposes two different tasks, in 3 different languages English, Hindi, Marathi.
The authors participated in both tasks for English and Hindi languages.

3.2. Task description
The first task in all languages know as ”Subtask A” refers to a classification problem of twitter
samples which were labelled as H O F - Hate and offensive content and N O T - Not hate and offensive
content. The second task, know as ”Subtask B” refers to a classification of twitter samples which
were labelled as P R F N - Profane Words, H A T E - Hate speech, O F F N - Offensive Content, and N O N E -
Non-hate content. The detailed description of all columns present in a dataset is given in Table
1 and the number of twitter samples corresponding to each label is given in Table 2.

3.3. Approach
The dataset that is provided in all the subtasks has an unequal number of samples per class.
Table. 2 shows the overall distribution. For subtask A for English, the ratio of the classes (HOF
and NOT) is around 2:1 while for Hindi it is around 1:2. Again, for subtask B, the ratio of the
classes (PRFN, HATE, OFFN, NONE) is about 2:1:1:2 for english and approximately 2:5:6:30 for
Hindi. From the ratio, one can understand that it would be unjust for the loss for each class to
be the same. The cross entropy loss assigns same value to a probability score irrespective of
the number of times it is present in the training set. To mitigate this, the authors have used
modified cross-entropy loss as shown in Eqn. 1; it assigns a greater loss whenever a class with
smaller frequency is misclassified. The weights factor in Eqn. 1 has a higher value for a class
if the class has a lower frequency. This penalizes the model whenever that class is wrongly
predicted and helps to improve the performance of the model.


            𝑙𝑜𝑠𝑠(𝑙𝑜𝑔𝑖𝑡𝑠, 𝑐𝑙𝑎𝑠𝑠) = 𝑤𝑒𝑖𝑔ℎ𝑡[𝑐𝑙𝑎𝑠𝑠] ∗ (−𝑙𝑜𝑔𝑖𝑡𝑠[𝑐𝑙𝑎𝑠𝑠] + 𝑙𝑜𝑔(∑ 𝑒𝑥𝑝(𝑙𝑜𝑔𝑖𝑡𝑠[𝑗])))             (1)
                                                                           𝑗
Table 1
The detailed data description is given in table below:-
                Columns                                Description
                tweet_id                     unique value for the tweets
                   text                          full text of the tweets
                  task1           label, either tweet is HOF or NOT for Subtask A
                  task2      label, either tweet is HATE, OFFN or PRFN for Subtask B
                    ID           unique hasoc ID for each tweet for Hindi data set

Table 2
Class division of both subtasks for Train and Test Dataset
                                                      Train set         Test set
                    Subtasks      No. of posts
                                                 English Hindi      English Hindi
                                      HOF          2501      1433     798      1027
                   Subtask A
                                      NOT          1342      3161     483       505
                                     PRFN          1196       213     224        74
                   Subtask B         HATE           683       566     379       215
                                     OFFN           622       654     95        216
                                     NONE          1342      3161     483      1027
                              Total                3843      4594    1281      1532




     Figure 1: Overall Pipeline                           Figure 2: Language Model


The authors used large-language models since the models are trained on a large amount of data
and thus can understand the semantic structure of sentences and the tokens that are sent as
inputs to these models have a contextual embedding associated with them. The output of the
model is taken and then pooled. The resulting output is then passed through a linear layer and
a argmax is used to find the expected class of the sentence as shown in Figure. 1.



4. Results
The authors submitted four groups of results Table 3 gives the final results for our submission.
The results has been evaluated on a test dataset, which is about one-third of the training data
size, using the Macro F1 scores.

Table 3
Results from the official Test set from the leaderboard published from 15% the data set
                     Task           Our Score (Macro average F1)    Best Score    Rank
              English Subtask A                0.8089                 0.8177       4
              English Subtask B                0.6396                 0.6657       6
               Hindi Subtask A                 0.7379                 0.7825       22
               Hindi Subtask B                 0.4431                 0.5603       16

   The experiments showed that large cased BERT performed the best followed by RoBERTa and
the lowest scores were obtained from the BERT base model. The maximum sequence length that
is used has a direct impact on the performance; with a larger length having a better performance,
with the training time increases at the same time.
The methodology followed for both English and Hindi are the same, but the performance
obtained for the English subtask is quite better than that for the Hindi subtask. This shows that
the language models are pretty good in understanding the semantics for English but fail to do
so for a low resource language like Hindi. The modified cross-entropy loss provided a better F1
score as compared to training with equal importance given to all of the separate classes.


5. Experimental Details
For English language we experimented with RoBERTa base pre-trained model [8], fine tuned
BERT large cased architecture[7], and XLNet [9]- all for the same configuration, i.e, max length
is set to 120, batch size to 8 and trained with 4 number of epochs. AdamW optimizer [29] with
an initial learning rate of 2e-5 is used for training. Similarly for hindi language tasks we used a
pre-trained model [10] from the Hugging face [30] library. The Max length has been set to 200,
batch size was 8, and number of epochs was set to 4.
There is a trade-off between the accuracy and the total number of tokens. The amount of time
the model takes for training is proportional to the square of the number of tokens. As the
number of tokens increases, the amount of time increases. However, when we truncate the
maximum length, some of the information present in the sentence gets lost and the prediction
for the sentence might be wrong. We had to consider a trade-poff between the accuracy and
the time it takes for the model to train. For deciding the maximum sentence length, about 99%
percentile of number of tokens in sentences is considered. For generating predictions we made a
split of 90 % for training and 10 % validation to compare the performance of different models, for
each specific task and based on F1 scores of a particular epoch we updated the model weights.
The weights corresponding to the best validation scores have been selected for inferring the
test values. We observed that usually 3, 4 trained epochs had a higher F1 score.
For reproducibility, the codes have been uploaded to github 1 . The random seed has been set to
42.


6. Conclusions
In this paper, we explain the shared tasks presented by HASOC in English and Indo-Aryan
languages. We used large language models which are pre-trained on large corpora for hate
speech detection tasks and to evaluate predictions by different models a validation dataset was
created. In future work, we hope to try out more different fine tuned models.


Acknowledgments
The authors would like to thank the organizers of Hate Speech and Offensive Content Iden-
tification in Indo-Aryan Languages 2021 [5] for conducting this data challenge. The authors
gratefully acknowledge google colab for providing GPU’s to do the computation. All pre-trained
models is based upon work supported by Hugging Face [30].


References
 [1] O. Istaiteh, R. Al-Omoush, S. Tedmori, Racist and Sexist Hate Speech Detection: Literature
     Review, 2020 International Conference on Intelligent Data Science Technologies and
     Applications (IDSTA) (2020) 95–99.
 [2] S. Kawate, K. Patil, Analysis of foul language usage in social media text conversation, Int.
     J. Soc. Media Interact. Learn. Environ. 5 (2017) 227–251.
 [3] S. Jaki, T. D. Smedt, M. Gwóźdź, R. Panchal, A. Rossa, G. D. Pauw, Online hatred of women
     in the Incels.me forum 7 (2019) 240–268. URL: https://doi.org/10.1075%2Fjlac.00026.jak.
     doi:1 0 . 1 0 7 5 / j l a c . 0 0 0 2 6 . j a k .
 [4] B. Mathew, R. Dutt, P. Goyal, A. Mukherjee, Spread of Hate Speech in Online Social Media,
     Proceedings of the 10th ACM Conference on Web Science (2019).
 [5] S. Modha, T. Mandl, G. K. Shahi, H. Madhu, S. Satapara, T. Ranasinghe, M. Zampieri,
     Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content
     Identification in English and Indo-Aryan Languages and Conversational Hate Speech, in:
     FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, 13th-17th December
     2021, ACM, 2021.
 [6] T. Mandl, S. Modha, G. K. Shahi, H. Madhu, S. Satapara, P. Majumder, J. Schäfer, T. Ranas-
     inghe, M. Zampieri, D. Nandini, A. K. Jaiswal, Overview of the HASOC subtrack at FIRE
     2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Lan-


   1
       https://github.com/priyanshusankhala/hasoc-hnlp
     guages, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation,
     CEUR, 2021. URL: http://ceur-ws.org/.
 [7] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
     Transformers for Language Understanding, in: NAACL, 2019.
 [8] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer,
     V. Stoyanov, RoBERTa: A Robustly Optimized BERT Pretraining Approach, ArXiv
     abs/1907.11692 (2019).
 [9] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, Q. V. Le, XLNet: Generalized
     Autoregressive Pretraining for Language Understanding, in: NeurIPS, 2019.
[10] M. Bhange, N. Kasliwal, Hinglishnlp: Fine-tuned language models for hinglish sentiment
     detection (2020).
[11] S. Modha, T. Mandl, P. Majumder, D. Patel, Tracking Hate in Social Media: Evaluation,
     Challenges and Approaches, SN Comput. Sci. 1 (2020) 105.
[12] M. A. Paz, J. Montero-Díaz, A. Moreno-Delgado, Hate speech: A systematized review,
     SAGE Open 10 (2020).
[13] Z. Waseem, T. Davidson, D. Warmsley, I. Weber, Understanding abuse: A typology of
     abusive language detection subtasks, in: Proceedings of the First Workshop on Abusive
     Language Online, Association for Computational Linguistics, Vancouver, BC, Canada,
     2017, pp. 78–84. URL: https://aclanthology.org/W17-3012. doi:1 0 . 1 8 6 5 3 / v 1 / W 1 7 - 3 0 1 2 .
[14] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type
     and target of offensive posts in social media, in: Proceedings of the 2019 Conference of
     the North American Chapter of the Association for Computational Linguistics: Human
     Language Technologies, Volume 1 (Long and Short Papers), Association for Computational
     Linguistics, Minneapolis, Minnesota, 2019, pp. 1415–1420. URL: https://aclanthology.org/
     N19-1144. doi:1 0 . 1 8 6 5 3 / v 1 / N 1 9 - 1 1 4 4 .
[15] Z. Waseem, D. Hovy, Hateful symbols or hateful people? predictive features for hate
     speech detection on Twitter, in: Proceedings of the NAACL Student Research Workshop,
     Association for Computational Linguistics, San Diego, California, 2016, pp. 88–93. URL:
     https://aclanthology.org/N16-2013. doi:1 0 . 1 8 6 5 3 / v 1 / N 1 6 - 2 0 1 3 .
[16] S. Jaki, T. de Smedt, M. Gwózdz, R. Panchal, A. Rossa, G. D. Pauw, Online hatred of women
     in the Incels.me forum, Journal of Language Aggression and Conflict (2019).
[17] M. Dadvar, D. Trieschnigg, R. Ordelman, F. de Jong, Improving cyberbullying detection
     with user context, in: P. Serdyukov, P. Braslavski, S. O. Kuznetsov, J. Kamps, S. Rüger,
     E. Agichtein, I. Segalovich, E. Yilmaz (Eds.), Advances in Information Retrieval, Springer
     Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 693–696.
[18] R. Kumar, A. K. Ojha, M. Zampieri, S. Malmasi (Eds.), Proceedings of the First Workshop
     on Trolling, Aggression and Cyberbullying (TRAC-2018), Association for Computational
     Linguistics, Santa Fe, New Mexico, USA, 2018. URL: https://aclanthology.org/W18-4400.
[19] K. Dinakar, R. Reichart, H. Lieberman, Modeling the Detection of Textual Cyberbullying,
     in: The Social Mobile Web, 2011.
[20] H. Zhong, H. Li, A. C. Squicciarini, S. M. Rajtmajer, C. Griffin, D. J. Miller, C. Caragea,
     Content-Driven Detection of Cyberbullying on the Instagram Social Network, in: IJCAI,
     2016.
[21] W. Yin, A. Zubiaga, Towards generalisable hate speech detection: a review on obstacles
     and solutions, PeerJ. Computer science 7 (2021) e598–e598.
[22] J.-M. Xu, K.-S. Jun, X. Zhu, A. Bellmore, Learning from bullying traces in social
     media, in: Proceedings of the 2012 Conference of the North American Chapter of
     the Association for Computational Linguistics: Human Language Technologies, As-
     sociation for Computational Linguistics, Montréal, Canada, 2012, pp. 656–666. URL:
     https://aclanthology.org/N12-1084.
[23] T. Davidson, D. Warmsley, M. W. Macy, I. Weber, Automated Hate Speech Detection and
     the Problem of Offensive Language, in: ICWSM, 2017.
[24] Y. Mehdad, J. Tetreault, Do characters abuse more than words?, in: Proceedings of the
     17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Association
     for Computational Linguistics, Los Angeles, 2016, pp. 299–303. URL: https://aclanthology.
     org/W16-3638. doi:1 0 . 1 8 6 5 3 / v 1 / W 1 6 - 3 6 3 8 .
[25] J. S. Meyer, B. Gambäck, A platform agnostic dual-strand hate speech detector, in: Proceed-
     ings of the Third Workshop on Abusive Language Online, Association for Computational
     Linguistics, Florence, Italy, 2019, pp. 146–156. URL: https://aclanthology.org/W19-3516.
     doi:1 0 . 1 8 6 5 3 / v 1 / W 1 9 - 3 5 1 6 .
[26] S. D. Swamy, A. Jamatia, B. Gambäck, Studying generalisability across abusive language
     detection datasets, in: Proceedings of the 23rd Conference on Computational Natural Lan-
     guage Learning (CoNLL), Association for Computational Linguistics, Hong Kong, China,
     2019, pp. 940–950. URL: https://aclanthology.org/K19-1088. doi:1 0 . 1 8 6 5 3 / v 1 / K 1 9 - 1 0 8 8 .
[27] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandalia, A. Patel, Overview
     of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification
     in Indo-European Languages, Proceedings of the 11th Forum for Information Retrieval
     Evaluation (2019).
[28] T. Mandl, S. Modha, G. K. Shahi, A. K. Jaiswal, D. Nandini, D. Patel, P. Majumder, J. Schäfer,
     Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Iden-
     tification in Indo-European Languages, Proceedings of the 12th Forum for Information
     Retrieval Evaluation (2020).
[29] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization, in: ICLR, 2019.
[30] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
     M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao,
     S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, HuggingFace’s Transformers: State-of-the-art
     Natural Language Processing, 2020. a r X i v : 1 9 1 0 . 0 3 7 7 1 .