<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mixture Models based on BERT for Hate Speech Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haoyang Chen</string-name>
          <email>hoyo.chen.i@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhongyuan Han</string-name>
          <email>hanzhongyuan@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leilei Kong</string-name>
          <email>kongleilei@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhijie Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zengyao Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mingcan Guo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haoliang Qi</string-name>
          <email>haoliang.qi@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>While social platforms such as Twitter have brought convenience to people, they have also become a hotbed for spreading hate speech. Identifying hate speech and offensive content has become an important task. This paper presents our team's experiments on two shared tasks of HASOC 2022, where we fine-tuned three pre-trained models based on indic-abusive and multilingual BERT to perform hate speech detection on tweets in code-mixed languages. We tried to reduce the impact of data imbalance by combining model predictions. Our team placed 5th (macro f1: 0.6388) in the binary classification Subtask 1 for Hinglish and German, and 3rd (macro f1: 0.4769) in the multiclass Subtask 2 for Hinglish.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the widespread popularity of social media platforms such as Twitter and Facebook worldwide,
users are free to express their thoughts and opinions. However, detecting and handling hateful
sentiment has become a new challenge: hate speech and offensive content cause severe mental
stress to victims, heighten social tensions, and lead to confrontation and violence [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Such
objectionable content seriously affects people's daily lives, and there is an urgent need for a
low-cost way to meet this challenge.
      </p>
      <p>As a result, social media companies such as Twitter and YouTube have developed their own detection
systems to monitor user posts and filter hateful content. However, current detection systems mainly
target English-language environments and remain less practical for languages other than English
and for code-mixed languages such as Hinglish. In addition, hate speech that requires contextual
information to identify remains a chronic problem (e.g., comments that contain no hate per se but endorse
parent tweets that are hate speech). In this context, Hate Speech and Offensive Content Identification
in English and Indo-Aryan Languages (HASOC) proposes a series of hate detection classification tasks
aiming to improve hate detection for low-resource languages. This paper describes our team's
(fosu-nlp) working notes for the HASOC 2022 subtasks. We studied several
pre-trained BERT models and combined them to accomplish the tasks.</p>
      <p>The rest of the paper is organized as follows: Section 2 provides an overview of recent work on hate
detection. Section 3 briefly describes the tasks and dataset composition of HASOC 2022. Section 4 presents
our methodology and models, and Section 5 reports the results. Finally,
Section 6 concludes our work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Many effective methods have been proposed for hate speech detection in recent years. Gambäck et
al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed classifying hate speech by applying a CNN to word2vec-embedded Twitter texts. Ayo et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
performed feature extraction and topic inference on Twitter posts using TF-IDF and a Bayes classifier and
proposed a rule-based clustering model. Furthermore, various approaches based on SVM [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], LSTM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
or the current state-of-the-art pre-trained BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] have been proposed for hate detection in mixed
English and Hindi languages.
      </p>
      <p>
        The HASOC committee has organized a series of tasks in the last few years [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ]. At HASOC
2021, Banerjee et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] explored fine-tuning several Transformer models and designed a weighted
classifier layer at the final phase. Bhatia et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] utilized an Emoji2Vec system to convert emojis into
vectors, adding features from emoji data instead of simply removing them. Regarding context-based
hate speech detection, Farooqi et al. [13] generated results by obtaining predictions from three BERT
models and applying soft/hard voting, ending up with an f1 score of 0.7253. Pre-trained
models thus show significant potential, so we continue to investigate the application
of pre-trained models to hate speech detection.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. HASOC Task and Datasets</title>
      <p>At this year's Forum for Information Retrieval Evaluation (FIRE, 2022), HASOC brought a new set
of shared tasks [14], including identifying hate-speech posts in a code-mixed language on Twitter. Our
team focused on Subtasks 1 and 2: Identification of Conversational Hate-Speech in Code-Mixed
Languages (ICHCL) [15], Binary and Multiclass. Subtask 1 is a coarse-grained binary classification task
that aims to identify hate speech in German and Hinglish (Hindi and English) conversations. Tweets
should be classified with one of the following tags:</p>
      <p>(NOT) Non-Hate-Offensive: The tweet does not contain hate speech.</p>
      <p>(HOF) Hate and Offensive: The tweet reflects hateful, offensive, or profane content.</p>
      <p>As an extension of Subtask 1, Subtask 2 is a three-class classification task that further classifies the tweets
as follows:</p>
      <p>(SHOF) Standalone Hate: The tweet, comment, or reply itself contains hateful, offensive, or profane
content.</p>
      <p>(CHOF) Contextual Hate: The comment or reply is treated as hate speech because it supports the
hateful content expressed in its parent; this includes affirming the hate speech with a positive
sentiment.</p>
      <p>(NONE) Non-Hate: The tweet, comment, or response is not hate speech.</p>
      <p>The dataset given by HASOC was sourced from Twitter and provides tweets and their replies
in German and in code-mixed English and Hindi. Statistics on the label counts in the
original dataset are provided in Table 1.</p>
      <p>The dataset is stored as a tree structure. For the model to obtain contextual information, each
conversation must be flattened and stitched into a "parent-comment-reply" chain. Considering that, in
practice, the model does not know the content of the parent tweets at prediction time,
we chose to divide the original tweet nodes in a 9:1 ratio to form new training and validation sets.
These datasets are flattened in the next step.</p>
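      <p>A minimal sketch of this flattening step follows; the nested-dict conversation format and the field names are our illustration, not the official HASOC schema. Each node yields one training instance whose text stitches the node to its ancestors (reply-to-parent order, as used in our pre-processing):</p>
      <preformat>
```python
# Sketch of flattening a conversation tree into text chains, one per node.
# The nested-dict format and field names ("text", "label", "replies") are
# illustrative, not the official HASOC schema.

def flatten_conversation(node, ancestors=None):
    """Yield (chain_text, label) pairs for every node of a conversation tree.

    Each chain lists the node's own text first, then its ancestors' texts,
    joined with the BERT separator token.
    """
    ancestors = ancestors or []
    chain = " [SEP] ".join([node["text"]] + ancestors)
    yield chain, node["label"]
    for child in node.get("replies", []):
        yield from flatten_conversation(child, [node["text"]] + ancestors)
```
      </preformat>
      <p>Every tweet, comment, and reply thus becomes one instance whose input text carries its full conversational context.</p>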
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>In this paper, two BERT models were used for fine-tuning experiments. The first is
indic-abusive-allInOne-MuRIL [16], a recently released binary hate-detection model trained on Indian
multilingual data by team Hate alert, which is used as the primary model for the Hinglish binary
task. The other is multilingual BERT [17], which handles the German part of the binary
classification as well as the multiclass task.</p>
    </sec>
    <sec id="sec-5">
      <title>4.1. Data Pre-processing</title>
      <p>First, we expand the tweets and link them to the corresponding labels. Considering the maximum input
length limit of BERT, each conversation is flattened in reverse order as
"reply-comment-parent," and all data are pre-processed as follows:</p>
      <p>All @USER mentions and URLs are removed.</p>
      <p>Extra spaces and line feeds are removed.</p>
      <p>All tweets are normalized by stemming.</p>
      <p>A stop-word list is applied to all tweets.</p>
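      <p>A minimal sketch of the cleaning steps above; the exact regex patterns are our assumption, not the official pipeline, and the stemming and stop-word steps (e.g., via NLTK) would follow here but are omitted for brevity:</p>
      <preformat>
```python
import re

def preprocess(tweet):
    """Clean one tweet: drop mentions and URLs, collapse whitespace.

    The patterns below are an illustrative assumption; stemming and
    stop-word removal (e.g., with NLTK) would be applied afterwards.
    """
    tweet = re.sub(r"@\w+", " ", tweet)         # drop @USER mentions
    tweet = re.sub(r"http\S+", " ", tweet)      # drop URLs
    tweet = re.sub(r"\s+", " ", tweet).strip()  # collapse spaces and line feeds
    return tweet
```
      </preformat>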
    </sec>
    <sec id="sec-6">
      <title>4.2. Subtask 1: ICHCL Binary</title>
      <p>In the binary classification task of Subtask 1, two models were trained, using tweet data from
Hinglish and German, respectively. The Hinglish binary classification model (HNG-BCM), based on
the fine-tuned indic-abusive pre-trained model, distinguishes hate speech in mixed Hindi and English
text. The German part uses fine-tuned multilingual BERT as the German binary
classification model (GER-BCM). The outputs of the two models are combined as the final
result of Subtask 1.</p>
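      <p>The language-based routing and merging of predictions can be sketched as follows; the two predict functions are illustrative stand-ins for the fine-tuned HNG-BCM and GER-BCM, not the actual checkpoints:</p>
      <preformat>
```python
# Route each test instance to the classifier matching its language and
# merge the predictions. The predict functions are toy stand-ins for the
# fine-tuned HNG-BCM and GER-BCM models.

def hng_bcm_predict(text):  # stand-in for fine-tuned indic-abusive MuRIL
    return "HOF" if "hate" in text.lower() else "NOT"

def ger_bcm_predict(text):  # stand-in for fine-tuned multilingual BERT
    return "HOF" if "hass" in text.lower() else "NOT"

def subtask1_predict(instances):
    """instances: list of (language, text); returns the merged label list."""
    preds = []
    for lang, text in instances:
        model = hng_bcm_predict if lang == "hinglish" else ger_bcm_predict
        preds.append(model(text))
    return preds
```
      </preformat>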
    </sec>
    <sec id="sec-7">
      <title>4.3. Subtask 2: ICHCL Multiclass</title>
      <p>For Subtask 2, we treated the three-class classification as two associated binary classification tasks to
reduce the effect of data imbalance. The two types of hate speech in the multi-label data are first
used to train the standalone-contextual hate binary classification model (SCH-BCM), so that it can
distinguish between the two different types of hate speech. The HNG-BCM from Subtask 1 then
performs a first classification pass on the test dataset. The detected hate speech is then
sent to the SCH-BCM for a second classification to determine whether it is contextual or standalone hate. In
summary, HNG-BCM determines whether the input is hate speech, and SCH-BCM focuses
on further differentiating the hate speech.</p>
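      <p>The two-stage cascade can be sketched as follows; the classifier arguments are stand-ins for the fine-tuned models, passed in as plain functions for illustration:</p>
      <preformat>
```python
# Two-stage cascade for Subtask 2: HNG-BCM first separates hate from
# non-hate; SCH-BCM then splits the detected hate into standalone vs.
# contextual. Both classifiers are passed in as functions (stand-ins
# for the fine-tuned models).

def cascade_predict(text, hng_bcm, sch_bcm):
    """Return NONE, SHOF, or CHOF for one test instance."""
    if hng_bcm(text) == "NOT":       # stage 1: hate vs. non-hate
        return "NONE"
    # stage 2: standalone vs. contextual hate
    return "SHOF" if sch_bcm(text) == "SHOF" else "CHOF"
```
      </preformat>
      <p>Splitting the decision this way means each classifier only ever sees a binary problem, which is what reduces the exposure to the three-way label imbalance.</p>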
      <p>Figure 1 gives the corresponding flowchart for each of the two subtasks.</p>
    </sec>
    <sec id="sec-8">
      <title>4.4. Experimental setting</title>
      <p>For the two subtasks of HASOC 2022, our experiments used Hugging Face's transformers [18] library
to fine-tune all pre-trained models. The models were mostly configured with the same
hyperparameters: the batch size was set to 32, and the maximum sequence length was 512. The AdamW
optimizer [19] with a linear learning rate scheduler and an initial learning rate of 2e-5 was used for training.
HNG-BCM and GER-BCM were trained for 20 epochs, and SCH-BCM for 40. After each epoch, the models
were evaluated using macro f1. At the end of training, the model with the
highest score was retained and used as the final model.</p>
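      <p>For reference, a linear schedule with no warmup decays the learning rate from the initial 2e-5 to zero over training. A from-scratch sketch of the per-step rate (mirroring, in spirit, the linear scheduler paired with AdamW; the function name and signature are ours):</p>
      <preformat>
```python
# Per-step learning rate under a warmup-free linear decay schedule,
# starting from the initial rate of 2e-5 used in our experiments.
# (Illustrative; in practice the transformers library provides this.)

def linear_lr(step, total_steps, base_lr=2e-5):
    """Learning rate after `step` optimizer steps of `total_steps`."""
    remaining = max(0.0, float(total_steps - step) / float(total_steps))
    return base_lr * remaining
```
      </preformat>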
      <p>[Figure 1: Flowcharts for the two subtasks. Subtask 1 (Binary): Hinglish input is fed to HNG-BCM and German input to GER-BCM, each producing a softmax output. Subtask 2 (Multiclass): input first passes through HNG-BCM; non-hate is labeled NOT, and the remainder receives a second softmax classification.]</p>
    </sec>
    <sec id="sec-9">
      <title>5. Results</title>
      <p>We evaluated the models on the validation set, with macro f1 as the evaluation metric.
Table 2 shows the scores achieved by each model on the validation set.</p>
      <p>In HASOC 2022, we made three submissions in total. The organizers used macro f1 to evaluate the
predictions for each subtask. The final scores obtained by our models on the official test set can
be found in Table 3.</p>
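      <p>Macro f1 averages the per-class f1 scores with equal weight, which is what makes it informative under class imbalance. A from-scratch sketch (averaging over the classes present in the gold labels; equivalent in spirit to scikit-learn's macro-averaged f1):</p>
      <preformat>
```python
# Macro f1: the unweighted mean of per-class f1 scores, so minority
# classes count as much as majority ones. Averages over the classes
# present in the gold labels.

def macro_f1(y_true, y_pred):
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn          # f1 = 2*tp / (2*tp + fp + fn)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```
      </preformat>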
    </sec>
    <sec id="sec-10">
      <title>6. Conclusion</title>
      <p>This paper briefly describes our team's work on the HASOC 2022 shared tasks. Multiple
pre-trained models were used and combined to address context-based hate speech detection
in multilingual code-mixed tweets, and our team achieved competitive results
on the two subtasks. We note that the models perform poorly on the multiclass task, most likely
due to data imbalance. As future work, we will consider mitigating the data imbalance
problem by adding training samples generated through multiple translations.</p>
    </sec>
    <sec id="sec-11">
      <title>7. Acknowledgements</title>
      <p>This work is supported by the Natural Science Foundation of Guangdong Province, China (No.
2022A1515011544).</p>
    </sec>
    <sec id="sec-12">
      <title>8. References</title>
      <p>[13] Z. M. Farooqi, S. Ghosh, R. R. Shah, Leveraging transformers for hate speech detection in
conversational code-mixed tweets, in: FIRE (Working Notes), CEUR Workshop Proceedings,
CEUR-WS.org, 2021, pp. 63–74.</p>
      <p>[14] S. Satapara, P. Majumder, T. Mandl, S. Modha, H. Madhu, T. Ranasinghe, M. Zampieri, K. North,
D. Premasiri, Overview of the HASOC Subtrack at FIRE 2022: Hate Speech and Offensive
Content Identification in English and Indo-Aryan Languages, in: FIRE 2022: Forum for
Information Retrieval Evaluation, Virtual Event, 9th-13th December 2022, ACM, 2022.</p>
      <p>[15] S. Modha, T. Mandl, P. Majumder, S. Satapara, T. Patel, H. Madhu, Overview of the HASOC
Subtrack at FIRE 2022: Identification of Conversational Hate-Speech in Hindi-English
Code-Mixed and German Language, in: Working Notes of FIRE 2022 - Forum for Information
Retrieval Evaluation, CEUR, 2022.</p>
      <p>[16] M. Das, S. Banerjee, A. Mukherjee, Data bootstrapping approaches to improve low resource
abusive language detection for indic languages, in: HT, ACM, 2022, pp. 32–42.</p>
      <p>[17] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, in: NAACL-HLT, Association for Computational
Linguistics, 2019, pp. 4171–4186.</p>
      <p>[18] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, et al., Huggingface's transformers: State-of-the-art natural language processing, CoRR
abs/1910.03771, 2019. URL: http://arxiv.org/abs/1910.03771.</p>
      <p>[19] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization, in: ICLR, 2019. URL:
https://openreview.net/forum?id=Bkg6RiCqY7.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jaki</surname>
          </string-name>
          , T. De Smedt,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gwóźdź</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Panchal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rossa</surname>
          </string-name>
          , G. De Pauw,
          <article-title>Online hatred of women in the incels.me forum: Linguistic analysis and automatic detection</article-title>
          ,
          <source>Journal of Language Aggression and Conflict</source>
          ,
          <year>2019</year>
          , vol.
          <volume>7</volume>
          , pp.
          <fpage>240</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Waseem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <article-title>Hateful symbols or hateful people? predictive features for hate speech detection on twitter</article-title>
          ,
          <source>in: Proceedings of the NAACL student research workshop</source>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>93</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Gambäck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Sikdar</surname>
          </string-name>
          ,
          <article-title>Using convolutional neural networks to classify hate-speech</article-title>
          ,
          <source>in: Proceedings of the first workshop on abusive language online</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F. E.</given-names>
            <surname>Ayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Folorunso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. T.</given-names>
            <surname>Ibharalu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Osinuga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abayomi-Alli</surname>
          </string-name>
          ,
          <article-title>A probabilistic clustering model for hate speech classification in twitter</article-title>
          ,
          <source>Expert Systems with Applications</source>
          ,
          <year>2021</year>
          , vol.
          <volume>173</volume>
          , pp.
          <fpage>114762</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sreelakshmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Premjith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Soman</surname>
          </string-name>
          ,
          <article-title>Detection of hate speech text in hindi-english code-mixed data</article-title>
          ,
          <source>Procedia Computer Science</source>
          ,
          <year>2020</year>
          , vol.
          <volume>171</volume>
          , pp.
          <fpage>737</fpage>
          -
          <lpage>744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Santosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aravind</surname>
          </string-name>
          ,
          <article-title>Hate speech detection in hindi-english code-mixed social media text</article-title>
          ,
          <source>in: Proceedings of the ACM India joint international conference on data science and management of data</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>310</fpage>
          -
          <lpage>313</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kabra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <article-title>Ceasing hate with moh: Hate speech detection in hindi-english code-switched language</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          ,
          <year>2022</year>
          , vol.
          <volume>59</volume>
          , pp.
          <fpage>102760</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mandlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Overview of the hasoc track at fire 2019: Hate speech and offensive content identification in indo-european languages</article-title>
          ,
          <source>in: Proceedings of the 11th Forum for Information Retrieval Evaluation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Kumar</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Overview of the hasoc track at fire 2020: Hate speech and offensive language identification in tamil, malayalam, hindi, english and german</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Overview of the hasoc subtrack at fire 2021: Hate speech and offensive content identification in english and indo-aryan languages and conversational hate speech</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Exploring transformer based models to identify hate speech and offensive content in english and indo-aryan languages</article-title>
          ,
          <source>in: FIRE (Working Notes)</source>
          ,
          <source>CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Bhotia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shridhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Laumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dash</surname>
          </string-name>
          ,
          <article-title>One to rule them all: Towards joint indic language hate speech detection</article-title>
          ,
          <source>in: FIRE (Working Notes)</source>
          ,
          <source>CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>419</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>