Classification of Hate, Offensive and Profane Content from Tweets using an Ensemble of Deep Contextualized and Domain Specific Representations

Basavraj Chinagundi¹, Muskaan Singh², Tirthankar Ghosal², Prashant Singh Rana¹ and Guneet Singh Kohli¹
¹ Thapar Institute of Engineering and Technology, India
² Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic

Abstract
The explosive growth of social media has also resulted in the unfortunate emergence of hate, offensive, and profane content on the web. A conversational thread can contain hate, offensive, or profane content that is not apparent from a standalone tweet or reply but can be identified given the context of the parent content. Such social media content is spread across many languages, including code-mixed languages like Hinglish (English code-mixed with Hindi). It therefore becomes a huge responsibility for social media sites to identify such hate content before it is disseminated to the general population, where it may trigger havoc. The Hate Speech and Offensive Content Identification (HASOC) track [1] at FIRE 2021 provides a forum and a data challenge for multilingual research on the identification of such problematic content. In this paper, we describe our submission to the English Subtask A of this track. Our proposed approach uses transformer-based embeddings with HateBERT and achieves a macro F1 score of 0.79 on the test data, 3.96% behind the best-performing system. We make our system run available at https://github.com/basavraj-chinagundi/HASOC_2021

Keywords
Hate speech, Text classification, Profane content, HateBERT

1. Introduction

Social media sites like Twitter and Facebook, being user-friendly and free to use, give people the opportunity to air their views. People of all age groups use these sites to share moments of their lives, flooding the platforms with data. Apart from these commendable features, social media also has downsides. Because these sites place few restrictions on how users express their views, anyone can make adverse and unrealistic comments in abusive language against anybody, with the ulterior motive of tarnishing that person's image and status in society. A conversational thread can also contain hate content that is not apparent from a single comment or the reply to a comment, but can be identified given the context of the parent content. Furthermore, content on social media is spread across many languages, including code-mixed languages such as Hinglish. It therefore becomes a huge responsibility for these sites to identify such hate content before it disseminates to the masses. The best performing model in our study is based on Transformer contextual embeddings and the HateBERT architecture. Compared to traditional and ensembled machine learning models, the presented solution improves accuracy by 6-9% on average. The HateBERT-based model achieves a competitive F1 score of 0.7909, leaving room for further improvement in performance.
2. Related Work

Most previous work in this field uses surface features, word-embedding features, lexical resources, meta-information, linguistic analysis, cross-domain knowledge, bias handling, and multi-task learning. Classifying a sentence as hate is challenging: a sentence containing slang may be misclassified as hate because words carry different meanings in different contexts, and the kinds of words used on social media bring the right to freedom of speech into play. [2] used an SVM with syntactic and semantic information from word n-grams. [3] report the performance of a logistic regression model fed TF-IDF-weighted unigram, bigram, and trigram features, achieving 90% precision but correctly predicting only 61% of the hate class. [4] classified ontological classes of harmful speech based on parameters such as degree of content, intent, and effect on social media. [5] publicly released a dataset of 16K tweets annotated using critical race theory. Their analysis found that geographic and word-length distributions do not have a significant impact, while gender information combined with character n-grams shows some significant improvement. [6] use four feature types (linguistic, syntactic, distributional, and n-gram) to differentiate between abusive and clean content in news and financial data. [7] identified racist and radicalized intent on the Tumblr microblogging website using semantic, sentiment, and linguistic features with a cascaded ensemble learning classifier. [8] provided an annotated corpus of 80K tweets categorized into 8 labels for studying different types of abusive behaviour. [9] use sentiment, semantic, unigram, and pattern-based features to classify 2,010 sentences. [10] released a dataset of 2,435 tweets on refugees and Muslims together with a novel CNN-GRU architecture; their approach shows promising results on 6 out of 7 datasets, outperforming the state of the art by up to 13 F1 points. [11] applied bag-of-words to learn a classifier for the labels racist and non-racist with 76% accuracy. [12] combine an LSTM model and a neural GBDT over word embeddings on the dataset of [5]. [13] combined a char-CNN and a word-CNN into a hybrid CNN that performed better than classical methods such as logistic regression and SVM on the 16K-tweet dataset of [5]; it first detected abusive language and then classified it into specific types of abuse. [14] also use CNNs, with random vectors, word vectors based on semantic information, and word vectors combined with character 4-grams, and present a comparative performance analysis.

3. Task Description

A conversational thread can contain hate, offensive, and profane content that is not apparent from a standalone tweet, comment, or reply, but can be identified when the context of the parent content is known. The Twitter screenshot in Figure 1 illustrates the problem effectively.

Figure 1: Example of contextual misinterpretation [15]

The parent/source tweet, posted at 2:30 am on May 11th, expresses hate and profanity towards Muslim countries regarding the controversy during the recent Israel-Palestine conflict. The two comments on the tweet say "Amine", which means trustworthy or honest in Arabic. If these comments were analyzed for hate or offensive speech without the context of the parent tweet, they would not be classified as hate or offensive content.
But if we take the context of the conversation into account, we can say that the comments support the hate expressed in the parent tweet, so they are labelled as hate/offensive/profane. The English Subtask A [16] focuses on the binary classification of such conversational tweets, with tree-structured data, into:

• (NOT) Non Hate-Offensive: this tweet, comment, or reply does not contain any hate speech or profane, offensive content.
• (HOF) Hate and Offensive: this tweet, comment, or reply contains hate, offensive, or profane content itself, or supports hate expressed in the parent tweet.

Another such example involves code-mixed text. The source tweet: "Modi Ji COVID situation ko solve karne ke liye ideas maang rahe the. Mera idea hai resignation dedo please."

• Translation: Modi ji (PM of India) was asking for ideas to solve the COVID situation of India. My idea to him is to resign.
• The comment: Doctors aur Scientists se manga hai. Chutiyo se nahi. Baith niche. [HOF]
• Translation: They have asked Doctors and Scientists. Not fuckers. Sit down. [HOF]
• The reply: You totally nailed it, can't stop laughing. [HOF]

The reply has a positive sentiment, but it is positive in favour of the hate expressed in the comment towards the author of the source tweet. Hence it supports the hate expressed in the comment, and is therefore also hate speech. This is the type of problem we aim to solve via this shared task.

4. Dataset Description

We experiment with a collection of diverse datasets comprising foul and offensive tweets and comments acquired from various sources. The dataset [17] used for training consists of a total of 76,601 texts, which are either hateful and offensive (40,823) or normal (35,778). We collected these samples from the following six sources:

1. HSOL [18]: HSOL is a dataset for hate speech identification built from a hate speech lexicon containing words and phrases recognised as hate speech by internet users and compiled by hatebase.org. Using the Twitter API, the authors searched for tweets containing phrases from the lexicon, yielding a sample from 33,458 Twitter users. They retrieved the timeline of each user, producing a collection of 85.4 million tweets, from which they selected a random sample of 25K tweets containing lexicon terms and had them manually coded by CrowdFlower (CF) workers. Workers were asked to categorize each tweet into one of three categories: hate speech, offensive but not hate speech, or neither offensive nor hate speech.

2. OLID [19]: OLID is a hierarchical dataset for identifying the type and target of offensive texts in social media. The dataset was compiled via Twitter and is freely accessible to the public. There are 14,100 tweets in all, with 13,240 in the training set and 860 in the test set. Each tweet carries three levels of labelling: (A) Offensive/Not-Offensive, (B) Targeted-Insult/Untargeted, and (C) Individual/Group/Other. If a tweet is offensive, it might have a target or no target; if it is offensive towards a specific target, the target might be an individual, a group, or another entity. This dataset was used in the OffensEval-2019 competition at SemEval-2019.

3. hatespeech [20]: a dataset of hate speech annotated at the sentence level from English Internet forum postings. The source forum is Stormfront, a prominent online community of white nationalists. A total of 10,568 sentences were taken from Stormfront and labelled as hate speech or not.
4. TRAC [21]: The dataset consists of 15,000 aggression-annotated Facebook posts and comments with labels for a three-way categorization of text into 'Overtly Aggressive', 'Covertly Aggressive', and 'Non-aggressive'.

5. ETHOS [22]: ETHOS is a dataset for detecting hate speech. It is made up of YouTube and Reddit comments validated through a crowdsourcing platform. It is divided into two subsets, one for binary classification and one for multi-label classification; we use the binary subset. The former has 998 comments, whereas the latter has 433 comments with fine-grained hate speech annotations.

6. HASOC [23]: The dataset focuses on hate speech and offensive language detection in English. It is split into two classes, Hate and Offensive (HOF) with 5,051 tweets and Non Hate and Offensive (NOT) with 5,798 tweets.

5. Methodology

In this section we describe the pre-processing and the different baselines, along with the submitted experiments for classifying hate content, as shown in Figure 2.

5.1. Pre-processing

We first lowercase each tweet/comment in the dataset. Since hashtags are critical for retrieving the sentiment of a text, we preprocess them with a tailored technique. We start by creating a data frame of all hashtags and their counts. We then remove numbers and segment multi-word hashtags with a hash-fix function, which splits each hashtag into segments using the wordsegment library. Finally, we create a dictionary mapping hashtags to their cleaned strings. For example, a hashtag consisting of multiple words such as #fuckdick is split into the tokens fuck and dick, improving our ability to retrieve significant words that are critical for classifying the text as a negative sentiment. We further remove other irrelevant parts of the texts, such as usernames, some special characters, and retweet tags.

5.2. Model Description

The next step is extracting features from the text. We use a TF-IDF vectorizer to transform text into a meaningful numerical representation, which is then used to fit machine learning algorithms such as NB, LR, KNN, SVM, DT, RF, Bagging, AdaBoost, and Voting (Table 1) for classifying each text as hate speech or not. We also experiment with another word embedding technique, GloVe (840B tokens, 2.2M vocabulary, cased, 300-dimensional vectors), an unsupervised learning algorithm for obtaining vector representations of words. We test multiple machine learning algorithms and also use ensemble learning to produce one optimal predictive model. To produce even better results, we try out transformer-based pre-trained models:

1. ERNIE 2.0 [24]
2. Twitter RoBERTa Base Offensive [25]
3. HateBERT [20]

These deep learning models come with their own embeddings, which we use to extract features from the text. In the final step we fine-tune the three models on our combined dataset to boost the results for classifying text as hateful and offensive.

Figure 2: Our proposed architecture

6. Experiment and Results

After extracting features with TF-IDF, we first use logistic regression with L2 regularization, as it disperses the error across all the weights and leads to more accurate final models.
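As an illustration, the following is a minimal sketch of this preprocessing and TF-IDF baseline, assuming the wordsegment and scikit-learn packages; the hash_fix and clean_tweet helpers and the toy training data are illustrative placeholders, not our exact implementation.

```python
import re

from wordsegment import load, segment  # pip install wordsegment
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

load()  # load wordsegment's unigram/bigram statistics

def hash_fix(match):
    """Split a hashtag body such as 'fuckdick' into 'fuck dick'."""
    token = re.sub(r"\d+", "", match.group(1))  # drop numbers first
    return " ".join(segment(token)) if token else ""

def clean_tweet(text):
    """Lowercase and strip retweet tags, usernames, and special characters."""
    text = text.lower()
    text = re.sub(r"^rt\s+", "", text)        # retweet tag
    text = re.sub(r"@\w+", "", text)          # usernames
    text = re.sub(r"#(\w+)", hash_fix, text)  # segment hashtags into words
    text = re.sub(r"[^a-z\s]", " ", text)     # remaining special characters
    return re.sub(r"\s+", " ", text).strip()

# TF-IDF features feeding an L2-regularized logistic regression.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3), max_features=50000)),
    ("lr", LogisticRegression(penalty="l2", max_iter=1000)),
])

# Toy data standing in for the combined 76,601-text training set.
train_texts = ["RT @user #fuckdick this is awful", "have a lovely day everyone"]
train_labels = ["HOF", "NOT"]
clf.fit([clean_tweet(t) for t in train_texts], train_labels)
print(clf.predict([clean_tweet("@user what a #lovelyday")]))
```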
We then test a variety of models used in prior work: logistic regression, naive Bayes, decision trees, random forests, k-nearest neighbors (KNN), and linear SVMs. We then try a bagging method using a decision tree classifier with max_samples=0.5, max_features=1.0, and n_estimators=10, and an AdaBoost classifier using decision trees with min_samples_split=10, max_depth=4, n_estimators=10, and learning_rate=0.6. We tested each model using 5-fold cross-validation, holding out 10 percent of the sample for evaluation to help prevent over-fitting. After a grid search over the models and parameters, we find that logistic regression, naive Bayes, random forest, and linear SVM tend to perform significantly better than the other models, so we ensemble these in a voting classifier. Comparing all these models, logistic regression with the TF-IDF representation performs best, with 0.77 accuracy, a macro-average F1 of 0.75, and a weighted-average F1 of 0.76. We then experiment with the GloVe representation, ensembling naive Bayes, logistic regression, and a multilayer perceptron, and find that it does not boost performance, achieving 0.72 accuracy, a macro-average F1 of 0.71, and a weighted-average F1 of 0.72. Thus we move on to our final set of experiments. We use transformer-based pretrained models, since a transformer can process the words of a sentence in parallel and produce contextualized embeddings; such parallel processing is not possible in LSTMs, RNNs, or GRUs, which consume the words of the input sentence one by one. We fine-tuned all three pretrained models for 5 epochs with a batch size of 16 and the Adam optimizer with learning rate 1e-5 and eps=1e-8. ERNIE 2.0 improved on the previous experiments with word embeddings and machine learning algorithms, achieving 0.80 accuracy, a macro-average F1 of 0.78, and a weighted-average F1 of 0.80. Twitter RoBERTa Base Offensive, a pretrained model trained on 58M tweets and fine-tuned for offensive language identification on the TweetEval benchmark, further increased performance, attaining 0.81 accuracy, a macro-average F1 of 0.79, and a weighted-average F1 of 0.81. Finally, we test another pretrained model, HateBERT, which was trained on RAL-E, a large-scale public dataset of English Reddit comments from communities banned for being offensive, abusive, or hateful. It performed best among the pretrained models, achieving the same accuracy and F1 scores as Twitter RoBERTa Base Offensive but with slightly better recall for the HOF and NOT classes respectively, leading to a more balanced model overall. All experiments were run on Google Colab with a Tesla P100-PCIE-16GB GPU, an 8-core CPU, and 32 GB RAM.
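For reference, below is a minimal sketch of this fine-tuning setup, assuming the GroNLP/hateBERT checkpoint from the Hugging Face hub; PyTorch's AdamW stands in for the Adam variant we used, and the toy train_texts/train_labels are placeholders for our combined dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("GroNLP/hateBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "GroNLP/hateBERT", num_labels=2).to(device)  # binary: NOT vs. HOF

# Toy placeholders for the combined training data.
train_texts = ["have a lovely day everyone", "some hateful example text"]
train_labels = [0, 1]  # 0 = NOT, 1 = HOF

enc = tokenizer(train_texts, truncation=True, padding=True,
                max_length=128, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                        torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)  # batch size 16

# Hyperparameters from Section 6: 5 epochs, lr=1e-5, eps=1e-8.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, eps=1e-8)

model.train()
for epoch in range(5):
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids.to(device),
                    attention_mask=attention_mask.to(device),
                    labels=labels.to(device))
        out.loss.backward()  # cross-entropy loss from the classification head
        optimizer.step()
```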
7. Conclusion

In this paper, we presented our submission for classifying hate content in tweets and comments. The recent rise of hateful speech poses a real challenge in discriminating hate speech from freedom of speech: one post can mean different things in different contexts, there is no universally accepted definition of hate speech, and benchmarks differ with demography, social influence, and cultural factors. We proposed a deep learning model based on Transformer contextual embeddings and the HateBERT architecture. We pre-processed the tweets from the HASOC 2021 dataset, extracted embedding features, and trained our system to classify texts as hate speech or not, achieving a 0.79 macro F1 score. This work showcases the scope for employing HateBERT in further experimentation and optimising it for better performance through newer embedding combinations and ensemble approaches.

Table 1
TF-IDF embeddings based model performance comparison

MODEL      precision  recall  macro F1  weighted F1  accuracy
NB         0.72       0.72    0.72      0.74         0.74
LR         0.75       0.74    0.75      0.76         0.77
KNN        0.59       0.56    0.46      0.43         0.48
SVM        0.73       0.72    0.72      0.74         0.74
DT         0.67       0.66    0.66      0.68         0.69
RF         0.76       0.60    0.58      0.63         0.69
Bagging    0.71       0.70    0.70      0.72         0.72
AdaBoost   0.64       0.64    0.63      0.64         0.63
Voting     0.74       0.73    0.74      0.75         0.75

Table 2
GloVe embeddings based ensemble model performance

MODEL         precision  recall  macro F1  weighted F1  accuracy
GloVe+Voting  0.71       0.72    0.71      0.72         0.72

Table 3
Transformer-based pretrained models comparison

Submission  MODEL              precision  recall  macro F1  weighted F1  accuracy
Baseline    ERNIE 2.0          0.80       0.77    0.78      0.80         0.80
Baseline    TwitterRobertaOff  0.81       0.78    0.79      0.81         0.81
Submitted   HateBERT           0.81       0.79    0.79      0.81         0.81

Table 4
Comparison of our submission (team giniUS) with the other submissions in the HASOC@FIRE2021 [26] English Subtask A shared task [1]

Maximum F1 across all submissions  0.83
Minimum F1 across all submissions  0.50
Average F1 across all submissions  0.75
Our submission                     0.79

References

[1] S. Modha, T. Mandl, G. Shahi, H. Madhu, S. Satapara, T. Ranasinghe, M. Zampieri, Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages and conversational hate speech, 2021.
[2] H. Chen, S. McKeever, S. J. Delany, Abusive text detection using neural networks, in: Artificial Intelligence and Cognitive Science (AICS), 2017, pp. 258–260.
[3] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 11, 2017.
[4] S. Sharma, S. Agrawal, M. Shrivastava, Degree based classification of harmful speech using Twitter data, arXiv preprint arXiv:1806.04197 (2018).
[5] Z. Waseem, D. Hovy, Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter, in: Proceedings of the NAACL Student Research Workshop, 2016, pp. 88–93.
[6] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, Y. Chang, Abusive language detection in online user content, in: Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 145–153.
[7] S. Agarwal, A. Sureka, Characterizing linguistic attributes for automatic classification of intent based racist/radicalized posts on Tumblr micro-blogging website, arXiv preprint arXiv:1701.04931 (2017).
[8] A. M. Founta, C. Djouvas, D. Chatzakou, I. Leontiadis, J. Blackburn, G. Stringhini, A. Vakali, M. Sirivianos, N. Kourtellis, Large scale crowdsourcing and characterization of Twitter abusive behavior, in: Twelfth International AAAI Conference on Web and Social Media, 2018.
[9] H. Watanabe, M. Bouazizi, T. Ohtsuki, Hate speech on Twitter: A pragmatic approach to collect hateful and offensive expressions and perform hate speech detection, IEEE Access 6 (2018) 13825–13835.
[10] Z. Zhang, D. Robinson, J. Tepper, Detecting hate speech on Twitter using a convolution-GRU based deep neural network, in: European Semantic Web Conference, Springer, 2018, pp. 745–760.
[11] I. Kwok, Y. Wang, Locate the hate: Detecting tweets against blacks, in: Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
[12] P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection in tweets, in: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759–760.
[13] J. H. Park, P. Fung, One-step and two-step classification for abusive language detection on Twitter, arXiv preprint arXiv:1706.01206 (2017).
[14] B. Gambäck, U. K. Sikdar, Using convolutional neural networks to classify hate-speech, in: Proceedings of the First Workshop on Abusive Language Online, 2017, pp. 85–90.
[15] HASOC, https://hasocfire.github.io/hasoc/2021/index.html, 2021. Accessed: 2021-11-27.
[16] T. Mandl, S. Modha, G. Shahi, H. Madhu, S. Satapara, P. Majumder, J. Schäfer, T. Ranasinghe, M. Zampieri, D. Nandini, et al., Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages, Working Notes of FIRE (2021).
[17] B. Dave, S. Bhat, P. Majumder, IRNLP_DAIICT@LT-EDI-EACL2021: Hope speech detection in code mixed text using TF-IDF char n-grams and MuRIL, in: Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion, 2021, pp. 114–117.
[18] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM '17, 2017, pp. 512–515.
[19] S. Rosenthal, P. Atanasova, G. Karadzhov, M. Zampieri, P. Nakov, A large-scale semi-supervised dataset for offensive language identification, arXiv preprint arXiv:2004.14454 (2020).
[20] T. Caselli, V. Basile, J. Mitrović, M. Granitzer, HateBERT: Retraining BERT for abusive language detection in English, arXiv preprint arXiv:2010.12472 (2020).
[21] TRAC, https://sites.google.com/view/trac1/home, 2021. Accessed: 2021-11-27.
[22] I. Mollas, Z. Chrysopoulou, S. Karlos, G. Tsoumakas, ETHOS: an online hate speech detection dataset, arXiv preprint arXiv:2006.08328 (2020).
[23] P. Alonso, R. Saini, G. Kovács, Hate speech detection using transformer ensembles on the HASOC dataset, in: International Conference on Speech and Computer, Springer, 2020, pp. 13–21.
[24] Y. Sun, S. Wang, Y. Li, S. Feng, H. Tian, H. Wu, H. Wang, ERNIE 2.0: A continual pre-training framework for language understanding, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 8968–8975.
[25] F. Barbieri, J. Camacho-Collados, L. Neves, L. Espinosa-Anke, TweetEval: Unified benchmark and comparative evaluation for tweet classification, arXiv preprint arXiv:2010.12421 (2020).
[26] S. Satapara, S. Modha, T. Mandl, H. Madhu, P. Majumder, Overview of the HASOC subtrack at FIRE 2021: Conversational hate speech detection in code-mixed language, Working Notes of FIRE (2021).