Hate Speech and Offensive Content Identification with Graph Convolutional Networks

Necva Bölücü¹, Pelin Canbay²
¹ Department of Computer Engineering, Hacettepe University, Ankara, Turkey
² Department of Computer Engineering, Sutcu Imam University, Kahramanmaras, Turkey

FIRE 2021: Forum for Information Retrieval Evaluation, December 13-17, 2021, India
necva@cs.hacettepe.edu.tr (N. Bölücü); pelincanbay@ksu.edu.tr (P. Canbay)
ORCID: 0000-0001-8121-3048 (N. Bölücü); 0000-0002-8067-3365 (P. Canbay)

Abstract
Social media is a widespread platform with a huge impact on society. It hosts a massive amount of data that plays an important role in expressing ideas, thoughts, emotions, etc. Identifying hate speech and offensive content on social media has gained attention recently. This is also the goal of the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) 2021 Challenge in the English and Hindi languages. In this paper, we describe the system based on Graph Convolutional Networks (GCN) submitted by our team HUNLP for Subtasks 1A and 1B. Our system achieved a Macro F1-score of 82.15% for English Subtask 1A and ranked 2nd on the leaderboard. Moreover, our model achieved 71.94% and 78.95% for Hindi and Marathi Subtask 1A on the official test set, respectively. We also achieved a Macro F1-score of 62.96% for English Subtask 1B.

Keywords
Social Media, Hate Speech, Graph Convolutional Network

1. Introduction
Recently, social media platforms such as Facebook, Twitter, and Instagram have gained attention, and users are creating various ways to express their opinions and thoughts. The use of social media has led to a huge volume of data with hateful and offensive content. A growing interest in Natural Language Processing (NLP) approaches to this problem has been observed recently, including the identification of abusive content [1, 2, 3], cyberbullying [4, 5, 6], hate speech [7, 8, 9], and offensive content [10, 11].

The Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task [12] proposed the identification of hate speech and offensive content, focusing on the Indo-European languages English and Hindi. The aim is to develop models for identifying hateful and offensive content on social media. In this paper, we, as the HUNLP team, take up the task and propose a deep learning model based on a Graph Convolutional Network (GCN) to identify hateful and offensive content in the Twitter data provided by HASOC [12].

Previously, deep learning models such as LSTM [13] and CNN [14], as well as pretrained models such as BERT [15] and DistilBERT [16], have been applied to this task. The disadvantage of these models is that they ignore word co-occurrence in a corpus, which carries non-consecutive and long-distance semantics. To alleviate this disadvantage, GCNs have been proposed, which capture rich relational structure and preserve global structure information in a graph [17, 14].

The rest of this paper is organized as follows.
Section 2 describes the task and the data on which the task was performed. Section 3 presents our method together with the preprocessing, and Section 4 gives the details of our experimental setup and results. Finally, Section 5 summarizes our work.

2. Data
In this section, we briefly describe the tasks and the data provided by the task organizers for training models for the hate speech identification task. The dataset provided by HASOC¹ in two languages, namely English and Hindi, comprises two Subtasks, each with a separate dataset.

• Subtask 1 [18, 19]: a classification problem consisting of two downstream tasks. Subtask 1A is a binary classification task that indicates whether a tweet is Hate and Offensive (HOF) or Non Hate-Offensive (NOT), and Subtask 1B is a three-class classification task that classifies tweets as HATE, Offensive (OFFN), or Profane (PRFN).
• Subtask 2 [20]: the identification of conversational hate speech in code-mixed languages.

Since we dealt with Subtask 1, we give the details of the dataset for this subtask. The train dataset is provided in three different files for English, Hindi, and Marathi. The English and Hindi dataset files contain the fields _id, text, task_1, and task_2, where task_1 is the label of the tweet for Subtask 1A and task_2 is the label of the tweet for Subtask 1B. The Marathi dataset contains only the text_id, text, and task_1 fields because Marathi is not part of Subtask 1B. The training data statistics for Subtask 1 are presented in Table 1.

Table 1
Statistics of the HASOC 2021 Subtask 1 train dataset

                                     Subtask 1A      Subtask 1B
Language   Total # of Instances   NOT    HOF    HATE   OFFN   PRFN   NONE
English    3843                   1342   2501   683    622    1196   1342
Hindi      4594                   3161   1433   566    654    213    3161
Marathi    1874                   1205   669    -      -      -      -

¹ https://hasocfire.github.io/hasoc/2021/dataset.html Last visited: 14-10-2021.

3. Methodology
The details of the preprocessing and the proposed model for Subtask 1 are given in the following subsections.

3.1. Preprocessing
Since the dataset we use consists of tweets in the English and Hindi languages, we need to normalize the tweets before converting them into word embeddings.

Since the provided corpus was collected from Twitter, the tweets contain unstructured information such as abbreviations, Twitter handles, punctuation marks, special characters, and more. The ekphrasis² library [21] is a tool designed to normalize text from social networks. It improves text through tokenization, normalization, segmentation, and spell correction, using word statistics extracted from two corpora (English Wikipedia and a Twitter corpus of 330 million English tweets). We use ekphrasis to preprocess the corpus, improving data quality and extracting the relevant information. The preprocessing steps included in ekphrasis are listed below; a minimal usage sketch follows the list.

• Normalization: To convert tweets into machine-understandable text, 8 normalizations are applied to the data: normalization of dates, times, emails, URLs, currency, numbers, phone numbers, and usernames.
• Annotations for emotions and emotion-causing features: Social media users tend to express their emotions using different styles. This step includes normalization of hashtags, capitalization (all caps), elongated words, repeated characters, emphasis (enclosed in asterisks), and censored words (censored abusive words).
• Contractions unpacking: Due to the character limit on Twitter, users tend to shorten text. Unpacking contractions is important for normalizing the tweets (can't → can not).

² https://github.com/cbaziotis/ekphrasis. Last visited: 14-10-2021.
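As an illustration, a tweet can be normalized along these lines with ekphrasis. The configuration below mirrors the steps listed above, but it is a minimal sketch: the exact option set used for our submission is not spelled out in this paper, so the arguments shown are illustrative.

```python
from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons

# Illustrative configuration; the exact options used in our submission may differ.
text_processor = TextPreProcessor(
    # normalize dates, times, emails, URLs, currency, numbers, phones, usernames
    normalize=['date', 'time', 'email', 'url', 'money', 'number',
               'phone', 'user'],
    # annotate emotion-related styles: hashtags, all caps, elongated words,
    # repeated characters, emphasis (*word*), censored words (f**k)
    annotate={"hashtag", "allcaps", "elongated", "repeated",
              "emphasis", "censored"},
    segmenter="twitter",              # word statistics from the Twitter corpus
    corrector="twitter",              # spell correction from the same statistics
    unpack_hashtags=True,             # split hashtags into their constituent words
    unpack_contractions=True,         # can't -> can not
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
    dicts=[emoticons],
)

tokens = text_processor.pre_process_doc(
    "Oh they love this lol https://t.co/4kCudKSAk5k")
# -> ['oh', 'they', 'love', 'this', 'lol', '<url>']; dropping placeholder tags
#    such as <url> yields the preprocessed text shown in Table 2.
```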
The original tweets and the tweets preprocessed by the ekphrasis library are given in Table 2.

Table 2
Output of the ekphrasis library as preprocessing of tweets

English
  Original:      Oh they love this lol https://t.co/4kCudKSAk5k
  Preprocessed:  oh they love this lol
Hindi
  Original:      @AskAnshul आसमानी किताब के नाजायज औलाद है।कूल
  Preprocessed:  आसमानी किताब के नाजायज औलाद है।कूल
Marathi
  Original:      @मराठ्यांनो कळालं का आता कोण तुमचा विचार करतो ते???
  Preprocessed:  मराठ्यांनो कळालं का आता कोण तुमचा विचार करतो?

3.2. Model Architecture
Graph Neural Networks have been proposed as a paradigm-shifting method for solving NLP [22, 23] and Computer Vision [24, 25] tasks. The Graph Convolutional Network (GCN) is a version of Graph Neural Networks that includes an additional convolutional layer. Text classification with a GCN was first studied by Yao et al. [14], in which documents are represented on a graph and the embedding vector of each node is induced based on the properties of its neighborhood. We adopt the approach of Yao et al. [14] for the shared task.

To convert the data into graph format, we follow the method of Yao et al. [14]. The graph is $G = (N, E, W)$, where $N$ is the set of nodes, $E$ is the set of edges, and $W : E \rightarrow \mathbb{R}$ ($\mathbb{R}$ is the set of reals) is the function that assigns a weight to each edge of $G$. The details of the graph $G$ are given below; a sketch of the graph construction follows the list.

• Nodes (N): Text GCN builds a graph with word and tweet nodes. The number of nodes, $|V|$, is the number of word nodes (the vocabulary size, i.e., the number of unique words) plus the number of tweet nodes (the number of tweets in the train file).
• Edges (E): To create edges between words, a sliding window is used. The intuition behind the sliding window corresponds to a Convolutional Neural Network filter: each window acts as a convolution filter of size $(1, n)$.
• Weights (W): $A$ is the adjacency matrix of the graph $G$ and $D$ is its degree matrix, where $D_{ii} = \sum_j A_{ij}$. We use term frequency-inverse document frequency (TF-IDF) to weight edges between word and tweet nodes, and pointwise mutual information (PMI) to weight edges between two word nodes. While PMI captures word co-occurrence information, TF-IDF is a statistical measure that evaluates how relevant a word is to a tweet in a collection of tweets.

The graph structure representation can be found in Figure 1.

Figure 1: Graph structure
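To make the construction concrete, the sketch below builds the word/tweet graph in the spirit of Yao et al. [14]: TF-IDF weights on tweet-word edges and positive PMI weights on word-word edges counted over sliding windows. This is a minimal sketch rather than our exact implementation; the function name, the window size of 10, and the self-loops (as in Kipf and Welling's GCN formulation) are our assumptions.

```python
import math
from collections import Counter

import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

def build_text_graph(tweets, window_size=10):
    """Build the normalized adjacency over n_tweets + n_words nodes.

    Tweet-word edges carry TF-IDF weights; word-word edges carry positive
    PMI computed over sliding windows (window_size is an assumption).
    """
    # Tweet-word edges: TF-IDF over the (already preprocessed) tweets.
    vectorizer = TfidfVectorizer(tokenizer=str.split, lowercase=False)
    tfidf = vectorizer.fit_transform(tweets)          # (n_tweets, n_words)
    word_index = {w: i for i, w in
                  enumerate(vectorizer.get_feature_names_out())}
    n_tweets, n_words = tfidf.shape
    n_nodes = n_tweets + n_words

    # Collect sliding windows, mimicking a (1, n) convolution filter.
    windows = []
    for tweet in tweets:
        tokens = tweet.split()
        if len(tokens) <= window_size:
            windows.append(tokens)
        else:
            windows.extend(tokens[i:i + window_size]
                           for i in range(len(tokens) - window_size + 1))

    word_count, pair_count = Counter(), Counter()
    for window in windows:
        ids = {word_index[w] for w in window}
        word_count.update(ids)                        # windows containing word i
        pair_count.update((i, j) for i in ids for j in ids if i < j)

    # Word-word edges: keep only positive PMI values.
    rows, cols, vals = [], [], []
    n_win = len(windows)
    for (i, j), cnt in pair_count.items():
        pmi = math.log((cnt / n_win)
                       / ((word_count[i] / n_win) * (word_count[j] / n_win)))
        if pmi > 0:
            rows += [n_tweets + i, n_tweets + j]
            cols += [n_tweets + j, n_tweets + i]
            vals += [pmi, pmi]
    ww = sp.coo_matrix((vals, (rows, cols)), shape=(n_nodes, n_nodes))

    # Place the TF-IDF block and add self-loops (Kipf & Welling style).
    tw = sp.lil_matrix((n_nodes, n_nodes))
    tw[:n_tweets, n_tweets:] = tfidf
    adj = (ww + tw + tw.T + sp.eye(n_nodes)).tocsr()

    # Symmetric normalization: A~ = D^{-1/2} A D^{-1/2}.
    deg = np.asarray(adj.sum(axis=1)).ravel()         # deg > 0 via self-loops
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ adj @ d_inv_sqrt
```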
The output of a one-layer GCN is computed as follows:

$$L^{(1)} = \rho(\tilde{A} W_0) \quad (1)$$

where $\rho$ is an activation function, $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the normalized symmetric adjacency matrix, and $W_0$ is a weight matrix. In the proposed model, we apply a simple two-layer GCN to the graph and feed the node outputs of the second layer into a softmax classifier:

$$Z = \mathrm{softmax}\left(\tilde{A}\,\mathrm{ReLU}(\tilde{A} W_0)\,W_1\right) \quad (2)$$

The loss is calculated using cross-entropy. The architecture of the proposed model is given in Figure 2.

Figure 2: The general architecture of the proposed model

4. Experiments & Results
In this section, we present the experimental settings and the results obtained on the test dataset in all languages for Subtask 1A and in English for Subtask 1B.

Table 3
Results on the test dataset

Task        Language   Macro F1   Rank Obtained   1st Ranked Team / Test Macro F1
Subtask 1A  English    0.8215     2               NLP-CIC / 0.8305
Subtask 1A  Hindi      0.7194     30              t1 / 0.7825
Subtask 1A  Marathi    0.7895     20              WLV-RIT / 0.9144
Subtask 1B  English    0.6296     9               NLP-CIC / 0.6657

Settings. We split the train dataset into 80% train and 20% evaluation data to find the optimum hyperparameters. The model is built using Adam optimization [26] and was trained with epochs = 200, learning rate = 0.02, dropout rate = 0.1, L2 loss weight = 0, and consecutive epochs = 50. We used BERT [27], RoBERTa [28], and GloVe [29] word embeddings. Since the GloVe embeddings were trained specifically on Twitter (GloVe Twitter³), we chose to use the GloVe embeddings in the model for English. Since we could not find Twitter-trained word embeddings for Hindi and Marathi, we used multilingual BERT and RoBERTa for these languages and obtained the best results with BERT (the BERT multilingual base (cased) model⁴). A training sketch under these settings is given below.

³ https://nlp.stanford.edu/projects/glove/ Last visited: 14-10-2021.
⁴ https://huggingface.co/bert-base-multilingual-cased. Last visited: 14-10-2021.
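For concreteness, the sketch below implements a two-layer GCN as in Eq. (2) and trains it with the settings above (Adam, learning rate 0.02, dropout 0.1, L2 weight 0, at most 200 epochs). It is a minimal sketch rather than our exact implementation: the hidden size of 200, the variable names, and the reading of "consecutive epochs = 50" as an early-stopping patience are our assumptions.

```python
import torch
import torch.nn.functional as F

class TwoLayerGCN(torch.nn.Module):
    """Two-layer GCN as in Eq. (2); the softmax of Eq. (2) is folded into
    the cross-entropy loss below. Hidden size 200 is an assumption."""

    def __init__(self, in_dim, n_classes, hidden_dim=200, dropout=0.1):
        super().__init__()
        self.w0 = torch.nn.Linear(in_dim, hidden_dim, bias=False)
        self.w1 = torch.nn.Linear(hidden_dim, n_classes, bias=False)
        self.dropout = torch.nn.Dropout(dropout)

    def forward(self, adj, x):
        # adj: normalized adjacency A~ (dense or sparse torch tensor)
        h = F.relu(adj @ self.w0(x))      # first layer: ReLU(A~ X W0)
        h = self.dropout(h)
        return adj @ self.w1(h)           # second layer: per-node logits

def train(adj, feats, labels, train_idx, val_idx, n_classes, patience=50):
    model = TwoLayerGCN(feats.shape[1], n_classes)
    opt = torch.optim.Adam(model.parameters(), lr=0.02, weight_decay=0.0)
    best_val, bad_epochs = 0.0, 0
    for epoch in range(200):
        model.train()
        opt.zero_grad()
        # Only labeled tweet nodes enter the loss; word nodes are unlabeled.
        loss = F.cross_entropy(model(adj, feats)[train_idx], labels[train_idx])
        loss.backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            val_acc = (model(adj, feats)[val_idx].argmax(dim=1)
                       == labels[val_idx]).float().mean().item()
        if val_acc > best_val:
            best_val, bad_epochs = val_acc, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:    # 50 epochs without improvement
                break
    return model
```

Note that in the transductive setup of Yao et al. [14], test tweets are already present as nodes when the graph is built, so prediction only requires reading off the logits at the test-node indices.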
Results. The best models selected on the evaluation data were submitted to the HASOC 2021 organizers for final evaluation. Table 3 shows the macro F1 scores obtained by our best model for Subtask 1, together with the names and macro F1 scores of the 1st-ranked teams. The detailed results are given in Table 4.

Table 4
Detailed results on the test dataset

Task        Language   Macro F1   Macro Precision   Macro Recall   Accuracy
Subtask 1A  English    0.8215     0.8844            0.7669         79.24%
Subtask 1A  Hindi      0.7194     0.7258            0.7147         75.78%
Subtask 1A  Marathi    0.7895     0.7910            0.7881         81.44%
Subtask 1B  English    0.6296     0.6305            0.6362         66.59%

We assume there are several reasons for the lower results in Hindi and Marathi. The first is word embeddings that were not trained on Twitter data: the multilingual embeddings are clearly not well suited to the Twitter datasets in Hindi and Marathi. Another reason is the ekphrasis library, which was designed for English. For consistency, we used it for Hindi and Marathi as well; however, the results show that normalizing the Hindi and Marathi datasets with ekphrasis is not a good solution.

5. Conclusion
In this paper, we presented a graph convolutional network model for Subtask 1 of the shared task on hate speech and offensive content identification in the English and Hindi languages. The results of the experimental study showed that the GCN model is very effective for the hate speech and offensive content identification task. Compared to previous approaches, our GCN-based model takes a markedly different route to the shared task. We achieved ranks 2, 30, and 20 for English, Hindi, and Marathi in Subtask 1A, respectively, and rank 9 for English in Subtask 1B. In future work, we will further extend the experiments by combining datasets for the same Subtask to perform multilingual experiments.

References
[1] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, Y. Chang, Abusive language detection in online user content, in: Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 145–153.
[2] A. M. Founta, C. Djouvas, D. Chatzakou, I. Leontiadis, J. Blackburn, G. Stringhini, A. Vakali, M. Sirivianos, N. Kourtellis, Large scale crowdsourcing and characterization of twitter abusive behavior, in: Twelfth International AAAI Conference on Web and Social Media, 2018.
[3] P. Mishra, M. Del Tredici, H. Yannakoudakis, E. Shutova, Abusive language detection with graph convolutional networks, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 2145–2150.
[4] C. Chelmis, D.-S. Zois, M. Yao, Mining patterns of cyberbullying on twitter, in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, 2017, pp. 126–133.
[5] M. Yao, C. Chelmis, D.-S. Zois, Cyberbullying detection on instagram with optimal online feature selection, in: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2018, pp. 401–408.
[6] L. Cheng, J. Li, Y. N. Silva, D. L. Hall, H. Liu, Xbully: Cyberbullying detection within a multi-modal context, in: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019, pp. 339–347.
[7] S. Malmasi, M. Zampieri, Detecting hate speech in social media, in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, 2017, pp. 467–472.
[8] B. Mathew, A. Illendula, P. Saha, S. Sarkar, P. Goyal, A. Mukherjee, Temporal effects of unmoderated hate speech in gab, arXiv preprint arXiv:1909.10966 (2019).
[9] S. S. Aluru, B. Mathew, P. Saha, A. Mukherjee, Deep learning models for multilingual hate speech detection, arXiv preprint arXiv:2004.06465 (2020).
[10] M. Wiegand, M. Siegel, J. Ruppenhofer, Overview of the germeval 2018 shared task on the identification of offensive language (2018).
[11] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Predicting the type and target of offensive posts in social media, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 1415–1420.
[12] S. Modha, T. Mandl, G. K. Shahi, H. Madhu, S. Satapara, T. Ranasinghe, M. Zampieri, Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech, in: FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, 13th-17th December 2021, ACM, 2021.
[13] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[14] L. Yao, C. Mao, Y. Luo, Graph convolutional networks for text classification, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 7370–7377.
[15] C. Sun, X. Qiu, Y. Xu, X. Huang, How to fine-tune bert for text classification?, in: China National Conference on Chinese Computational Linguistics, Springer, 2019, pp. 194–206.
[16] A. F. Adoma, N.-M. Henry, W. Chen, Comparative analyses of bert, roberta, distilbert, and xlnet for text-based emotion recognition, in: 2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), IEEE, 2020, pp. 117–121.
[17] H. Peng, J. Li, Y. He, Y. Liu, M. Bao, L. Wang, Y. Song, Q. Yang, Large-scale hierarchical text classification with recursively regularized deep graph-cnn, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 1063–1072.
[18] T. Mandl, S. Modha, G. K. Shahi, H. Madhu, S. Satapara, P. Majumder, J. Schäfer, T. Ranasinghe, M. Zampieri, D. Nandini, A. K. Jaiswal, Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021. URL: http://ceur-ws.org/.
[19] S. Gaikwad, T. Ranasinghe, M. Zampieri, C. M. Homan, Cross-lingual offensive language identification for low resource languages: The case of marathi, in: Proceedings of RANLP, 2021.
[20] S. Satapara, S. Modha, T. Mandl, H. Madhu, P. Majumder, Overview of the HASOC Subtrack at FIRE 2021: Conversational Hate Speech Detection in Code-mixed language, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021.
[21] C. Baziotis, N. Pelekis, C. Doulkeridis, Datastories at semeval-2017 task 4: Deep lstm with attention for message-level and topic-based sentiment analysis, in: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 747–754.
[22] D. Wang, P. Liu, Y. Zheng, X. Qiu, X.-J. Huang, Heterogeneous graph neural networks for extractive document summarization, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 6209–6219.
[23] W. Liao, B. Zeng, J. Liu, P. Wei, X. Cheng, W. Zhang, Multi-level graph neural network for text sentiment analysis, Computers & Electrical Engineering 92 (2021) 107096.
[24] Y. Shen, H. Li, S. Yi, D. Chen, X. Wang, Person re-identification with deep similarity-guided graph neural network, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 486–504.
[25] W. Shi, R. Rajkumar, Point-gnn: Graph neural network for 3d object detection in a point cloud, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1711–1719.
[26] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[27] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of NAACL-HLT, 2019, pp. 4171–4186.
[28] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[29] J. Pennington, R. Socher, C. D. Manning, Glove: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.