Identify Hate Speech Spreaders on Twitter using Transformer Embeddings Features and AutoML Classifiers
Notebook for PAN at CLEF 2021

Talha Anwar, Independent Researcher
chtalhaanwar@gmail.com, https://github.com/talhaanwarch
CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania

Abstract
Hate speech against other communities, religions, and countries is becoming more common on social media, and its spread needs to be controlled. Most studies identify whether a single sentence is hateful or not. This paper instead identifies whether a user spreads hate on Twitter, by analyzing hundreds of tweets from that user. Feature embeddings of a user's tweets are extracted with different transformer models: BERT, BERTweet, and RoBERTa for English, and BETO for Spanish. An AutoML classifier is then trained on these embedding features. Five-fold cross-validation yields accuracies of 75% and 85%, and the gold-standard test data yields 72% and 82%, for English and Spanish, respectively. The approach secured 4th position in the PAN competition.

Keywords: hate speech, transformers, AutoML, Twitter

1. Introduction
Hate speech and offensive language against ethnicity, race, color, nationality, gender, sexual orientation, religion, or other characteristics are common on social media, particularly on Twitter. If one group starts a hateful Twitter trend against a specific group, supporters of the targeted group reply with an even more intense and offensive counter-trend. This is most common among supporters of political parties, but recent studies show the same behavior between countries; for example, China received many hateful comments from the USA over the spread of COVID-19 [1]. Many studies identify hateful and offensive tweets and fake news [2] in different languages [3, 4], on Twitter as well as on YouTube [5]. These studies focus on identifying hateful tweets rather than hate spreaders. The Profiling Hate Speech Spreaders on Twitter task addresses this gap by identifying hate spreaders at the user level instead of the single-tweet level [6, 7]. The competition is hosted on the TIRA Integrated Research Architecture to keep the results reproducible [8].

This paper profiles hate speech spreaders on Twitter using transformer embeddings as features and AutoML as the classifier. Tweet preprocessing and both tweet-level and user-level classification are studied.

2. Methodology

2.1. Dataset
The dataset consists of English and Spanish tweets. For each language, data of 300 users is provided: 200 for training and 100 for testing. Each user's data comprises 200 tweets. The hate-speech label is assigned per user rather than per tweet, meaning that the 200 tweets of a user are analyzed together to label that user as a hate spreader or not.

2.2. Pre-processing of tweets
The text data is pre-processed to remove noise. First, digits are removed, followed by the tokens 'RT', '#USER#', and '#URL#'. The tweets are then converted to lower case. Stop-words are not removed.
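As a minimal sketch, the cleaning above can be written in a few lines of Python. The function name and the regular expression are ours for illustration; the paper does not publish its code.

```python
import re

def clean_tweet(tweet: str) -> str:
    """Pre-process one raw tweet as described in Section 2.2:
    digits first, then the 'RT', '#USER#' and '#URL#' tokens,
    then lower-casing. Stop-words are deliberately kept."""
    tweet = re.sub(r"\d+", "", tweet)       # remove digits
    for token in ("RT", "#USER#", "#URL#"):
        tweet = tweet.replace(token, "")    # strip retweet/user/url markers
    return tweet.lower().strip()

print(clean_tweet("RT #USER# 2021 was loud #URL#"))  # -> "was loud"
```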
2.3. Tweet-level training
Most NLP algorithms work best at the sentence level, so tweet-level model training is used as the baseline approach. Each tweet inherits the label of its user: all tweets of a user are given that user's label. This yields 40,000 tweets, which are split into train, validation, and test sets at roughly 80:10:10 (32,400 training, 4,000 validation, and 3,600 test tweets). This test set is not the gold-standard test data; it is created only for internal evaluation of the system. The idea is that once the model is trained at the tweet level, we predict a label for every tweet of a user, average the predictions over all of that user's tweets, and assign the result as the user label.

For the English baseline, a BERT base uncased model is fine-tuned at the tweet level. Ktrain, a lightweight wrapper around deep learning libraries, is used for the implementation [9] (see the first sketch at the end of this section). The maximum transformer token length is set to 20, the batch size to 16, and the number of epochs to 5. The model is trained with the fit-one-cycle policy at a learning rate of 1e-5.

2.4. User-level training
In user-level training, instead of fine-tuning the BERT model [10] on our data, we extract embeddings from the pre-trained BERT model. The embeddings are taken from the last hidden layer; alternatively, features are obtained by concatenating the last four hidden layers, as suggested in the BERT paper. We want to test which layers yield the better results. These feature embeddings are extracted at the tweet level and then averaged per user (see the second sketch at the end of this section). The user-level features are fed to the AutoGluon tabular predictor for classification [11]. As there are two feature sets, one from the last layer and one from the last four layers, two separate AutoML models are trained. For the English data, we also use BERTweet [12] and RoBERTa [13] to extract features. For BERTweet, the preprocessing is adjusted: '#USER#' is replaced with '@USER', '#URL#' is replaced with 'HTTPURL', and emojis are converted to text. For Spanish, the BETO (Spanish BERT) model is used [14]. All of these transformer models are implemented with the HuggingFace framework.

Figure 1: Flow diagram of user-level hate-spreader classification. A user's tweets (Tweet 1, Tweet 2, ..., Tweet n) are pre-processed, passed through a transformer for feature extraction, averaged into one vector, and classified.

For the competition submission, we use 5-fold cross-validation such that the unlabeled gold test data is evaluated in each fold, and the per-fold predictions are finally averaged (weighted) for each model (see the third sketch at the end of this section). No train/validation/test split as in tweet-level training is applied, because averaging leaves only limited user-level training data.
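The tweet-level baseline can be sketched with ktrain as below. The toy tweet lists are placeholders, not data from the task; the hyper-parameters (maxlen 20, batch size 16, 5 epochs, one-cycle at 1e-5) are those reported in Section 2.3.

```python
import ktrain
from ktrain import text

# Placeholder data: in the paper every tweet inherits its user's label.
x_train = ["example hateful tweet", "example harmless tweet"] * 8
y_train = ["hate", "no_hate"] * 8
x_val = ["held-out tweet one", "held-out tweet two"] * 2
y_val = ["hate", "no_hate"] * 2

t = text.Transformer("bert-base-uncased", maxlen=20,
                     class_names=["no_hate", "hate"])
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_val, y_val)

model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=16)
learner.fit_onecycle(1e-5, 5)  # one-cycle policy: lr 1e-5, 5 epochs
```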
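The user-level feature extraction can be sketched with the HuggingFace transformers library as below. Mean-pooling over the token dimension is our assumption; Section 2.4 only states that the last hidden layer, or the concatenation of the last four hidden layers, is used.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Swap the checkpoint for the other models: 'vinai/bertweet-base',
# 'roberta-base', or 'dccuchile/bert-base-spanish-wwm-uncased' (BETO).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

@torch.no_grad()
def tweet_embedding(tweet: str, last_four: bool = False) -> torch.Tensor:
    """Embedding of one tweet: last hidden layer (768-d) or the
    concatenation of the last four hidden layers (3072-d),
    mean-pooled over tokens (the pooling choice is our assumption)."""
    enc = tokenizer(tweet, return_tensors="pt", truncation=True)
    out = model(**enc)
    if last_four:
        hidden = torch.cat(out.hidden_states[-4:], dim=-1)  # (1, seq, 3072)
    else:
        hidden = out.last_hidden_state                      # (1, seq, 768)
    return hidden.mean(dim=1).squeeze(0)

def user_embedding(tweets, last_four: bool = False) -> torch.Tensor:
    """Average the tweet embeddings into one user-level feature vector."""
    return torch.stack([tweet_embedding(t, last_four) for t in tweets]).mean(dim=0)

print(user_embedding(["first cleaned tweet", "second cleaned tweet"]).shape)
# torch.Size([768])
```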
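Finally, the 5-fold submission scheme can be sketched with AutoGluon and scikit-learn: a TabularPredictor is trained on each fold's training split, the unlabeled gold users are scored in every fold, and the per-fold probabilities are averaged. The random matrices stand in for the real user embeddings, and a plain mean replaces the paper's weighted average, whose weights are not specified.

```python
import numpy as np
import pandas as pd
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
dim = 16                                 # 768 for real last-layer BERT features
X = rng.normal(size=(200, dim))          # stand-in: 200 training user embeddings
y = rng.integers(0, 2, size=200)         # stand-in: hate-spreader labels
X_gold = rng.normal(size=(100, dim))     # stand-in: 100 unlabeled gold users

cols = [f"f{i}" for i in range(dim)]
train_df = pd.DataFrame(X, columns=cols)
train_df["label"] = y
gold_df = pd.DataFrame(X_gold, columns=cols)

fold_probs = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, _) in enumerate(skf.split(X, y)):
    predictor = TabularPredictor(label="label", path=f"ag_models/fold{fold}")
    predictor.fit(train_df.iloc[tr_idx])
    # probability of the positive class for the gold users in this fold
    fold_probs.append(predictor.predict_proba(gold_df)[1].to_numpy())

gold_pred = (np.mean(fold_probs, axis=0) >= 0.5).astype(int)  # final user labels
```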
3. Results
The tweet-level baseline over-fits. For the English task with BERT, the training accuracy reaches 85%, while the validation and test accuracies are only 49% and 53%. As these initial results are poor, no further 5-fold cross-validation is applied at the tweet level. For the Spanish task, the train, validation, and test accuracies are 92%, 64%, and 56%, respectively.

In user-level classification, the average 5-fold accuracies are 75%, 73.5%, and 70% using embeddings from the last hidden layer of BERT, BERTweet, and RoBERTa, respectively. Using the concatenated last four hidden layers of BERT, BERTweet, and RoBERTa, the corresponding accuracies are 72.5%, 70%, and 73%. For the Spanish language, BETO achieves an accuracy of 85% with both the last-hidden-layer and the last-four-layer embeddings. On the gold-standard test data, accuracies of 72% and 82% are obtained for English and Spanish, respectively.

4. Discussion
Profiling hate speech spreaders on Twitter has to be based on user profiles containing multiple tweets; a hate spreader cannot be identified from a single tweet. Deep learning methods based on sentence/tweet classification therefore failed to classify users as hate spreaders. An approach is needed that considers all tweets of a user at once during training. One option is to concatenate all tweets of a user and perform document classification, but this produces long text documents that are not feasible to classify, since the computational cost of the transformers' self-attention mechanism grows quadratically with sequence length. Another option is a transformer with a linearly scalable self-attention mechanism, such as Longformer [15]; however, in a document the sentences are connected to each other, whereas in our case the tweets of a user have no connection with each other, so Longformer is not a feasible option either. To resolve this, we extract an embedding for each tweet and average them into a single embedding vector per user.

5. Conclusion
Identifying hate speech spreaders on Twitter is a need of the day to make social media a peaceful platform. This paper proposed a feature extraction technique using transformer embeddings together with AutoML classifiers to identify hate-spreading users on Twitter. A 5-fold validation accuracy of 75% and 85% is obtained for English and Spanish, respectively, which drops by about 3% on the gold-standard test data. The paper also discussed the disadvantage of classification based on tweets instead of users, and highlighted the limits of long-sequence transformers for this task.

6. Acknowledgments
I would like to express my special thanks to all the organizers, especially Francisco Manuel Rangel Pardo, Paolo Rosso, and Elisabetta Fersini.

References
[1] L. Fan, H. Yu, Z. Yin, Stigmatization in social media: Documenting and analyzing hate speech for COVID-19 on Twitter, Proceedings of the Association for Information Science and Technology 57 (2020) e313.
[2] M. S. Javed, H. Majeed, H. Mujtaba, M. O. Beg, Fake reviews classification using deep learning ensemble of shallow convolutions, Journal of Computational Social Science (2021) 1–20.
[3] V. Basile, C. Bosco, E. Fersini, N. Debora, V. Patti, F. M. R. Pardo, P. Rosso, M. Sanguinetti, et al., SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, 2019, pp. 54–63.
[4] T. Anwar, O. Baig, TAC at SemEval-2020 task 12: Ensembling approach for multilingual offensive language identification in social media, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020, pp. 2177–2182.
[5] T. Tehreem, Sentiment analysis for YouTube comments in Roman Urdu, arXiv preprint arXiv:2102.10075 (2021).
[6] J. Bevendorff, B. Chulvi, G. L. D. L. P. Sarracén, M. Kestemont, E. Manjavacas, I. Markov, M. Mayerl, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann, M. Wolska, E. Zangerle, Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection, in: 12th International Conference of the CLEF Association (CLEF 2021), Springer, 2021.
[7] F. Rangel, G. L. D. L. P. Sarracén, B. Chulvi, E. Fersini, P. Rosso, Profiling Hate Speech Spreaders on Twitter Task at PAN 2021, in: CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2021.
[8] M. Potthast, T. Gollub, M. Wiegmann, B. Stein, TIRA Integrated Research Architecture, in: N. Ferro, C. Peters (Eds.), Information Retrieval Evaluation in a Changing World, The Information Retrieval Series, Springer, Berlin Heidelberg New York, 2019. doi:10.1007/978-3-030-22948-1_5.
[9] A. S. Maiya, ktrain: A low-code library for augmented machine learning, arXiv preprint arXiv:2004.10703 (2020).
[10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[11] N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, A. Smola, AutoGluon-Tabular: Robust and accurate AutoML for structured data, arXiv preprint arXiv:2003.06505 (2020).
[12] D. Q. Nguyen, T. Vu, A. T. Nguyen, BERTweet: A pre-trained language model for English tweets, arXiv preprint arXiv:2005.10200 (2020).
[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[14] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model and evaluation data, in: PML4DC at ICLR 2020, 2020.
[15] I. Beltagy, M. E. Peters, A. Cohan, Longformer: The long-document transformer, arXiv preprint arXiv:2004.05150 (2020).