Addressing Hate Speech: ATLANTIS for Efficient Hate Span Detection Niyar R Barman1,2 , Krish Sharma1,2 , Yashraj Poddar1 , Advaitha Vetagiri1,3 and Partha Pakray1 1 National Institute of Technology, Silchar, Assam, India - 788010 2 Both authors contributed equally to this research 3 Corresponding author. Abstract Hate speech poses significant challenges to maintaining healthy online conversations, and automated systems are crucial for its accurate detection and mitigation. In this paper, we (CNLP-NITS-PP) introduce ATLANTIS (Attentive Transformer-LSTM for Named Entity and Token Identification System), a robust model designed to address the pervasive issue of hate speech in online social media platforms. ATLANTIS focuses on hate span identification within sentences labeled as hate speech, framed as a sequence labeling task using BIO notation. Leveraging a Hate dataset enriched with Named Entity Recognition (NER) tags, ATLANTIS effectively identifies hate speech spans within the text by combining contextualized representations and sequential modeling. The empirical results showcase ATLANTIS’s effectiveness in isolating explicit signs of hate from a contextual backdrop, offering a promising solution for creating safer online environments. We achieve a macro F1 score of 0.488 on the public test set and 0.508 on the private test set. This work not only lays the foundation for future advancements in hate-span detection but also emphasizes the importance of model efficiency, interpretability, and expanded training data that encompass diverse linguistic nuances and evolving hate speech trends. Code is available at https://github.com/niyarrbarman/hasoc23 Keywords Hate Speech Detection, Named Entity Recognition (NER), Sequence Labeling, Natural Language Process- ing, Transformer, BiLSTM 1. Introduction Social media platforms like Twitter and Facebook have become commonplace in modern life, giving people worldwide easy access to voice their thoughts and connect. However, the open nature of these platforms also allows harmful content like hate speech, harassment, and threats aimed at vulnerable groups to spread [1]. This has created an urgent need for automated systems that accurately recognise abusive language to maintain healthy online conversations [2]. Forum for Information Retrieval Evaluation, December 15–18, 2023, Goa, India Envelope-Open barmanniyar@gmail.com (N. R. Barman); iamkrish9090@gmail.com (K. Sharma); yash.raj.poddar.yp@gmail.com (Y. Poddar); advaitha21_rs@cse.nits.ac.in (A. Vetagiri); partha@cse.nits.ac.in (P. Pakray) GLOBE https://niyarrbarman.github.io/ (N. R. Barman) Orcid 0009-0001-2112-2491 (N. R. Barman); 0009-0007-7001-7480 (K. Sharma); 0009-0007-6119-3255 (Y. Poddar); 0000-0002-0651-4171 (A. Vetagiri); 0000-0003-3834-5154 (P. Pakray) © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0) CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings A significant hurdle is that offensive content can take many linguistic forms, necessitating context-aware models to pinpoint the specific snippets of text that render a post hateful or abusive [3]. Furthermore, implicit forms of hate speech, like veiled insults, require deducing pragmatic implications [4] rather than just spotting explicit derogatory terms [5]. This has driven recent research into models for singling out spans of text that communicate hateful intent within a given post [6]. This paper tackles the problem of hate span identification within sentences labelled as hate speech in the HASOC 2023 [7] shared task [8]. In this paper, we delve into the challenges and innovations of the HASOC subtrack at FIRE 2023, focusing on the ’Detection of Hate Spans and Conversational Hate-Speech,’ as outlined by Satapara et. al [9]. Given an English social media sentence already deemed hateful, the goal is to pinpoint contiguous spans of tokens that relay its hateful purpose. This is framed as a sequence labelling task using BIO notation, where each token is tagged as the Beginning (B), Inside (I), or Outside (O) of a hate span [10]. The HASOC dataset provides ground truth BIO tag sequences for abusive sentences from public hate speech sources [8]. Participants construct models to predict these spans in test sentences without extra preprocessing to avoid incongruities. This focused evaluation enables the systematic development of context-aware models and techniques for fine-grained hate speech analysis, moving beyond the binary classification of posts [11]. We present our proposed model design and tactic for the hate span identification task, harnessing contextualised representations and sequential modelling [12]. Results showcase our techniques’ efficacy in isolating explicit signs of hate from a contextual backdrop. By classifying specific linguistic cues and semantic relationships that encode hate, our method provides insights into the underlying fabric of abusive language [6]. 2. Application and Target Audience The research presented in this paper holds significant promise in tackling the pervasive problem of hate speech on online social media platforms. ATLANTIS, the hate span detection system that has been developed, carries practical implications for content moderation, user safety, and the improvement of online discussions. By precisely identifying and extracting hate spans from hateful sentences, ATLANTIS equips social media platforms to more efficiently filter and eliminate hateful content, thereby promoting a safer and more inclusive online environment. Furthermore, this technology can serve as a valuable tool for gaining insights into the prevalence and dynamics of hate speech, assisting researchers and policymakers in formulating evidence- based strategies to combat online hatred. This research paper is intended for a diverse audience encompassing various stakeholders concerned with the detection and mitigation of hate speech. Content moderators and social media platform administrators will find valuable insights and methodologies within as they work towards maintaining respectful and secure online communities. Researchers in the fields of natural language processing (NLP) and machine learning will appreciate the detailed methodology and architecture of the ATLANTIS model, which represents an advancement in state-of-the-art hate span detection. Policymakers and organizations focused on addressing online hate speech will also gain valuable insights into the potential of machine learning-based solutions for addressing this pressing issue. Furthermore, educators and students studying NLP, machine learning, and technology ethics can utilize this paper as a resource for understanding the development and application of advanced models for hate speech detection. Ultimately, this research paper aims to engage a broad and diverse audience, fostering collaboration and innovation in the ongoing effort to create safer online spaces. 3. Objective The primary objective of this research is to create a hate span detection system capable of pinpointing and extracting uninterrupted sequences of tokens found within hateful sentences, which we refer to as “hate spans”. These hate spans are characterized as consecutive sets of tokens within a sentence that collectively expresses explicit hatefulness. The aim of this shared task is to automatically identify and extract all such hateful spans from preprocessed sentences. The hate span detection task is approached as a sequence labeling problem, wherein each token in a sentence is labeled with a specific tag to indicate its association with a hateful span. The labeling follows the BIO notation, with ‘B’ signifying the beginning of a hate span, ‘I’ denoting the continuation of a hate span, and ‘O’ indicating all other tokens that are not part of any hate span within the sentence. The goal is to develop a machine-learning model to accurately predict the correct sequence of BIO tags for each token in a given sentence, effectively detecting and delineating hate spans within the text. 4. Proposed Methodology The methodology employed to address the issue of hate speech at scale through the ATLANTIS model comprises a systematic approach encompassing data preprocessing, tokenization, model architecture, and the classification process. Leveraging the HateNorm23 dataset, which features text samples paired with Named Entity Recognition (NER) tags categorizing each word as ‘B’ (signifying the start of a hate span), ‘I’ (indicating inclusion within a hate span), or ‘O’ (denoting other), we conduct word-level tokenization to segment the text into meaningful units. A custom tokenizer is then fine-tuned on the dataset to tailor tokenization for hate span detection. The ATLANTIS model adopts a multi-stage architecture, initially processing tokenized text through a custom transformer section followed by a bidirectional long short-term memory (Bi-LSTM) [13] section. The transformer captures contextual information and relationships, while the Bi-LSTM captures sequential dependencies. Subsequently, fused representations from these sections traverse fully connected layers for the conclusive classification task. Detailed insights into the architecture, hyperparameters, and experimental findings will be presented to substantiate ATLANTIS’s efficacy in mitigating hate speech at scale. ATLANTIS consists of three primary components: Transformer Encoder Block: The Transformer [14] block is a foundational component for capturing contextual relationships within sequences. Its self-attention mechanism enables the model to weigh the significance of each word in relation to others, allowing it to understand Figure 1: Architecture of ATLANTIS, comprising three primary components — Transformer Encoder Block, BiLSTM Layer, and Sequential Block with FC Layers—designed for effective sequence under- standing and Hate Span Identification complex dependencies and semantic connections. This block excels at learning hierarchical features from the input data, providing a solid basis for understanding the underlying patterns in the sequential data, which is particularly crucial in NLP tasks. BiLSTM Layer: The BiLSTM layer complements the Transformer’s strengths by effectively capturing sequential dependencies in the data. By incorporating a BiLSTM layer, the model can capture fine-grained temporal relationships and contextual nuances that might be missed by the Transformer alone. This is especially valuable for NER, where identifying entities often relies on sequential patterns. Sequential Block with FC Layers: The Sequential Block, containing Fully Connected layers, serves as a vital element for transforming the enriched features from the preceding blocks into a suitable format for making predictions. These FC layers allow for nonlinear transformations and higher-level abstractions, enabling the model to learn complex mappings from the learned representations to the target NER labels. Engineering Decisions: We aimed to identify a solution that excels in performance and efficiency. Our approach led us to employ a sequence of six transformer blocks. Upon extending the number of blocks, we observed a period during which the F1 score plateaued, roughly around 9 to 10 blocks. Subsequently, the score rapidly declined, indicative of overfitting taking hold. Regarding the BiLSTM layers, we integrated a single BiLSTM layer for the ultimate modeling phase. Elevating the count of BiLSTM layers increased the model’s complexity, rendering it more challenging to train and subsequently slowing down inference processes. We settled on a configuration of num_heads = 4 for the transformer block. Introducing additional num_heads led to a stage of diminishing returns. Given the limited size of our dataset, the model tended to memorize the training data rather than exhibiting the capacity to generalize to novel data. This phenomenon, in turn, resulted in overfitting or diminished performance. Adam was used as the optimizer with learning_rate = 1e-3 5. Dataset Figure 2: Visualization of Token-level BIO Tags Distribution in the dataset The dataset [15] comprises a total of 2421 data points. We partitioned this dataset into an 80:10:10 ratio, allocating segments for training, validation, and testing purposes. Within the dataset, a sum of 8165 distinct words can be found. The visualization of the dataset is presented in Figure 2. Notably, hate speech constitutes 17.422% of the entire dataset. 6. Results and Analyses In this section, we present the results of our experiments, organized into three subsections: Baseline Methods, Intrinsic Results, and Extrinsic Results. We discuss the models we used in the Baseline Methods section and provide details on the intrinsic and extrinsic performance of our approach. 6.1. Baseline Methods To establish a benchmark for our experiments and assess the effectiveness of our proposed method, we employed the following baseline models: Pretrained BERT: BERT [12] has shown remarkable success in various natural language processing tasks, and we included it as a reference to evaluate the performance of our approach against a state-of-the-art model. Transformer Encoder: The incorporation of the Transformer [14] Encoder, in our study serves a dual purpose. Firstly, it provides a reference point for evaluating the performance of our approach. Secondly, it underscores the effectiveness of the encoder layers, equipped with self-attention mechanisms, which play a key role in the remarkable success of BERT and similar models across various natural language processing tasks. BiLSTM: BiLSTM [13] networks have been widely used for sequence labeling tasks, and we included this baseline to evaluate our approach against a more traditional sequence labeling model. Table 1 Performance metrics of baseline models for different tags Model Precision Recall F1-Score BERT 0.58 0.56 0.57 Transformer 0.82 0.79 0.80 Bi-LSTM 0.56 0.61 0.58 6.2. Intrinsic Results In this subsection, we present the intrinsic results of our approach to the validation set. We discuss the performance of our model and provide a detailed analysis of the results. Our model’s performance on the validation set was evaluated using various metrics, including precision, recall and F1-score. They have been presented in Table 2. Table 2 ATLANTIS performance metrics for different tags BIO-Tags Precision Recall F1-Score B 0.83 0.78 0.77 I 0.72 0.81 0.76 O 0.97 0.95 0.96 Figure 3: Model’s F1 score variation over epochs Figure 4: Model’s Loss variation over epochs The graph in Figure 4 illustrates the model’s loss convergence during training. As we can observe, the loss steadily decreases over epochs, indicating that our model effectively learns to minimize the prediction errors. Figure 3 showcases the improvement in the F1-score over training epochs. The upward trend in F1-score suggests that our model becomes increasingly proficient at correctly identifying and labeling entities in the validation data as training progresses. 6.3. Extrinsic Results Table 3 Performance metrics of baseline models for different tags Macro F1-Score Model Public Private Test Set Test Set BERT 0.303 0.360 Transformer 0.446 0.473 Bi-LSTM 0.315 0.324 ATLANTIS 0.488 0.508 In this subsection, we present the extrinsic results of our approach to the competition test set. We report public and private test scores, commonly used in Kaggle competitions to evaluate model performance on unseen data. Table 3 summarizes our model’s public and private test scores and compares them with the baseline models. 7. Related Work In this section, we review several relevant studies that contribute to the understanding and development of hate speech detection, offensive language detection, and related natural lan- guage processing tasks. These works collectively provide insights into various approaches and techniques employed in this field. In Qian et al.’s (2019)[16] study [14], a new challenge called generative hate speech in- tervention was introduced. The authors augmented their research with two comprehensive datasets obtained from Reddit and Gab, which contained intervention responses collected from crowdsourcing. The assessment of three generative models, specifically Seq2Seq, VAE, and RL, revealed areas where hate speech intervention methods could be enhanced. In the work conducted by Alshalan et al. [17], they tackled the problem of hate speech in the Arabic Twittersphere. They introduced a dataset consisting of 9316 tweets categorized into hate speech, abuse, and normalcy. Their assessment encompassed various models, including CNN, GRU, CNN + GRU, and BERT. Among these models, CNN emerged as the most effective, achieving superior performance with an F1-score of 0.79 and an AUROC of 0.89. In the research conducted by Elalami et al. [18], they introduced a transfer learning strategy for detecting offensive language in multiple languages. This approach leveraged several BERT models, such as BERT, mBERT, and AraBERT. Their results were outstanding, surpassing the performance of current leading methods that employ joint-multilingual and translation- based approaches. This study underscored the robustness of BERT models in the context of Multilingual Offensive Language Detection. Ozler et al. [19] explored the application of BERT for multi-label and multi-domain incivility detection tasks. They successfully established a new state-of-the-art performance across various datasets. The study suggested that direct data combination from multiple domains yielded superior results compared to more intricate training methods. The study by Hoang et al. [20] introduced ViHOS, a novel Vietnamese dataset for hate and offensive span detection, containing 26,467 annotated spans in 11,056 comments. Baseline models, including XLM-RBase, XLM-RLarge, PhoBERTBase, and PhoBERTLarge, were evaluated, with the XLM-RLarge model leading with an F1-score of 0.7770. The study found that detecting multiple spans outperformed single-span detection in Vietnamese hate speech. Lample et al. [10] introduced a discriminative parsing-based approach for nested named entity recognition, demonstrating strong performance on top-level and nested entities. However, the study acknowledged a limitation in terms of speed compared to conventional flat techniques. The paper advocated for reconsidering the exclusion of embedded entities in NER corpora, highlighting the substantial information loss incurred by this design choice. Ma (2016) [21] presented a neural network architecture for sequence labeling, representing an end-to-end model without needing task-specific resources, feature engineering, or data preprocessing. The study attained state-of-the-art performance on two linguistic sequence labeling tasks, outperforming prior state-of-the-art systems. Peters et al. (2017) [22] proposed a simple semi-supervised approach using pre-trained neural language models to enhance token representations in sequence tagging models. Their approach consistently outperformed state-of-the-art models in NER and Chunking datasets. Notably, the study showed that including both forward and backward language models consistently improved performance. These related works collectively contribute valuable insights and methodologies that inform the development of hate speech detection and associated natural language processing tasks, showcasing the advancements and challenges in this field. 8. Conclusion and Future Scope In this research, we have presented ATLANTIS (Attentive Transformer-LSTM for Named Entity and Token Identification System), a robust model designed to combat hate speech at scale. Leveraging a Hate dataset with detailed Named Entity Recognition (NER) tags, ATLANTIS effectively identifies hate speech spans within textual content. Our multi-stage architecture, comprising a custom transformer and bidirectional LSTM, captures contextual information and sequential dependencies, facilitating precise hate span classification. Empirical results demonstrate ATLANTIS’s effectiveness in this critical task. As we continue to address the pressing issue of hate speech in digital spaces, ATLANTIS offers a promising solution for safer online environments. The work presented here lays the foundation for future advancements in hate span detection. Further improvements in model efficiency and interpretability, along with expanded training data encompassing diverse linguistic nuances and evolving hate speech trends, hold promise. Investigating the integration of real-time monitoring and incorporating user-specific context may enhance the model’s capabilities in dynamically changing online environments. Addi- tionally, exploring multilingual and cross-platform hate speech detection is vital for broader impact. As technology evolves, ATLANTIS and its successors are poised to play a pivotal role in fostering safer, more inclusive digital spaces. Acknowledgments We wish to extend our appreciation to the Computer Science and Engineering Department of the National Institute of Technology Silchar for granting us the opportunity to carry out our research and experiments. We are grateful for the support, resources, and research environment offered by the CNLP & AI Lab at NIT Silchar. References [1] B. Vidgen, L. Derczynski, (2020), Directions in abusive language training data, a systematic review: Garbage in, garbage out. PloS one 15 (2020). [2] P. Fortuna, S. Nunes, A survey on automatic detection of hate speech in text, ACM Computing Surveys (CSUR) 51 (2018) 1–30. [3] A. Vetagiri, P. K. Adhikary, P. Pakray, A. Das, “CNLP-NITS at SemEval-2023 Task 10: Online sexism prediction, PREDHATE!“, In the 17th International Workshop on Semantic Evaluation SemEval 2023 Toronto, Canada July 9-14, 2023. [4] D. Jurgens, L. Hemphill, E. Chandrasekharan, A just and comprehensive strategy for using NLP to address online abuse, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019. [5] A. Vetagiri, P. K. Adhikary, P. Pakray, A. Das, “Leveraging GPT-2 for Automated Classifica- tion of Online Sexist Content“, In Exist 2023 Lab at CLEF 2023: Conference and Labs of the Evaluation Forum, September 18–21, 2023, Thessaloniki, Greece, 2023. [6] B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, A. Mukherjee, (2021), Hatexplain: A benchmark dataset for explainable hate speech detection. Proceedings of the AAAI Conference on Artificial Intelligence 35 (2021) 14867–14875. [7] S. Masud, M. A. Khan, M. S. Akhtar, T. Chakraborty, Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023. [8] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, A. Patel, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo- European languages, Proceedings of the 11th Forum for Information Retrieval Evaluation, 2019. [9] S. Satapara, S. Masud, H. Madhu, M. A. Khan, M. S. Akhtar, T. Chakraborty, S. Modha, T. Mandl, Overview of the HASOC subtracks at FIRE 2023: Detection of hate spans and conversational hate-speech, in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2023, Goa, India. December 15-18, 2023, ACM, 2023. [10] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures for named entity recognition, arXiv preprint arXiv:1603 (2016) 01360. [11] Z. Zhang, L. Luo, Hate speech detection: A solved problem? The challenging case of long tail on Twitter, Semantic Web 10 (2019) 925–945. [12] J. Devlin, M. W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810 (2018) 04805. [13] M. Schuster, K. Paliwal, Bidirectional recurrent neural networks, Signal Processing, IEEE Transactions on 45 (1997) 2673 – 2681. doi:10.1109/78.650093 . [14] A. Vaswani, Attention is all you need, 2017. URL: https://arxiv.org/abs/1706.03762. [15] S. Masud, M. Bedi, M. A. Khan, M. S. Akhtar, T. Chakraborty, Proactively reducing the hate intensity of online posts via hate speech normalization, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, Association for Computing Machinery, New York, NY, USA, 2022, p. 3524–3534. URL: https://doi.org/10.1145/3534678.3539161. doi:10.1145/3534678.3539161 . [16] J. Qian, A. Bethke, Y. Liu, E. Belding, W. Y. Wang, A benchmark dataset for learning to intervene in online hate speech, 2019. URL: https://arxiv.org/abs/1909.04251v1. [17] R. Alshalan, H. Al-Khalifa, A deep learning approach for automatic hate speech detection in the saudi twittersphere, Applied Sciences 10 (2020). URL: https://www.mdpi.com/ 2076-3417/10/23/8614. doi:10.3390/app10238614 . [18] F. zahra El-Alami, S. Ouatik El Alaoui, N. En Nahnahi, A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model, Journal of King Saud University - Computer and Information Sciences 34 (2022) 6048–6056. URL: https://www.sciencedirect.com/science/article/pii/S1319157821001804. doi:https://doi. org/10.1016/j.jksuci.2021.07.013 . [19] K. B. Ozler, K. Kenski, S. Rains, Y. Shmargad, K. Coe, S. Bethard, Fine-tuning for multi- domain and multi-label uncivil language detection, in: Proceedings of the Fourth Workshop on Online Abuse and Harms, Association for Computational Linguistics, Online, 2020, pp. 28–33. URL: https://aclanthology.org/2020.alw-1.4. doi:10.18653/v1/2020.alw- 1.4 . [20] P. G. Hoang, C. D. Luu, K. Q. Tran, K. V. Nguyen, N. L.-T. Nguyen, ViHOS: Hate speech spans detection for Vietnamese, in: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 652–669. URL: https://aclanthology.org/2023. eacl-main.47. [21] X. Ma, End-to-end sequence labeling via bi-directional lstm-cnns-crf, 2016. URL: https: //arxiv.org/abs/1603.01354. [22] M. E. Peters, W. Ammar, C. Bhagavatula, R. Power, Semi-supervised sequence tagging with bidirectional language models, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 1756–1765. URL: https:// aclanthology.org/P17-1161. doi:10.18653/v1/P17- 1161 .