<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LCTs at HODI: Homotransphobic Speech Detection on Italian Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Locatelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Locatelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Processing and Speech Tools for Italian</institution>
          ,
          <addr-line>Sep 7 - 8, Parma, IT</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technical University of Catalonia</institution>
          ,
          <addr-line>31 Calle Jordi Girona, 08034 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Groningen</institution>
          ,
          <addr-line>Broerstraat 5, 9712 CP Groningen</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
<institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Recent research has highlighted the importance of employing language- and culture-specific techniques to accurately detect homotransphobic speech. In this paper, we present our participation in Subtask A of EVALITA 2023's HODI shared task [1], which addresses the identification of homotransphobic content in Italian tweets. Our approach employs a classifier built upon pre-trained Italian word embeddings; it achieves the best results in the shared task and can serve as a valuable tool to combat this harmful phenomenon. We release our code at https://github.com/davidelct/hodi2023.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Keywords</title>
      <p>hate speech detection, homotransphobia, social media</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
<p>Social media platforms have revolutionized communication, providing a space for diverse viewpoints and opinions to be shared. While these platforms offer invaluable means of connection and expression, they have unfortunately also become breeding grounds for online harassment, particularly targeting minorities. This pervasive issue has raised significant concerns about the safety and well-being of the LGBTQIA+ community, which is often the target of such abuse.</p>
<p>One of the challenges associated with combating online harassment is the ease with which users can freely express prejudiced views without immediate consequences. Compounding the problem, social media algorithms often contribute to the formation of echo chambers, where individuals are predominantly exposed to content that reinforces their existing beliefs [4]. Consequently, these algorithms can inadvertently perpetuate discriminatory attitudes and create an environment where hate speech thrives.</p>
<p>To address this pressing problem, the field of natural language processing (NLP) offers valuable resources that can effectively identify harmful online content and reduce its prevalence through automated hate speech detection systems. By leveraging NLP techniques, online platforms can detect and moderate hateful content at scale.</p>
      <p>
        We use Subtask A of the HODI shared task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] from
the EVALITA 2023 workshop [7] to demonstrate that a
classifier based on monolingual Italian word embeddings
yields high results, highlighting how this approach can
capture the nuances of the cultural factors at play. In
Subtask A the goal is to predict whether a given tweet
contains homotransphobic speech or not. We found that
our approach achieves the highest results in the shared
task.
      </p>
<p>The remainder of this paper is organized as follows. Section 2 describes the data used in this work and the preprocessing techniques we employed. Our methodology is presented in Section 3. Section 4 showcases the results we obtained, while Section 5 contains a qualitative analysis of the errors made by the different models in our study. Section 6 concludes the paper, discussing the implications of our research and proposing future directions to tackle homotransphobic hate speech on social media.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Data</title>
<p>Here, we present an overview of the data utilized in our study. This includes both the data released as part of the HODI challenge and the data on which the models we utilized were pre-trained. We did not undertake the pre-training step ourselves; nevertheless, we believe that describing the data is essential to offer a comprehensive understanding of the information to which the model has been exposed.</p>
      <sec id="sec-3-1">
        <title>2.1. HODI Dataset</title>
        <p>The HODI task organizers provided 6,000 Italian tweets
manually labeled by expert annotators. For Subtask A,
the annotators categorized tweets into two classes:
homotransphobic or non-homotransphobic. The dataset was
split into 5,000 tweets for training and 1,000 tweets for
testing. To monitor the progress of our experiments, we
reserved 200 tweets from the training set for validation.</p>
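<p>The split described above can be sketched as follows; the file name and column layout are illustrative assumptions, not the actual HODI release format:</p>

```python
# Sketch of the data split described above: from the 5,000 training
# tweets, 200 are held out for validation. File name and TSV layout
# are assumptions, not the actual HODI release format.
import csv
import random

def split_train_val(path="hodi_train.tsv", val_size=200, seed=42):
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    return rows[val_size:], rows[:val_size]  # (train, validation)
```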
<p>The dataset statistics, as presented in Table 2.1, reflect a well-balanced distribution between the two classes across both the training and testing splits. This equilibrium enhances the reliability of our results and ensures that our model receives sufficient exposure to diverse instances of homotransphobic and non-homotransphobic language in Italian tweets during the fine-tuning process.</p>
<p>Our pre-processing consists of removing usernames, hashtags, and unnecessary white space from the tweets. To tokenize the text, we utilize the tokenizer associated with the pre-trained model that we describe in the next section.</p>
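<p>A minimal sketch of this pre-processing, assuming simple regular-expression rules (the exact patterns are not specified in the paper):</p>

```python
# Sketch of the pre-processing described above: stripping usernames,
# hashtags, and redundant white space. The exact rules used by the
# authors are not given, so the patterns below are assumptions.
import re

def preprocess(tweet: str) -> str:
    tweet = re.sub(r"@\w+", "", tweet)         # drop @usernames
    tweet = re.sub(r"#\w+", "", tweet)         # drop #hashtags
    return re.sub(r"\s+", " ", tweet).strip()  # collapse white space
```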
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Models</title>
<p>The three models used in our submission all consist of classifiers built on top of UmBERTo [9]. All three share the same hyperparameters (see Table 3.2), but they differ in the number of fine-tuning epochs on the HODI Subtask A data. Specifically:</p>
        <sec id="sec-3-2-1">
          <title>Model run1</title>
          <p>was fine-tuned for 3 epochs.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Model run2</title>
          <p>was fine-tuned for 5 epochs.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Model run3</title>
          <p>was fine-tuned for 10 epochs.</p>
<p>UmBERTo is a RoBERTa-base language model [10] pre-trained on Italian text using SentencePiece and Whole Word Masking techniques. For our classification task, we specifically utilized the UmBERTo-Commoncrawl-Cased version.<sup>1</sup> Using the HuggingFace Transformers library [11], we applied a classification head on top of the model outputs, which enabled us to fine-tune the base model on the HODI data for Subtask A.</p>
<p>The selection of the UmBERTo-Commoncrawl-Cased version offers enhanced compatibility with a wide array of text sources in comparison to alternative versions such as Umberto-wikipedia-uncased-v1. The latter model is pre-trained on a smaller dataset consisting mainly of Wikipedia text, resulting in a narrower variety of text types compared to OSCAR. Furthermore, the version we selected retains the original casing of the text, which can provide significant insights especially in social media posts, where casing often serves as a means to convey strong emotions, opinions, and emphasis, and can prove a valuable signal for detecting hate speech.</p>
<p><bold>2.2. OSCAR Dataset.</bold> During the pre-training phase, the data utilized was the Italian corpus from the OSCAR dataset [8]. This particular collection is extensive, consisting of approximately 70GB of plain text; specifically, it contains 210 million sentences and 11 billion words. The inclusion of such a vast amount of linguistic data ensures the model's exposure to a wide range of sentence structures, vocabulary, and syntactic patterns present in the Italian language.</p>
<p><bold>3. Methodology.</bold> In this section we illustrate our approach, explaining both the data pre-processing steps we undertook and the details of the models we utilized for Subtask A.</p>
<p><sup>1</sup>Available at https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1.</p>
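<p>The recipe described above can be sketched with the Transformers API as follows; the checkpoint name comes from footnote 1, while the function itself is an illustration of the general setup, not the authors' exact training script:</p>

```python
# Sketch of the classifier setup described above: the UmBERTo checkpoint
# (footnote 1) with a binary classification head added by
# AutoModelForSequenceClassification. Illustrative, not the authors' code.
CHECKPOINT = "Musixmatch/umberto-commoncrawl-cased-v1"

def build_classifier(checkpoint: str = CHECKPOINT):
    # Imported here so the module can be inspected without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2  # homotransphobic vs. not homotransphobic
    )
    return tokenizer, model
```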
<p>Table 4: Teams and runs submitted to HODI Subtask A, in the order reported: LCTs run3; LCTs run2; odang4hodi run1; DH-FBK run1; extremITA run2; odang4hodi run2; DH-FBK run2; odang4hodi run3; LCTs run1; extremITA run1; INGEOTEC run1; Team Tamil run1; baseline run1; SOVRAG run3; SOVRAG run2; SOVRAG run1; CHILab run3; CHILab run1; CHILab run2.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Error analysis</title>
      <p>We divide the error analysis in two parts. First we
consider examples that have been incorrectly categorized as
not homotransphobic by all models, despite the gold label
indicating the presence of homotransphobic speech. In
other words, we consider false negatives across all
models. This is so that we can gain an understanding of where
our system would fail to protect LGBTQIA+ individuals
online, highlighting directions for further refinements.</p>
      <p>Then we analyze examples on which Model run1 and
run2 failed to identify homotransphobia, but on which
run3 succeeded. This is to gain an understanding of the
impact of extended fine-tuning.</p>
      <sec id="sec-4-1">
        <title>5.1. False negatives</title>
<p>To optimize our model, we employ the AdamW optimizer [12] and utilize a linear learning rate scheduler. In Table 3.2, we provide information about our experimental configuration, outlining the specific hyperparameters we selected.</p>
<p><bold>4. Results.</bold> To assess the accuracy of our model's predictions, we employ the Macro F1 score as the evaluation metric. Table 4 reports the results of our three runs, as well as all other submissions to HODI Subtask A.</p>
<p>In total, 108 examples were false negatives across models, i.e., they were wrongly classified as not homotransphobic by all three models. We report the top 10 words appearing in these examples in Table 5.1. It is interesting to note that the term "f*mminiello" and its plural form are the most frequently occurring words.</p>
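<p>The optimization and evaluation choices above can be sketched as follows; the learning-rate value and the gold/predicted labels are illustrative placeholders, not the paper's configuration:</p>

```python
# Sketch of the training and evaluation choices described above. The
# linear schedule multiplies the base learning rate by a factor that
# decays from 1 to 0 over training; Macro F1 averages per-class F1 so
# both classes weigh equally. All values below are placeholders.
from sklearn.metrics import f1_score

def linear_lr_factor(step: int, total_steps: int) -> float:
    # Factor applied to the base learning rate at a given optimizer step.
    return max(0.0, 1.0 - step / total_steps)

# With AdamW (torch.optim.AdamW) this factor would be wrapped in a
# torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: linear_lr_factor(s, T)).

gold = [1, 0, 1, 1, 0]  # 1 = homotransphobic, 0 = not (invented labels)
pred = [1, 0, 0, 1, 0]
macro_f1 = f1_score(gold, pred, average="macro")  # unweighted mean of per-class F1
```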
<p>This observation is noteworthy as the word is primarily used in the Neapolitan dialect rather than being widely employed throughout Italy. It suggests that all models struggle with dialectal words that are infrequently encountered in their Italian pre-training corpus. Further investigation revealed that the fine-tuning data for HODI Subtask A included only two tweets containing this word, explaining why none of the models recognized this particular case.</p>
<p>The remaining words in the table consist of various slurs, such as "rotto in culo" (a combination of the third and fourth words), which translates to "assf*cked." This expression stigmatizes anal sex and, as it is predominantly used in its masculine form to insult men, it implies a negative connotation towards gay male sex. However, it is important to note that this expression is also commonly used to insult non-gay individuals, making the identification of harassment towards LGBTQIA+ individuals more complex and context-dependent (e.g., considering the identity of the person being targeted). Nevertheless, even when used to target non-LGBTQIA+ individuals, many people may still consider such an expression to be homotransphobic, although this remains up for debate.</p>
<p>Turning to the overall results, we can observe that our approach is highly competitive in the shared task. Specifically, Model run3 and Model run2 achieve the highest and second-highest scores in the competition, with over 0.80 Macro F1. However, it should be noted that all models in the top five achieve over 0.79 Macro F1 and are within a 0.2-point difference. While Model run1 does not appear in the top five runs, it still achieves over 0.77 Macro F1.</p>
<p>Focusing only on our runs, it is evident that performance improves as we extend the fine-tuning process, as demonstrated by the increase in score with additional epochs. This observation highlights the positive impact of longer fine-tuning on the model's predictive capabilities: by allowing the model to undergo more epochs, we enable it to refine its predictions.</p>
<p>Table 5: Top words from the tweets where model run3 improved compared to the other models (Word, English translation, Count): Seduto, Sat down, 4; F*mminielli, Effeminate gay men, 3; Grandissimo, Very big, 2; Figlio, Son, 2; Casa, Home, 2; F*mminiello, Effeminate gay man, 2; GIOELE, First name (male), 2; MAGALDI, Last name, 2; Problema, Problem, 2; Verona, Verona (city), 2.</p>
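<p>The word-frequency analysis behind Tables 5.1 and 5 can be sketched with a simple counter over the misclassified tweets; the tweets in the usage example are invented, only the counting recipe is shown:</p>

```python
# Sketch of the top-word analysis described above: count word
# frequencies over the tweets that the models got wrong and report
# the most common ones. Tokenization here is naive whitespace splitting.
from collections import Counter

def top_words(tweets, n=10):
    counts = Counter(w.lower() for t in tweets for w in t.split())
    return counts.most_common(n)
```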
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Improvements from extended fine-tuning</title>
        <p>In total, 29 examples were correctly classified by Model
run3, and incorrectly classified by the other two. We
report the top 10 words appearing in these examples in
Table 5.2.</p>
        <p>We can observe that model run3 corrects a few of
the false negatives containing the word “f*mminiello”
described above, suggesting that more epochs allow the
model to pick up on more subtle patterns present in the
rest of the tweets.</p>
<p>Another interesting phenomenon is that of the words "GIOELE MAGALDI", the first and last name of an Italian male author who is often insulted on social media with homotransphobic slurs. It is interesting to observe that model run3 was able to pick up on the harassment of an individual, unlike the previous runs. This author is often insulted in all-caps tweets, which might have helped the model pick up on the aggressiveness of the language.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
<p>In this paper we described our approach to HODI Subtask A [<xref ref-type="bibr" rid="ref1">1</xref>] at EVALITA 2023 [7] on homotransphobic speech detection. The goal of our participation was to assess the effectiveness of a simple classifier based on monolingual pre-trained word embeddings. We built our model on top of UmBERTo, an Italian version of BERT pre-trained on a large amount of Italian data, and fine-tuned it using the HODI Subtask A data. We experimented with running the fine-tuning process for different numbers of epochs, and obtained high Macro F1 scores, around 0.8, for all runs.</p>
<p>In future work, it would be worth comparing this performance with that of classifiers based on multilingual pre-trained word embeddings. Given the linguistic and culture-specific phenomena that characterize homotransphobic speech, it would be interesting to understand whether targeted monolingual embeddings yield better results than multilingual ones, potentially uncovering whether the former handle nuanced edge cases better. While Italian is not a low-resource language, it would also be interesting to run this experiment with multilingual embeddings obtained from a dataset that does not include Italian, to understand whether the model can generalize from languages that exhibit phenomena similar to those of the target.</p>
<p><bold>Acknowledgments.</bold> We thank the task organizers for setting up this shared challenge and providing the HODI dataset, a valuable resource for future work in this important area of research.</p>
<p>Davide Locatelli is part of the INTERACT group of the Technical University of Catalonia, and is supported by the European Research Council under the European Union's Horizon 2020 research and innovation program (grant No. 853459). We gratefully acknowledge the computer resources at Artemisa, funded by the European Union ERDF and Comunitat Valenciana, and the technical support provided by the Instituto de Fisica Corpuscular, IFIC (CSIC-UV).</p>
<p>[4] M. Cinelli, G. D. F. Morales, A. Galeazzi, W. Quattrociocchi, M. Starnini, The echo chamber effect on social media, Proceedings of the National Academy of Sciences 118 (2021) e2023301118. URL: https://www.pnas.org/doi/abs/10.1073/pnas.2023301118. doi:10.1073/pnas.2023301118.</p>
<p>[5] B. R. Chakravarthi, R. Priyadharshini, R. Ponnusamy, P. K. Kumaresan, K. Sampath, D. Thenmozhi, S. Thangasamy, R. Nallathambi, J. P. McCrae, Dataset for identification of homophobia and transphobia in multilingual YouTube comments, ArXiv abs/2109.00227 (2021).</p>
<p>[6] D. Locatelli, G. Damo, D. Nozza, A cross-lingual study of homotransphobia on Twitter, in: Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 16–24. URL: https://aclanthology.org/2023.c3nlp-1.3.</p>
<p>[7] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, EVALITA 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[8] P. J. Ortiz Suárez, L. Romary, B. Sagot, A monolingual approach to contextualized word embeddings for mid-resource languages, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 1703–1714. URL: https://aclanthology.org/2020.acl-main.156. doi:10.18653/v1/2020.acl-main.156.</p>
<p>[9] L. Parisi, S. Francia, P. Magnani, UmBERTo: An Italian language model trained with whole word masking, https://github.com/musixmatchresearch/umberto, 2020.</p>
<p>[10] L. Zhuang, L. Wayne, S. Ya, Z. Jun, A robustly optimized BERT pre-training approach with post-training, in: Proceedings of the 20th Chinese National Conference on Computational Linguistics, Chinese Information Processing Society of China, Huhhot, China, 2021, pp. 1218–1227. URL: https://aclanthology.org/2021.ccl-1.108.</p>
<p>[11] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.</p>
<p>[12] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, in: International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=Bkg6RiCqY7.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Damo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          , HODI at EVALITA 2023:
          <article-title>Overview of the Homotransphobia Detection in Italian Task</article-title>
          , in:
          <source>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</source>
          , CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <article-title>The state of profanity obfuscation in natural language processing scientific publications</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: ACL</source>
          <year>2023</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>GLAAD</surname>
          </string-name>
          , Social media safety index,
          <year>2022</year>
          . URL: https://sites.google.com/glaad.org/smsi/platform-scores, accessed: 2023-07-22.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>