<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LCTs at HODI: Homotransphobic Speech Detection on Italian Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Locatelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Locatelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Processing and Speech Tools for Italian</institution>
          ,
          <addr-line>Sep 7 - 8, Parma, IT</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technical University of Catalonia</institution>
          ,
          <addr-line>31 Calle Jordi Girona, 08034 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Groningen</institution>
          ,
          <addr-line>Broerstraat 5, 9712 CP Groningen</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
<institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Recent research has highlighted the importance of employing language- and culture-specific techniques to accurately detect homotransphobic speech. In this paper, we present our participation in Subtask A of EVALITA 2023's HODI shared task [1], which addresses the identification of homotransphobic content in Italian tweets. Our approach employs a classifier built upon pre-trained Italian word embeddings; it achieves the best results in the shared task and can serve as a valuable tool to combat this harmful phenomenon. We release our code at https://github.com/davidelct/hodi2023.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Keywords</title>
      <p>hate speech detection, homotransphobia, social media</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
<p>Social media platforms have revolutionized communication, providing a space for diverse viewpoints and opinions to be shared. While these platforms offer invaluable means of connection and expression, they have unfortunately also become breeding grounds for online harassment, particularly targeting minorities. This pervasive issue has raised significant concerns about the safety and well-being of the LGBTQIA+ community, which is often the target of such abuse.</p>
<p>One of the challenges associated with combating online harassment is the ease with which users can freely express prejudiced views without immediate consequences. Compounding the problem, social media algorithms often contribute to the formation of echo chambers, where individuals are predominantly exposed to content that reinforces their existing beliefs [4]. Consequently, these algorithms can inadvertently perpetuate discriminatory attitudes and create an environment where hate speech thrives.</p>
<p>To address this pressing problem, the field of natural language processing (NLP) offers valuable resources that can effectively identify harmful online content and reduce its prevalence through automated hate speech detection systems. By leveraging NLP techniques, online platforms can detect and moderate hateful content at scale.</p>
      <p>
        We use Subtask A of the HODI shared task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] from
the EVALITA 2023 workshop [7] to demonstrate that a
classifier based on monolingual Italian word embeddings
yields high results, highlighting how this approach can
capture the nuances of the cultural factors at play. In
Subtask A the goal is to predict whether a given tweet
contains homotransphobic speech or not. We found that
our approach achieves the highest results in the shared
task.
      </p>
<p>The remainder of this paper is organized as follows. Section 2 describes the data used in this work and the preprocessing techniques we employed. Our methodology is presented in Section 3. Section 4 showcases the results we obtained, while Section 5 contains a qualitative analysis of the errors made by the different models in our study. Section 6 concludes the paper, discussing the implications of our research and proposing future directions to tackle homotransphobic hate speech on social media.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Data</title>
<p>Here, we present an overview of the data utilized in our study. This includes both the data released as part of the HODI challenge and the data on which the models we utilized were pre-trained. We did not undertake the pre-training step ourselves; nevertheless, we believe that describing the data is essential to offer a comprehensive understanding of the information to which the model has been exposed.</p>
      <sec id="sec-3-1">
        <title>2.1. HODI Dataset</title>
        <p>The HODI task organizers provided 6,000 Italian tweets
manually labeled by expert annotators. For Subtask A,
the annotators categorized tweets into two classes:
homotransphobic or non-homotransphobic. The dataset was
split into 5,000 tweets for training and 1,000 tweets for
testing. To monitor the progress of our experiments, we
reserved 200 tweets from the training set for validation.</p>
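<p>The split described above can be sketched as follows; the file name and column layout are illustrative assumptions, not the actual HODI release format:</p>

```python
# Sketch of the data split described above: from the 5,000 training
# tweets, 200 are held out for validation. File name and TSV layout
# are assumptions, not the actual HODI release format.
import csv
import random

def split_train_val(path="hodi_train.tsv", val_size=200, seed=42):
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    return rows[val_size:], rows[:val_size]  # (train, validation)
```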
<p>The dataset statistics, as presented in Table 2.1, reflect a well-balanced distribution between the two classes across both the training and testing splits. This equilibrium enhances the reliability of our results and ensures that our model receives sufficient exposure to diverse instances of homotransphobic and non-homotransphobic language in Italian tweets during the fine-tuning process.</p>
<p>Our pre-processing consists of removing usernames, hashtags, and unnecessary white space from the tweets. To tokenize the text, we utilize the tokenizer associated with the pre-trained model that we describe in the next section.</p>
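<p>A minimal sketch of this pre-processing, assuming simple regular-expression rules (the exact patterns are not specified in the paper):</p>

```python
# Sketch of the pre-processing described above: stripping usernames,
# hashtags, and redundant white space. The exact rules used by the
# authors are not given, so the patterns below are assumptions.
import re

def preprocess(tweet: str) -> str:
    tweet = re.sub(r"@\w+", "", tweet)         # drop @usernames
    tweet = re.sub(r"#\w+", "", tweet)         # drop #hashtags
    return re.sub(r"\s+", " ", tweet).strip()  # collapse white space
```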
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Models</title>
<p>The three models used in our submission all consist of classifiers built on top of UmBERTo [9]. All three share the same hyperparameters (see Table 3.2), but they differ in the number of fine-tuning epochs on the HODI Subtask A data. Specifically:</p>
        <sec id="sec-3-2-1">
          <title>Model run1</title>
          <p>was fine-tuned for 3 epochs.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Model run2</title>
          <p>was fine-tuned for 5 epochs.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Model run3</title>
          <p>was fine-tuned for 10 epochs.</p>
<p>UmBERTo is a RoBERTa-base language model [10] pre-trained on Italian text using SentencePiece and Whole Word Masking techniques. For our classification task, we specifically utilized the UmBERTo-Commoncrawl-Cased version.<sup>1</sup> Using the HuggingFace Transformers library [11], we applied a classification head on top of the model outputs, which enabled us to fine-tune the base model on the HODI data for Subtask A.</p>
<p>The selection of the UmBERTo-Commoncrawl-Cased version offers enhanced compatibility with a wide array of text sources in comparison to alternative versions such as Umberto-wikipedia-uncased-v1. The latter model is pre-trained on a smaller dataset consisting mainly of Wikipedia text, resulting in a narrower variety of text types compared to OSCAR. Furthermore, the version we selected retains the original casing of the text, which can provide significant insights especially in social media posts, where casing often serves as a means to convey strong emotions, opinions, and emphasis, and can prove a valuable signal for detecting hate speech.</p>
<p><bold>2.2. OSCAR Dataset.</bold> During the pre-training phase, the data utilized was the Italian corpus from the OSCAR dataset [8]. This particular collection is extensive, consisting of approximately 70GB of plain text; specifically, it contains 210 million sentences and 11 billion words. The inclusion of such a vast amount of linguistic data ensures the model's exposure to a wide range of sentence structures, vocabulary, and syntactic patterns present in the Italian language.</p>
<p><bold>3. Methodology.</bold> In this section we illustrate our approach, explaining both the data pre-processing steps we undertook and the details of the models we utilized for Subtask A.</p>
<p><sup>1</sup>Available at https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1.</p>
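<p>The recipe described above can be sketched with the Transformers API as follows; the checkpoint name comes from footnote 1, while the function itself is an illustration of the general setup, not the authors' exact training script:</p>

```python
# Sketch of the classifier setup described above: the UmBERTo checkpoint
# (footnote 1) with a binary classification head added by
# AutoModelForSequenceClassification. Illustrative, not the authors' code.
CHECKPOINT = "Musixmatch/umberto-commoncrawl-cased-v1"

def build_classifier(checkpoint: str = CHECKPOINT):
    # Imported here so the module can be inspected without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2  # homotransphobic vs. not homotransphobic
    )
    return tokenizer, model
```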
<p>Table 4: Teams and runs submitted to HODI Subtask A, in the order reported: LCTs run3; LCTs run2; odang4hodi run1; DH-FBK run1; extremITA run2; odang4hodi run2; DH-FBK run2; odang4hodi run3; LCTs run1; extremITA run1; INGEOTEC run1; Team Tamil run1; baseline run1; SOVRAG run3; SOVRAG run2; SOVRAG run1; CHILab run3; CHILab run1; CHILab run2.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Error analysis</title>
      <p>We divide the error analysis in two parts. First we
consider examples that have been incorrectly categorized as
not homotransphobic by all models, despite the gold label
indicating the presence of homotransphobic speech. In
other words, we consider false negatives across all
models. This is so that we can gain an understanding of where
our system would fail to protect LGBTQIA+ individuals
online, highlighting directions for further refinements.</p>
      <p>Then we analyze examples on which Model run1 and
run2 failed to identify homotransphobia, but on which
run3 succeeded. This is to gain an understanding of the
impact of extended fine-tuning.</p>
      <sec id="sec-4-1">
        <title>5.1. False negatives</title>
<p>To optimize our model, we employ the AdamW optimizer [12] and utilize a linear learning rate scheduler. In Table 3.2, we provide information about our experimental configuration, outlining the specific hyperparameters we selected.</p>
<p><bold>4. Results.</bold> To assess the accuracy of our model's predictions, we employ the Macro F1 score as the evaluation metric. Table 4 reports the results of our three runs, as well as all other submissions to HODI Subtask A.</p>
<p>In total, 108 examples were false negatives across models, i.e., they were wrongly classified as not homotransphobic by all three models. We report the top 10 words appearing in these examples in Table 5.1. It is interesting to note that the term "f*mminiello" and its plural form are the most frequently occurring words.</p>
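<p>The optimization and evaluation choices above can be sketched as follows; the learning-rate value and the gold/predicted labels are illustrative placeholders, not the paper's configuration:</p>

```python
# Sketch of the training and evaluation choices described above. The
# linear schedule multiplies the base learning rate by a factor that
# decays from 1 to 0 over training; Macro F1 averages per-class F1 so
# both classes weigh equally. All values below are placeholders.
from sklearn.metrics import f1_score

def linear_lr_factor(step: int, total_steps: int) -> float:
    # Factor applied to the base learning rate at a given optimizer step.
    return max(0.0, 1.0 - step / total_steps)

# With AdamW (torch.optim.AdamW) this factor would be wrapped in a
# torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: linear_lr_factor(s, T)).

gold = [1, 0, 1, 1, 0]  # 1 = homotransphobic, 0 = not (invented labels)
pred = [1, 0, 0, 1, 0]
macro_f1 = f1_score(gold, pred, average="macro")  # unweighted mean of per-class F1
```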
<p>This observation is noteworthy as the word is primarily used in the Neapolitan dialect rather than being widely employed throughout Italy. It suggests that all models struggle with dialectal words that are infrequently encountered in their Italian pre-training corpus. Further investigation revealed that the fine-tuning data for HODI Subtask A included only two tweets containing this word, explaining why none of the models recognized this particular case.</p>
<p>The remaining words in the table consist of various slurs, such as "rotto in culo" (a combination of the third and fourth words), which translates to "assf*cked." This expression stigmatizes anal sex and, as it is predominantly used in its masculine form to insult men, it implies a negative connotation towards gay male sex. However, it is important to note that this expression is also commonly used to insult non-gay individuals, making the identification of harassment towards LGBTQIA+ individuals more complex and context-dependent (e.g., considering the identity of the person being targeted). Nevertheless, even when used to target non-LGBTQIA+ individuals, many people may still consider such an expression to be homotransphobic, although this remains up for debate.</p>
<p>Turning to the overall results, we can observe that our approach is highly competitive in the shared task. Specifically, Model run3 and Model run2 achieve the highest and second-highest scores in the competition, with over 0.80 Macro F1. However, it should be noted that all models in the top five achieve over 0.79 Macro F1 and are within a 0.2-point difference. While Model run1 does not appear in the top five runs, it still achieves over 0.77 Macro F1.</p>
<p>Focusing only on our runs, it is evident that performance improves as we extend the fine-tuning process, as demonstrated by the increase in score with additional epochs. This observation highlights the positive impact of longer fine-tuning on the model's predictive capabilities: by allowing the model to undergo more epochs, we enable it to refine its predictions.</p>
<p>Table 5: Top words from the tweets where model run3 improved compared to the other models (Word, English translation, Count): Seduto, Sat down, 4; F*mminielli, Effeminate gay men, 3; Grandissimo, Very big, 2; Figlio, Son, 2; Casa, Home, 2; F*mminiello, Effeminate gay man, 2; GIOELE, First name (male), 2; MAGALDI, Last name, 2; Problema, Problem, 2; Verona, Verona (city), 2.</p>
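<p>The word-frequency analysis behind Tables 5.1 and 5 can be sketched with a simple counter over the misclassified tweets; the tweets in the usage example are invented, only the counting recipe is shown:</p>

```python
# Sketch of the top-word analysis described above: count word
# frequencies over the tweets that the models got wrong and report
# the most common ones. Tokenization here is naive whitespace splitting.
from collections import Counter

def top_words(tweets, n=10):
    counts = Counter(w.lower() for t in tweets for w in t.split())
    return counts.most_common(n)
```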
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Improvements from extended fine-tuning</title>
        <p>In total, 29 examples were correctly classified by Model
run3, and incorrectly classified by the other two. We
report the top 10 words appearing in these examples in
Table 5.2.</p>
        <p>We can observe that model run3 corrects a few of
the false negatives containing the word “f*mminiello”
described above, suggesting that more epochs allow the
model to pick up on more subtle patterns present in the
rest of the tweets.</p>
<p>Another interesting phenomenon is that of the words "GIOELE MAGALDI", the first and last name of an Italian male author who is often insulted on social media with homotransphobic slurs. It is interesting to observe that model run3 was able to pick up on the harassment of an individual, unlike the previous runs. This author is often insulted in all-caps tweets, which might have helped the model pick up on the aggressiveness of the language.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
<p>In this paper we described our approach to HODI Subtask A [<xref ref-type="bibr" rid="ref1">1</xref>] at EVALITA 2023 [7] on homotransphobic speech detection. The goal of our participation was to assess the effectiveness of a simple classifier based on monolingual pre-trained word embeddings. We built our model on top of UmBERTo, an Italian version of BERT pre-trained on a large amount of Italian data, and fine-tuned it using the HODI Subtask A data. We experimented with running the fine-tuning process for different numbers of epochs, and obtained high Macro F1 scores, around 0.8, for all runs.</p>
<p>In future work, it would be worth comparing this performance with that of classifiers based on multilingual pre-trained word embeddings. Given the linguistic and culture-specific phenomena that characterize homotransphobic speech, it would be interesting to understand whether targeted monolingual embeddings yield better results than multilingual ones, potentially uncovering whether the former handle nuanced edge cases better. While Italian is not a low-resource language, it would also be interesting to run this experiment with multilingual embeddings obtained from a dataset that does not include Italian, to understand whether the model can generalize from languages that exhibit phenomena similar to those of the target.</p>
<p><bold>Acknowledgments.</bold> We thank the task organizers for setting up this shared challenge and providing the HODI dataset, a valuable resource for future work in this important area of research.</p>
<p>Davide Locatelli is part of the INTERACT group of the Technical University of Catalonia, and is supported by the European Research Council under the European Union's Horizon 2020 research and innovation program (grant No. 853459). We gratefully acknowledge the computer resources at Artemisa, funded by the European Union ERDF and Comunitat Valenciana, and the technical support provided by the Instituto de Fisica Corpuscular, IFIC (CSIC-UV).</p>
<p>[4] M. Cinelli, G. D. F. Morales, A. Galeazzi, W. Quattrociocchi, M. Starnini, The echo chamber effect on social media, Proceedings of the National Academy of Sciences 118 (2021) e2023301118. URL: https://www.pnas.org/doi/abs/10.1073/pnas.2023301118. doi:10.1073/pnas.2023301118.</p>
<p>[5] B. R. Chakravarthi, R. Priyadharshini, R. Ponnusamy, P. K. Kumaresan, K. Sampath, D. Thenmozhi, S. Thangasamy, R. Nallathambi, J. P. McCrae, Dataset for identification of homophobia and transphobia in multilingual YouTube comments, ArXiv abs/2109.00227 (2021).</p>
<p>[6] D. Locatelli, G. Damo, D. Nozza, A cross-lingual study of homotransphobia on Twitter, in: Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 16–24. URL: https://aclanthology.org/2023.c3nlp-1.3.</p>
<p>[7] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, EVALITA 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
<p>[8] P. J. Ortiz Suárez, L. Romary, B. Sagot, A monolingual approach to contextualized word embeddings for mid-resource languages, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 1703–1714. URL: https://aclanthology.org/2020.acl-main.156. doi:10.18653/v1/2020.acl-main.156.</p>
<p>[9] L. Parisi, S. Francia, P. Magnani, UmBERTo: An Italian language model trained with whole word masking, https://github.com/musixmatchresearch/umberto, 2020.</p>
<p>[10] L. Zhuang, L. Wayne, S. Ya, Z. Jun, A robustly optimized BERT pre-training approach with post-training, in: Proceedings of the 20th Chinese National Conference on Computational Linguistics, Chinese Information Processing Society of China, Huhhot, China, 2021, pp. 1218–1227. URL: https://aclanthology.org/2021.ccl-1.108.</p>
<p>[11] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.</p>
<p>[12] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, in: International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=Bkg6RiCqY7.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Damo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          , HODI at EVALITA 2023:
          <article-title>Overview of the Homotransphobia Detection in Italian Task</article-title>
          , in:
          <source>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</source>
          , CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <article-title>The state of profanity obfuscation in natural language processing scientific publications</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: ACL</source>
          <year>2023</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>GLAAD</surname>
          </string-name>
          , Social media safety index,
          <year>2022</year>
          . URL: https://sites.google.com/glaad.org/smsi/platform-scores, accessed: 2023-07-22.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>