<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>EVALITA 2023: 8th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</journal-title>
      </journal-title-group>
      <issn>1613-0073</issn>
      <publisher>
        <publisher-name>CEUR Workshop Proceedings (CEUR-WS.org)</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>minimalist approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Irene Siragusa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Pirrone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ingegneria, Università degli Studi di Palermo</institution>
          <addr-line>Viale delle Scienze, Edificio 6, 90129 Palermo</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This technical report illustrates the system developed by the CHILab team for the HODI shared task at EVALITA 2023. The key idea of the method we propose for HODI Subtask A - Homotransphobia detection - is to develop different systems arranged as suitable combinations of a Pre-Trained Language Model (PTLM) for embedding extraction, a neural architecture for further elaboration of the embeddings, and a classifier. In particular, dense layers, LSTM, BiLSTM and Transformers were used as neural architectures. The best performing system among the ones investigated in this report couples embeddings extracted via AlBERTo with a Transformer, reaching a macro-F1 score of 0.753.</p>
      </abstract>
      <kwd-group>
        <kwd>homotransphobia detection</kwd>
        <kwd>transformer</kwd>
        <kwd>language model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Warning</title>
      <p>This paper contains examples of potentially offensive content.1</p>
    </sec>
    <sec id="sec-2">
      <title>2. Introduction</title>
      <p>The increasing interest in gender-inclusive and nondiscriminatory language has its counterpart in the hate speech that is spreading widely on social networks, particularly against the LGBTQIA+ community. The NLP community is currently involved in developing systems for hate speech detection, as in MAMI (Multimedia Automatic Misogyny Identification) [2] and EDOS (Explainable Detection of Online Sexism) [3], where the focus was on the detection of misogyny and sexism; these datasets, however, are focused neither on Italian nor on detecting hate speech against people from the LGBTQIA+ community.</p>
      <p>This paper introduces the architecture proposed by the CHILab team for the EVALITA 2023 campaign [4], and in particular for the Homotransphobia Detection in Italian task (HODI Subtask A - Homotransphobia detection) [5]. The general approach relies on encoding the text into suitable word embeddings that are processed via neural architectures such as LSTM, BiLSTM or Transformers. Finally, an output classifier detects the presence of homotransphobic content.</p>
      <p>We conceived our pipelines as “minimalist” architectures. No generative models [6, 7] were considered in this respect to derive embeddings. Moreover, we decided not to fine-tune our PTLMs, to stress the use of light networks that can be trained with low computing resources. Finally, we set up a unique approach for all the tasks in which we participated at EVALITA 2023.</p>
      <p>The paper is arranged as follows: Section 3 reports a description of our systems along with data pre-processing, while results are reported and discussed in Section 4. Concluding remarks are in Section 5.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Description of the system</title>
      <p>The data set provided by the HODI organizers contains 6000 Italian tweets annotated according to the presence of homotransphobic content. Since the training set released for the competition was made up of 5000 samples, it was randomly split into a training and a validation set using an 80-20 ratio, resulting in 4000 and 1000 samples respectively.</p>
      <sec id="sec-4-1">
        <title>3.1. Pre-processing</title>
        <p>The [URL] tag, mention references, and retweet notes were removed since they were not considered meaningful: in particular, mentions refer to anonymized accounts, so they add no special information. This was done after an analysis of the most cited words and hashtags2. As reported in Table 1, the [URL] tag is the most frequent token in both classes and adds no information, just like the anonymized mentions, which are not reported.</p>
        <p>During this analysis it was interesting to notice that the most cited words are slurs directed at LGBTQIA+ members. Although a first idea for approaching the task was to rely on such words, the analysis shows clearly that slurs are not a good indicator of homotransphobic content. Slurs, in fact, are widely used</p>
      </sec>
    </sec>
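    <p>The 80-20 random split described in Section 3 can be sketched as follows. This is a minimal illustration with Python's standard library; the seed, the ratio argument and the variable names are assumptions for the sketch, not details from the paper:</p>
    <preformat>
```python
import random

def train_val_split(samples, val_ratio=0.2, seed=42):
    """Randomly split a list of samples into train and validation sets."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = int(len(samples) * val_ratio)
    val_idx = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_idx]
    val = [s for i, s in enumerate(samples) if i in val_idx]
    return train, val

# 5000 released training samples -> 4000 train / 1000 validation
tweets = ["tweet_%d" % i for i in range(5000)]
train_set, val_set = train_val_split(tweets)
```
    </preformat>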
    <sec id="sec-5">
      <title>Note 1: profanities have been obfuscated with PrOf (https://github.com/dnozza/profanity-obfuscation) [1]</title>
    </sec>
    <sec id="sec-6">
      <title>Note 2: for this analysis all the words were reported in their lower case form</title>
      <p>from the LGBTQIA+ people as a self-definition method, suggesting a (re-)appropriation of the term itself [8]; obviously, tweets of this kind cannot be considered homotransphobic, and the slur word loses its negative connotation, as in the tweet:</p>
      <p>firmato una fr*cia in sessione :( 3</p>
      <p>Here the term fr*cia does not carry any negative connotation. Therefore, no word-dependent consideration about the polarization of homotransphobic speech can be made: the presence of slur words does not necessarily convey negative content, i.e. slurs cannot be regarded as representative elements for separating the classes. The same considerations hold for the hashtags, as reported in Table 2, where the most frequent ones are neutral words. Hence, the hashtag symbol was removed and the subsequent word was kept along with its meaning inside the tweet.</p>
      <p>Similar considerations were made for emojis: also in this case, a strong polarization in the use of emojis was not found, in particular for the ones most associated with disgust and hate (Table 3). Since emojis are widely used in social media communication, they were kept. Based on the statistics reported in Table 3, the emoticons contained in the data set were manually substituted with the corresponding most frequent emoji. As an example, the “:(” emoticon was translated into an emoji even if the correspondence is not exact. This approach does not inject bias into the data set, as the different emoticons were very few, while their rough meaning is preserved, thus avoiding treating them as mere sequences of punctuation marks. No further elaboration was made over the tweets: words were not reported to their lower case form, thus allowing a more accurate extraction of</p>
    </sec>
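    <p>The pre-processing steps described in Section 3.1 (removing [URL] tags, mentions and retweet notes, stripping the hashtag symbol while keeping the word, and mapping the few emoticons to an emoji) can be sketched as follows. The exact tag spellings and the emoticon-to-emoji table are assumptions for illustration, not taken from the released data:</p>
    <preformat>
```python
import re

# Assumed emoticon table: the paper maps ":(" to its most frequent
# corresponding emoji; the exact glyph used here is an illustrative choice.
EMOTICON_MAP = {":(": "\U0001F61E"}

def clean_tweet(text):
    """Drop [URL] tags, @mentions and RT notes; keep hashtag words as-is."""
    text = text.replace("[URL]", " ")
    text = re.sub(r"\bRT\b", " ", text)   # retweet notes
    text = re.sub(r"@\w+", " ", text)     # anonymized mentions
    text = text.replace("#", "")          # keep the word, drop the symbol
    for emoticon, emoji in EMOTICON_MAP.items():
        text = text.replace(emoticon, emoji)
    # No lower-casing: case-sensitive PTLMs see the original casing.
    return re.sub(r"\s+", " ", text).strip()
```
    </preformat>
    <p>For instance, clean_tweet("RT @user ciao #pride [URL] :(") keeps only the words ciao and pride, followed by the substituted emoji.</p>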
    <sec id="sec-7">
      <title>Note 3: “signed by a queer during the exam session” (translation of the tweet above)</title>
      <p>[Tables 1 and 2, listing the most frequent words and hashtags per class, could not be recovered from the source. Recoverable entries - words: culo, rotto, url, gay, c*zzo, r*cchione, fare, caghino, solo, me; hashtags: pride, prelemi, eurovision, gay, tellonym, pridemonth, dazn, meloni, omofobia, escita; column headers: “NH tweets”, “freq”.]</p>
      <p>Note 4: https://huggingface.co/docs/transformers/index</p>
      <p>embeddings for the case-sensitive PTLMs. As for emojis, uppercase text has a specific meaning in social media communication in terms of prosody and emotion interpretation [9, 10].</p>
      <sec id="sec-7-2">
        <title>3.2. Network architectures</title>
        <p>Different models were developed that share the same macro structure shown in Figure 1. The key idea was to stress, as much as possible, existing neural architectures for sequence processing, namely LSTM [11], BiLSTM and Transformers [12]. These architectures are used to further process the extracted embeddings.</p>
        <p>After pre-processing, the input sentences were padded to h + 2 tokens, where h is the size of the longest sentence and the two remaining tokens are the [CLS] and [SEP] ones. Either a pre-trained language model or a static context-free embedding model was used for embedding extraction. In the latter case, fastText [13] was used, which generates a 300-dimensional embedding, while a 768-dimensional embedding is obtained as usual from the different PTLMs. We used the following encoder-based language models in the experiments: BERT base multilingual cased [14], BERT base italian uncased [15], XLM-RoBERTa [16] and AlBERTo [17], provided by the HuggingFace Transformers library4. The embeddings were extracted from the last layer of the PTLMs without fine-tuning. Fine-tuning in these configurations is an option that was not taken into account, since the main idea is to stress the use of light networks that can be trained with low computing resources.</p>
        <p>The extracted sequence of embeddings is fed into a neural module that consists of an LSTM, a BiLSTM or a Transformer5. The output feature vector has the same size as the word embedding, with the exception of the BiLSTM, which generates a double-length output. Finally, the feature vector is passed to a classifier made of either 300 or 768 linear units, depending on the length of the embedding, and a sigmoidal output to achieve binary classification (Figure 1.a). Some experimental configurations add an extra ReLU dense layer of exactly the same size before the aforementioned classifier. These architectures are referred to as LSTM-Deep, BiLSTM-Deep and Transformer-Deep (Figure 1.b).</p>
        <p>The illustrated architectures were trained only on the given data set, using a machine equipped with two Intel Xeon E5 CPUs, 96GB RAM and an NVIDIA TITAN Xp GPU with 12GB RAM. Hyperparameters were selected as follows: dropout values in {0.1, 0.2}, batch size 32, Adam optimizer [18] with learning rate 0.01, and a Binary Cross Entropy loss. Models were trained for a maximum of 1000 epochs with a patience value of 50.</p>
        <p>Different feature extractors were implemented using 1, 2 or 3 LSTM/BiLSTM/Transformer layers, but the best results were obtained by the single-layer feature extraction modules. In addition, the developed models are relatively small, with trainable parameters ranging from 1M to 10M.</p>
      </sec>
      <sec id="sec-7-3">
        <title>4. Results</title>
        <p>The best models during the evaluation window6 were BERT-it/Transformer (run 1), AlBERTo/Transformer-Deep (run 2) and AlBERTo/LSTM (run 3), and they placed at the bottom of the rank, below the baseline [5]. Due to an internal error in the code of the training procedure, the submitted results are intrinsically wrong; for this reason, we repeated all the experiments using the correct architecture after the release of the golden labels. An overview of all the developed models is reported in Table 4, while Table 5 shows the submitted runs, their fixed counterparts and the baseline value. In both tables the results refer to the macro-F1 score over the test set and, although all possible configurations were run, in Table 4 we report only the significant architectures, i.e. the configurations that placed above the baseline. The results show that the AlBERTo/Transformer architecture with a two-dense-layer classifier (Transformer-Deep) has the best performance, and it would be expected to rank at the 10th place on the leaderboard.</p>
        <p>Moreover, LSTM-Deep and BiLSTM models exhibit comparable performance: bi-directional sequence processing compensates for the reduced classifier capacity. In general, the Transformer-Deep architectures performed better than the Transformer ones.</p>
        <p>As expected, only the models based on fastText benefit from removing the stop words. The models using AlBERTo and BERT-it achieved almost the best results both in the training phase and in the evaluation, because the network can take advantage of PTLMs that are specifically fine-tuned on the target language. In particular, AlBERTo was trained on a corpus of Italian tweets.</p>
      </sec>
      <sec id="sec-7-1">
        <title>4.1. Error analysis</title>
        <p>As suggested by the organizers of the shared task, an error analysis was performed, particularly on the tweets that were mis-classified by the models reported in Table 4 that perform better with reference to the baseline. All classifiers agreed incorrectly on 40 tweets, 80% of which were homotransphobic. Thanks to a direct analysis of their content, the following considerations can be made.</p>
        <p>The very first consideration is that the majority of the mis-classified tweets contain slurs. As shown in Section 3.1, slur words are widely used by LGBTQIA+ people as self-reference without any discriminatory intent, so an automatic classifier may not recognize these shades of meaning, as in:</p>
        <p>Fanculo Dolce &amp; Gabbana non metto la roba fr*cia7</p>
        <p>Moreover, many non-homotransphobic tweets actually share some linguistic similarities with the homotransphobic ones:</p>
        <p>DI ANORMALE c’è solo che una cripto ch*cca repressa e omofoba quale lei è #Pillon sia miserabile Senatore della Repubblica pagato dagli italiani e che peggio getta discredito sulla nostra Nazione con esternazioni quotidiane di puro, spregevole letame.8</p>
        <p>Here some hateful content is directed towards a person who is considered homotransphobic. In such cases, the presence of hate speech is correctly detected, but it does not meet the homotransphobic requirement.</p>
      </sec>
    </sec>
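    <p>The input layout described in Section 3.2 (a [CLS] token, the sentence, a [SEP] token, padded to h + 2 positions) and the size of the feature vector produced by the neural module can be sketched as follows; the [PAD] token name, whitespace tokenization and the helper names are illustrative assumptions:</p>
    <preformat>
```python
def pad_sentence(tokens, h, pad="[PAD]"):
    """Wrap a token list with [CLS]/[SEP] and pad it to h + 2 positions,
    where h is the length of the longest sentence in the data set."""
    tokens = tokens[:h]
    padded = ["[CLS]"] + tokens + ["[SEP]"]
    padded += [pad] * (h + 2 - len(padded))
    return padded

def feature_dim(module, embed_dim):
    """Feature-vector size after the neural module: the BiLSTM doubles the
    embedding length (300 for fastText, 768 for the PTLMs), while the LSTM
    and the Transformer preserve it."""
    return 2 * embed_dim if module == "bilstm" else embed_dim
```
    </preformat>
    <p>With h = 5, a two-token sentence becomes a 7-position sequence, and a BiLSTM over 768-dimensional AlBERTo embeddings yields a 1536-dimensional feature vector.</p>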
    <sec id="sec-8">
      <title>Note 7: “Fuck Dolce &amp; Gabbana, I do not wear f*g stuff” (translation of the tweet above)</title>
      <p>Note 8: “ABNORMAL: there is only a repressed and homophobic crypto queer like you; #Pillon is a miserable Senator of the Republic, paid by the Italians, and worse, discredits our nation with daily utterances of pure, despicable manure” (translation of the tweet above).</p>
      <sec id="sec-8-1">
        <title>Acknowledgments</title>
        <p>This work is supported by the PO FESR 2014-2020 grant n. 086201000543, “SCuSi - Smart Culture in Sicily”.</p>
      </sec>
      <sec id="sec-8-2">
        <title>5. Conclusion</title>
        <p>This paper reported the architectures developed by the CHILab team for HODI Subtask A, promoted at the EVALITA 2023 campaign. Our models show that a relatively small classical pipeline, made of embedding extraction plus further neural elaboration, can reach satisfactory performance in homotransphobic speech detection without the need for fine-tuning PTLMs and using few computational resources. The use of such a “minimalist” architecture is intended to allow for the future development of compact explainable models, where explicit linguistic knowledge is injected into the network to improve its performance.</p>
      </sec>
      <sec id="sec-8-3">
        <title>References</title>
        <p>doi:10.5281/zenodo.4263142.</p>
        <p>[16] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.</p>
        <p>[17] M. Polignano, P. Basile, M. de Gemmis, G. Semeraro, V. Basile, AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets, in: Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), volume 2481, CEUR, 2019. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074851349&amp;partnerID=40&amp;md5=7abed946e06f76b3825ae5e294ffac14.</p>
        <p>[18] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>