<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CHILab at HaSpeeDe3 Task A: Political Hate Speech Detection - Textual</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Irene Siragusa</string-name>
          <email>irene.siragusa02@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Pirrone</string-name>
          <email>roberto.pirrone@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ingegneria, Università degli Studi di Palermo</institution>
          ,
          <addr-line>Viale delle Scienze, Edificio 6, 90129 Palermo</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This technical report illustrates the system developed by the CHILab team for the HaSpeeDe3 competition as part of the EVALITA 2023 campaign. The key idea for HaSpeeDe3 task A - Political Hate Speech Detection - Textual was to develop different systems arranged as suitable combinations of a Pre-Trained Language Model (PTLM) used for embedding extraction, a neural architecture for further elaboration of the embeddings, and a classifier. In particular, dense layers, LSTM, BiLSTM and Transformer modules were used. The best performing system among the ones investigated in this report couples embeddings extracted via XLM-RoBERTa with a BiLSTM, and reaches a macro-F1 score of 0.876.</p>
      </abstract>
      <kwd-group>
        <kwd>hate speech detection</kwd>
        <kwd>BiLSTM</kwd>
        <kwd>language model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The continuous spread and usage of social media has become a problem when dealing with hate online. All social platforms use artificial intelligence techniques to detect, and report or remove, dangerous content in terms of hate or violence. Interest in this respect is also high in the scientific community: in fact, different international campaigns for detecting hateful speech have been proposed in recent years, such as OffensEval [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>], HatEval [<xref ref-type="bibr" rid="ref3">3</xref>] and HaHackathon [4]. Detection of hateful content in Italian has been addressed by the HaSpeeDe evaluation competitions [5, 6].</p>
      <p>This paper introduces the architecture proposed by the CHILab team for the EVALITA 2023 campaign [7], and in particular for the Hate Speech Detection task (HaSpeeDe3 task A - Political Hate Speech Detection, textual) [8]. The general approach relies on encoding the text into suitable word embeddings that are processed via neural architectures like LSTM, BiLSTM or Transformers; finally, an output classifier detects the presence of hateful content. No generative models [9, 10] were considered in this respect to derive embeddings. Moreover, we decided not to fine-tune our PTLMs, in order to stress the use of light networks that can be trained with low computing resources. Finally, we set up a unique approach for all the tasks we participated in at EVALITA 2023.</p>
      <p>The paper is arranged as follows: Section 2 reports a description of our systems along with data pre-processing, while results are reported and discussed in Section 3. Concluding remarks are in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Description of the system</title>
      <p>The focus of HaSpeeDe3 was on political and religious hate, where strongly polarized opinions can be found. The data set used in this edition for task A is the PolicyCorpus XL [11], which contains 7000 manually annotated tweets, with a presence of hate labels above 40%. The training data set was released for the campaign with a total of 5600 samples: for development purposes, the given data set was randomly split into a training and a validation set using an 80-20 ratio, resulting in 4480 and 1120 samples respectively.</p>
      <sec id="sec-2-1">
        <title>2.1. Pre-processing</title>
        <p>The [URL] tag, mention references, and retweet notes were removed since they were not considered meaningful: in particular, mentions refer to anonymized accounts, thus they add no special information. This was done after an analysis of the most cited words and hashtags<sup>1</sup>. As reported in Table 1, the [URL] tag is the most frequent one in both classes and adds no information. Overall, no other relevant words appeared that suggest a strong separation between classes. The same considerations can be made for the hashtags, as reported in Table 2.</p>
        <p>[Tables 1 and 2: frequency counts of the most cited words and hashtags, overall (All), in non-hateful (NH) and in hateful (H) tweets; the word and hashtag labels were lost in extraction.]</p>
        <p>Although there are some hashtags that are hateful (such as salvinipagliaccio, speranzadimettiti and governodeipeggiori), the most frequent ones are just either politicians' or parties' names and politics-related words that do not express any polarized content. Moreover, since a strong and significant distinction between hateful and non-hateful hashtags can be made, their information has been kept as a word inside the tweet, thus preserving the crucial information, while the hashtag symbol was removed.</p>
        <p>Similar considerations were made for emojis: also in this case, a strong polarization in the use of emojis did not arise, not even for the ones that are most associated with disgust and hate (Table 3). Since emojis are deeply used in social media communication, they were kept. No further elaboration was made over the tweets: words were not reduced to their lower case form, thus allowing a more accurate extraction of embeddings for the case-sensitive PTLMs. Like emojis, uppercase text has a specific meaning in social media communication in terms of prosodic and emotional interpretation [12, 13].</p>
        <p><sup>1</sup> For this analysis, all the words were reported in their lower case form.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Network architectures</title>
        <p>Different models were developed that share the same macro structure shown in Figure 1. The key idea was to stress, as much as possible, existing neural architectures for sequence processing, namely LSTM [14], BiLSTM and Transformers [15]. Those architectures are used to further process the extracted embeddings.</p>
        <p>After pre-processing, the input sentences were padded to h + 2 tokens, where h is the size of the longest sentence and the remaining two tokens are respectively the [CLS] and the [SEP] one. Either a pre-trained language model or a static context-free embedding model was used for embedding generation. In the latter case, fastText [16] was used, which generates a 300-dimensional embedding, while a 768-dimensional embedding is obtained as usual from the different PTLMs. We used the following encoder-based language models in the experiments: BERT base multilingual cased [17], BERT base italian uncased [18], XLM-RoBERTa [19] and AlBERTo [20], provided by the HuggingFace Transformers library. The embeddings were extracted from the last layer of the PTLMs without fine-tuning. Fine-tuning in these configurations is an option that was not taken into account, since the main idea is to stress the use of light networks to be trained with low computing resources.</p>
        <p>The generated embedding is fed into a module for feature extraction that consists of an LSTM, a BiLSTM or a Transformer (the corresponding architectures are named according to the specific neural module). The output feature vector has the same size as the word embedding, with the exception of the BiLSTM, which generates a double-length output. Finally, the feature vector is passed to a classifier made of either 300 or 768 linear units, depending on the length of the embedding, and a sigmoidal output to achieve binary classification. Some experiments were run by inserting a ReLU dense layer, with exactly the same size, before the aforementioned one. Those architectures are referred to as LSTM-Deep, BiLSTM-Deep and Transformer-Deep (Figure 1.b).</p>
        <p>The illustrated architectures were trained only on the given data set, using a machine equipped with two Intel Xeon E5 CPUs, 96GB RAM and an NVIDIA TITAN Xp GPU with 12GB RAM. Hyperparameters were selected as follows: dropout values in {0.1, 0.2}, batch size 32, Adam optimizer [21] with learning rate 0.01, and a Binary Cross Entropy loss. Models were trained for a maximum of 1000 epochs with a patience value of 50.</p>
        <p>Different feature extractors were implemented using 1, 2 or 3 LSTM/BiLSTM/Transformer layers, but the best results were obtained by the single-layer feature extraction modules. In addition, the developed models are relatively small: the trainable parameters range from 1M to 10M.</p>
      </sec>
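      <p>The pre-processing steps of Section 2.1 can be sketched in Python as follows. This is a minimal illustrative sketch, not the team's released code: it assumes URLs appear as a literal [URL] tag and mentions as @-prefixed tokens.</p>

```python
import re

def preprocess_tweet(text):
    # Remove the literal [URL] tag: it is the most frequent token in both
    # classes and adds no information (Table 1).
    text = re.sub(r"\[URL\]", " ", text)
    # Remove mentions: accounts are anonymized, so they carry no signal.
    text = re.sub(r"@\w+", " ", text)
    # Remove retweet markers.
    text = re.sub(r"\bRT\b", " ", text)
    # Keep the hashtag content as a plain word, dropping only the symbol.
    text = re.sub(r"#(\w+)", r"\1", text)
    # Case and emojis are deliberately preserved.
    return " ".join(text.split())
```

      <p>Note that upper case and emojis are deliberately left untouched, matching the choices motivated above.</p>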
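      <p>The dimension bookkeeping of the classifier described in Section 2.2 - 300- or 768-dimensional embeddings, a BiLSTM that doubles the feature size, and a dense layer (optionally preceded by a ReLU layer of the same size in the "-Deep" variants) with a sigmoidal output - can be sketched as follows. This is a NumPy sketch with random placeholder weights, not the actual trained models.</p>

```python
import numpy as np

def feature_dim(embed_dim, extractor):
    # A BiLSTM concatenates forward and backward states, doubling the size;
    # LSTM and Transformer extractors keep the embedding size.
    return 2 * embed_dim if extractor == "bilstm" else embed_dim

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Classifier:
    """Sketch of the classifier head: one linear layer sized like the
    embedding (300 or 768 units), an optional extra ReLU dense layer of
    the input size (the '-Deep' variants), and a sigmoidal output for
    binary hate detection. Weights are random placeholders."""
    def __init__(self, in_dim, hidden, deep=False, seed=0):
        rng = np.random.default_rng(seed)
        self.deep = deep
        self.W_deep = rng.normal(0.0, 0.02, (in_dim, in_dim))
        self.W1 = rng.normal(0.0, 0.02, (in_dim, hidden))
        self.w_out = rng.normal(0.0, 0.02, (hidden,))
    def forward(self, feats):
        if self.deep:  # ReLU dense layer of the same size, as in *-Deep
            feats = np.maximum(0.0, feats @ self.W_deep)
        h = feats @ self.W1
        return sigmoid(h @ self.w_out)  # probability of hateful content
```

      <p>For example, the XLM-RoBERTa/BiLSTM configuration would use a 2 * 768 = 1536-dimensional feature vector feeding a 768-unit dense layer.</p>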
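      <p>The early-stopping schedule used in training (a patience of 50 epochs over a maximum of 1000) can be illustrated by the following sketch; stop_epoch is a hypothetical helper, not part of the team's code.</p>

```python
def stop_epoch(val_losses, patience=50, max_epochs=1000):
    """Return the epoch at which training stops: after `patience` epochs
    without improvement of the validation loss, or at the last epoch."""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if best > loss:                       # validation loss improved
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:  # patience exhausted: stop
            return epoch
    return min(len(val_losses), max_epochs) - 1
```
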
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>The best macro-F1 performances obtained on the test set by our models are reported in Table 4. The submitted models were the best runs with respect to the validation set, namely AlBERTo/BiLSTM (run 1) and fastText/BiLSTM (run 2). After the release of the golden labels, it was possible to measure the actual performance of all the developed systems: the XLM-RoBERTa/BiLSTM architecture gives the best results, ranking at the 7th place on the leaderboard, while the submitted runs are at the last places, as shown in Table 5. The best results are obtained when a PTLM is coupled with an LSTM/BiLSTM feature extractor and a single dense layer, while the Transformer-based networks better exploit a context-free embedding by using a two-layer classifier. (In Table 4 some experiments and configurations, like the BiLSTM-Deep one, are not reported because they performed poorly with respect to the submitted architectures.)</p>
      <p>It is worth noticing that only the models that use fastText benefit from removing stopwords, while the PTLMs perform almost equally over LSTM and BiLSTM configurations, as was expected. In the training phase, AlBERTo outperformed the other PTLMs, since it uses a more accurate tokenization compared to the others and takes advantage of its inner knowledge: AlBERTo was trained on a corpus of Italian tweets that shares the same linguistic macro-structure as the PolicyCorpus. On the other hand, the best model is the one based on XLM-RoBERTa: this can be explained by its tokenizer, which owns an inner representation for emojis and considers them as unique tokens rather than as [UNK].</p>
      <sec id="sec-3-1">
        <title>3.1. Error analysis</title>
        <p>Besides the aforementioned differences between the PTLMs used, another analysis was made on the mis-classified tweets by comparing the results of the best architectures (AlBERTo/LSTM, AlBERTo/LSTM-Deep, XLM-RoBERTa and fastText/Transformer-Deep) and the submitted models. The models agree in mis-classifying 32 tweets, and 25 of them are labeled as hateful.</p>
        <p>None of these mis-classified tweets contains emoji, that is, their presence or absence is not a source of bias in those models. Moreover, the majority of those tweets contain hashtags or expressions referring to politicians and to topics of interest in the political debate, which per se are not hateful. On the contrary, tweets containing the hashtag speranzadimettiti, considered hateful as in Table 2, can be found among non-hateful tweets. In those tweets the author disapproves of the governmental behaviour of a minister: in this case the tweet cannot be considered hateful, since it expresses a negative opinion without insulting.</p>
        <p>On the other side, hateful tweets usually contain profanities and vulgar expressions: the hateful tweets that are not correctly classified by the developed models lack those expressions or put them in an unconventional way (self-obfuscation, or embedding in other words), and this leads to their mis-classification.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper reported the architectures developed by the CHILab team for HaSpeeDe3 task A, promoted at the EVALITA 2023 campaign. Our models show that a relatively small classical pipeline, made of embedding extraction plus further neural elaboration, can achieve good performance in hate speech detection without the need of fine-tuning PTLMs, and using few computational resources. The use of such a "minimalist" architecture is intended to allow for the future development of compact explainable models, where explicit linguistic knowledge is injected into the network to improve its performance.</p>
      <sec id="sec-4-1">
        <title>Acknowledgments</title>
        <p>This work is supported by the PO FESR 2014-2020 grant n. 086201000543, "SCuSi - Smart Culture in Sicily".</p>
      </sec>
      <sec id="sec-4-2">
        <title>References</title>
        <p>[4] J. A. Meaney, S. Wilson, L. Chiruzzo, A. Lopez, W. Magdy, SemEval 2021 task 7: HaHackathon, detecting and rating humor and offense, in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 105-119. URL: https://aclanthology.org/2021.semeval-1.9. doi:10.18653/v1/2021.semeval-1.9.</p>
        <p>[5] C. Bosco, D. Felice, F. Poletto, M. Sanguinetti, T. Maurizio, et al., Overview of the EVALITA 2018 hate speech detection task, in: CEUR Workshop Proceedings, volume 2263, CEUR, 2018, pp. 1-9.</p>
        <p>[6] M. Sanguinetti, G. Comandini, E. D. Nuovo, S. Frenda, M. A. Stranisci, C. Bosco, T. Caselli, V. Patti, I. Russo, HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 hate speech detection task, EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020 (2020).</p>
        <p>[7] M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi, EVALITA 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
        <p>[8] M. Lai, F. Celli, A. Ramponi, S. Tonelli, C. Bosco, V. Patti, HaSpeeDe3 at EVALITA 2023: Overview of the political and religious hate speech detection task, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
        <p>[9] G. Mialon, R. Dessì, M. Lomeli, C. Nalmpantis, R. Pasunuru, R. Raileanu, B. Rozière, T. Schick, J. Dwivedi-Yu, A. Celikyilmaz, et al., Augmented language models: a survey, arXiv preprint arXiv:2302.07842 (2023).</p>
        <p>[10] Y. Liu, T. Han, S. Ma, J. Zhang, Y. Yang, J. Tian, H. He, A. Li, M. He, Z. Liu, Z. Wu, D. Zhu, X. Li, N. Qiang, D. Shen, T. Liu, B. Ge, Summary of ChatGPT/GPT-4 research and perspective towards the future of large language models, 2023. URL: http://arxiv.org/abs/2304.01852. doi:10.48550/arXiv.2304.01852. arXiv:2304.01852 [cs].</p>
        <p>[11] F. Celli, M. Lai, A. Duzha, C. Bosco, V. Patti, PolicyCorpus XL: An Italian corpus for the detection of hate speech against politics, in: CLiC-it, 2021.</p>
        <p>[12] M. Heath, Orthography in social media: Pragmatic and prosodic interpretations of caps lock, Proceedings of the Linguistic Society of America 3 (2018) 55:1-13. URL: https://journals.linguisticsociety.org/proceedings/index.php/PLSA/article/view/4350. doi:10.3765/plsa.v3i1.4350.</p>
        <p>[13] S. Chan, A. Fyshe, Social and emotional correlates of capitalization on Twitter, in: Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, Association for Computational Linguistics, New Orleans, Louisiana, USA, 2018, pp. 10-15. URL: https://aclanthology.org/W18-1102. doi:10.18653/v1/W18-1102.</p>
        <p>[14] S. Hochreiter, J. Schmidhuber, Long Short-Term Memory, Neural Computation 9 (1997) 1735-1780. URL: https://doi.org/10.1162/neco.1997.9.8.1735. doi:10.1162/neco.1997.9.8.1735.</p>
        <p>[15] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).</p>
        <p>[16] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning word vectors for 157 languages, in: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.</p>
        <p>[17] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.</p>
        <p>[18] S. Schweter, Italian BERT and ELECTRA models, 2020. URL: https://doi.org/10.5281/zenodo.4263142. doi:10.5281/zenodo.4263142.</p>
        <p>[19] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.</p>
        <p>[20] M. Polignano, P. Basile, M. de Gemmis, G. Semeraro, V. Basile, AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets, in: Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), volume 2481, CEUR, 2019. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074851349&amp;partnerID=40&amp;md5=7abed946e06f76b3825ae5e294ffac14.</p>
        <p>[21] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</article-title>
          ,
          <source>in: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota, USA,
          <year>2019</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>86</lpage>
          . URL: https://aclanthology.org/S19-2010. doi:10.18653/v1/S19-2010.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          , G. Karadzhov,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitenis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Çöltekin</surname>
          </string-name>
          ,
          <article-title>SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)</article-title>
          , in: Proceedings of SemEval,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Rangel Pardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          ,
          <article-title>SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter</article-title>
          ,
          <source>in: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 54-63. URL: https://aclanthology.org/S19-2007. doi:10.18653/v1/S19-2007.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>