<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the EVALITA 2018 Hate Speech Detection Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cristina Bosco</string-name>
          <email>bosco@di.unito.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felice Dell'Orletta</string-name>
          <email>felice.dellorletta@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Poletto</string-name>
          <email>fabio.poletto@edu.unito.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuela Sanguinetti</string-name>
          <email>msanguin@di.unito.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Tesconi</string-name>
          <email>maurizio.tesconi@iit.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IIT-CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ILC-CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Acmos</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>English. The Hate Speech Detection (HaSpeeDe) task is a shared task on Italian social media (Facebook and Twitter) for the detection of hateful content, and it has been proposed for the first time at EVALITA 2018. Providing two datasets from two different online social platforms, with different linguistic and communicative features, we organized the task in three sub-tasks, where systems must be trained and tested on the same resource, or trained on one and tested on the other: HaSpeeDe-FB, HaSpeeDe-TW and Cross-HaSpeeDe (further subdivided into the Cross-HaSpeeDe FB and Cross-HaSpeeDe TW sub-tasks). Overall, 9 teams participated in the task, and the best system achieved a macro F1-score of 0.8288 for HaSpeeDe-FB, 0.7993 for HaSpeeDe-TW, 0.6541 for Cross-HaSpeeDe FB and 0.6985 for Cross-HaSpeeDe TW. In this report, we describe the datasets released and the evaluation measures, and we discuss the results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title />
      <p>Italiano. HaSpeeDe è la prima campagna di valutazione di sistemi per l'identificazione automatica di discorsi di incitamento all'odio su social media (Facebook e Twitter) in lingua italiana, proposta nell'ambito di EVALITA 2018. Fornendo ai partecipanti due insiemi di dati estratti da due piattaforme differenti dal punto di vista linguistico e della comunicazione, abbiamo articolato HaSpeeDe in tre compiti in cui i sistemi sono addestrati e testati sulla stessa tipologia di dati oppure addestrati su una tipologia e testati sull'altra: HaSpeeDe-FB, HaSpeeDe-TW e Cross-HaSpeeDe (a sua volta suddiviso in Cross-HaSpeeDe FB e Cross-HaSpeeDe TW). Nel complesso, 9 gruppi hanno partecipato alla campagna, e il miglior sistema ha ottenuto un punteggio di macro F1 pari a 0,8288 in HaSpeeDe-FB, 0,7993 in HaSpeeDe-TW, 0,6541 in Cross-HaSpeeDe FB e 0,6985 in Cross-HaSpeeDe TW. L'articolo descrive i dataset rilasciati e le modalità di valutazione, e discute i risultati ottenuti.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction and Motivations</title>
      <p>Online hateful content, or Hate Speech (HS), is
characterized by some key aspects (such as
virality, or presumed anonymity) which distinguish it
from offline communication and make it
potentially more dangerous and hurtful. Therefore, its
identification becomes a crucial mission in many
fields.</p>
      <p>The task we proposed for this edition of EVALITA consists in automatically annotating messages from two popular social media platforms, Twitter and Facebook, with a boolean value indicating the presence (or not) of HS.</p>
      <p>
        HS can be defined as any expression “that is
abusive, insulting, intimidating, harassing, and/or
incites to violence, hatred, or discrimination. It is
directed against people on the basis of their race,
ethnic origin, religion, gender, age, physical
condition, disability, sexual orientation, political
conviction, and so forth”
        <xref ref-type="bibr" rid="ref14">(Erjavec and Kovačič, 2012)</xref>
        .
      </p>
      <p>Although definitions of and approaches to HS vary considerably and depend on the juridical tradition of each country, many agree that what is identified as such cannot fall under the protection granted by the right to freedom of expression, and must be prohibited. Also as a practical implementation of the Code of Conduct of the European Union1, online platforms like Twitter, Facebook or YouTube discourage hateful content, but its removal mainly relies on reports from users and trusted flaggers, and lacks systematic control.</p>
      <p>Although HS analysis and identification require a multidisciplinary approach that includes knowledge from different fields (psychology, law and the social sciences, among others), NLP plays a fundamental role in this respect. Therefore, the development of high-accuracy automatic tools able to identify HS assumes the utmost relevance not only for NLP, and Italian NLP in particular, but also for all the practical applications such a task lends itself to. Furthermore, as also suggested in Schmidt and Wiegand (2017), the community would considerably benefit from a benchmark dataset for HS detection underpinned by a commonly accepted definition of the task.</p>
      <p>
        As regards the state of the art, a large number
of contributions have been proposed on this topic,
ranging from lexicon-based approaches
        <xref ref-type="bibr" rid="ref19">(Gitari et al., 2015)</xref>
        to various machine learning approaches, with
different learning techniques, from naïve
Bayes classifiers
        <xref ref-type="bibr" rid="ref21">(Kwok and Wang, 2013)</xref>
        ,
Logistic Regression and Support Vector Machines
        <xref ref-type="bibr" rid="ref11 ref5">(Burnap and Williams, 2015; Davidson et al., 2017)</xref>
        ,
to the more recent Recurrent and Convolutional
Neural Networks
        <xref ref-type="bibr" rid="ref18 ref22 ref24 ref29">(Mehdad and Tetreault, 2016;
Gambäck and Sikdar, 2017)</xref>
        . However, no comparative
studies exist that would allow judging which
learning method is most effective
        <xref ref-type="bibr" rid="ref18 ref24 ref29">(Schmidt and Wiegand, 2017)</xref>
        .
      </p>
      <p>
        Furthermore, a large number of academic events
and shared tasks have taken place in the recent past,
reflecting the NLP community's interest in HS and
HS-related topics; to name a few: the first and second
edition of the Workshop on Abusive Language2
        <xref ref-type="bibr" rid="ref31">(Waseem et al., 2017)</xref>
        , the First Workshop on Trolling, Aggression and Cyberbullying
        <xref ref-type="bibr" rid="ref20">(Kumar et al., 2018)</xref>
        , which also included a shared task on aggression
identification, the tracks on Automatic Misogyny Identification (AMI)
        <xref ref-type="bibr" rid="ref15 ref16">(Fersini et al., 2018b)</xref>
        and on authorship and aggressiveness analysis (MEX-A3T)
        <xref ref-type="bibr" rid="ref6">(Carmona et al., 2018)</xref>
        proposed at the 2018 edition of IberEval, the GermEval Shared Task on the
Identification of Offensive Language
        <xref ref-type="bibr" rid="ref32">(Wiegand et
al., 2018)</xref>
        , the Automatic Misogyny Identification task at EVALITA 2018
        <xref ref-type="bibr" rid="ref12 ref15 ref16 ref6">(Fersini et al., 2018a)</xref>
        , and finally the SemEval shared task on hate speech
detection against immigrants and women (HatEval),
which is still ongoing at the time of writing3.
      </p>
      <p>1On May 31, 2016, the EU Commission presented with
Facebook, Microsoft, Twitter and YouTube a "Code of
conduct on countering illegal hate speech online".
2https://sites.google.com/view/alw2018/</p>
      <p>
        On the other hand, such contributions and
events are mainly based on other languages
(English, for the most part), while very few of them deal
with Italian
        <xref ref-type="bibr" rid="ref13 ref23 ref24">(Del Vigna et al., 2017; Musto et
al., 2016; Pelosi et al., 2017)</xref>
        . Precisely for this
reason, the Hate Speech Detection (HaSpeeDe)4
task has been conceived and proposed within the
EVALITA context
        <xref ref-type="bibr" rid="ref7">(Caselli et al., 2018)</xref>
        ; its
purpose is to encourage and promote the
participation of several research groups, both from
academia and industry, by making a shared dataset
available, in order to allow an advancement of the
state of the art in this field for Italian as well.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2 Task Organization</title>
      <p>Considering the linguistic, as well as
meta-linguistic, features that distinguish Twitter and
Facebook posts, due to the differences in use
between the two platforms and the character limits
imposed on their messages (especially on Twitter),
the task has been organized into three sub-tasks,
based on the dataset used (see Section 3):</p>
      <sec id="sec-3-1">
        <title>Task 1: HaSpeeDe-FB</title>
        <p>Only the Facebook dataset could be used to classify
the Facebook test set.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Task 2: HaSpeeDe-TW</title>
        <p>Only the Twitter dataset could be used to classify
the Twitter test set.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Task 3: Cross-HaSpeeDe</title>
        <p>This task has been further subdivided into two
sub-tasks. Task 3.1: Cross-HaSpeeDe FB, where only
the Facebook dataset could be used to classify the
Twitter test set. Task 3.2: Cross-HaSpeeDe TW, where,
conversely, only the Twitter dataset could be used to
classify the Facebook test set.</p>
        <p>3https://competitions.codalab.org/competitions/19935
4http://www.di.unito.it/~tutreeb/haspeede-evalita18/</p>
        <p>Cross-HaSpeeDe, in particular, has been
proposed as an out-of-domain task that specifically
aimed, on the one hand, at highlighting the
challenging aspects of using social media data for
classification purposes and, on the other, at enhancing
the systems' ability to generalize their predictions
across different datasets.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 Datasets and Format</title>
      <p>The datasets proposed for this task are the result of
a joint effort of two research groups on
harmonizing the annotation previously applied to two
different datasets, in order to allow their exploitation
in the task.</p>
      <p>
        The first dataset is a collection of Facebook
posts developed by the group from Pisa and
created in 2016
        <xref ref-type="bibr" rid="ref13">(Del Vigna et al., 2017)</xref>
        , while the
other one is a Twitter corpus developed in
2017-2018 by the Turin group
        <xref ref-type="bibr" rid="ref27">(Sanguinetti et al., 2018)</xref>
        .
Sections 3.1 and 3.2 briefly introduce the original
datasets, while Section 3.3 describes the unified
annotation scheme adopted in both corpora for the
purposes of this task.
      </p>
      <sec id="sec-4-1">
        <title>3.1 Facebook Dataset</title>
        <p>This is a corpus of comments retrieved from
the Facebook public pages of Italian newspapers,
politicians, artists, and groups. Those pages were
selected because typically they host discussions
spanning across a variety of topics.</p>
        <p>The comments collected were related to a series
of web pages and groups, chosen because they
were suspected to contain hateful content:
salviniofficial, matteorenziufficiale, lazanzarar24,
jenusdinazareth, sinistracazzateliberta2,
ilfattoquotidiano, emosocazzi, noiconsalviniufficiale.</p>
        <p>Overall, 17,567 Facebook comments were
collected from 99 posts crawled from the selected
pages. Five bachelor students were asked to
annotate the comments; in particular, 3,685 comments
received at least 3 annotations. The annotators were
asked to assign one class to each post, where the
classes span the following levels of hate: No hate,
Weak hate, Strong hate.</p>
        <p>Hateful messages were then divided into distinct
categories: Religion, Physical and/or mental
handicap, Socio-economic status, Politics, Race, Sex
and Gender issues, and Other.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2 Twitter Dataset</title>
        <p>
          The Twitter dataset released for the competition
is a subset of a larger hate speech corpus
developed at the University of Turin. The corpus forms
part of the Hate Speech Monitoring
program5, coordinated by the Computer Science
Department with the aim of detecting, analyzing and
countering HS with an inter-disciplinary approach
          <xref ref-type="bibr" rid="ref4">(Bosco et al., 2017)</xref>
          . Its preliminary stage of
development has been described in Poletto et al. (2017),
while the fully developed corpus is described in
Sanguinetti et al. (2018).
        </p>
        <p>The collection includes Twitter posts gathered
with a classical keyword-based approach, more
specifically by filtering the corpus using neutral
keywords related to three social groups deemed
potential HS targets in the Italian context:
immigrants, Muslims and Roma.</p>
        <p>After a first annotation step that resulted in a
collection of around 1,800 tweets, the corpus has
been further expanded by adding new annotated
data. The newly introduced tweets were annotated
partly by experts and partly by CrowdFlower (now
Figure Eight) contributors. The final version of the
corpus consists of 6,928 tweets.</p>
        <p>The main feature of this corpus is its annotation
scheme, specifically designed to properly encode
the multiplicity of factors that can contribute to
the definition of a notion of hate speech, and to
offer a broader tagset capable of better representing
all those factors which may increase, or rather
mitigate, the impact of the message. This resulted in
a scheme that includes, besides the HS tag (no-yes),
also its intensity degree (from 1 through 4 if HS is
present, and 0 otherwise), the presence of
aggressiveness (no-weak-strong) and offensiveness
(no-weak-strong), as well as irony and stereotype
(no-yes).</p>
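        <p>As a concrete illustration, the scheme above amounts to a small set of fields with closed value sets. The following Python sketch (the field names are ours, not the corpus's own) encodes the values listed above, including the constraint tying the intensity degree to the HS tag.</p>
        <p>
```python
# Allowed values for each field of the full Twitter annotation scheme,
# as summarized in the text; field names are illustrative, not the corpus's own.
SCHEME = {
    "hate_speech": {"no", "yes"},
    "intensity": set(range(0, 5)),  # 0 if no HS, 1 through 4 otherwise
    "aggressiveness": {"no", "weak", "strong"},
    "offensiveness": {"no", "weak", "strong"},
    "irony": {"no", "yes"},
    "stereotype": {"no", "yes"},
}

def validate(annotation):
    """Check one annotation dict against the scheme, including the
    constraint that intensity is 0 exactly when HS is absent."""
    for field, allowed in SCHEME.items():
        if annotation[field] not in allowed:
            return False
    # intensity must be 0 if and only if the tweet is not marked as HS
    if (annotation["hate_speech"] == "no") != (annotation["intensity"] == 0):
        return False
    return True
```
        </p>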
        <p>
          In addition, given that irony has been included
as an annotation category in the scheme, part of
this hate speech corpus (i.e. the tweets
annotated as ironic) has also been used in
another task proposed in this edition of EVALITA,
namely the one on irony detection in Italian tweets
(IronITA)6
          <xref ref-type="bibr" rid="ref8">(Cignarella et al., 2018)</xref>
          . More
precisely, the overlapping tweets in the IronITA
datasets are 781 in the training set and just 96 in
the test set.
        </p>
        <p>5http://hatespeech.di.unito.it/
6http://www.di.unito.it/~tutreeb/ironita-evalita18/</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.3 Annotation Format</title>
        <p>The annotation format provided for the task is
the same for both datasets described above, and
it consists of a simplified version of the schemes
adopted in the two corpora introduced in Sections
3.1 and 3.2.</p>
        <p>The data have been encoded in UTF-8 plain-text
files with three tab-separated columns, each one
representing the following information:
1. the ID of the Facebook comment or tweet7;
2. the text;
3. the class: 1 if the text contains HS, and 0
otherwise (see Tables 1 and 2 for a few
examples).</p>
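        <p>For illustration, the released format can be read with a few lines of Python; the function name and the sample rows below are our own, not part of the released data.</p>
        <p>
```python
import csv
import io

# Two fake rows in the three-column, tab-separated release format:
# message ID, text, binary HS class (1 = contains hate speech).
SAMPLE = "101\tun commento qualsiasi\t0\n102\tun commento d'odio\t1\n"

def load_haspeede(fileobj):
    """Parse the HaSpeeDe release format into a list of dicts."""
    reader = csv.reader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [{"id": msg_id, "text": text, "hs": int(label)}
            for msg_id, text, label in reader]

data = load_haspeede(io.StringIO(SAMPLE))
```
        </p>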
        <p>The Facebook and Twitter datasets each consist of
4,000 comments/tweets retrieved
from the main corpora introduced in Sections 3.1
and 3.2. The data were randomly split into a
development and a test set, of 3,000 and 1,000 messages
respectively.</p>
        <p>The distribution in both datasets of the labels
expressing the presence or not of HS is summarized
in Tables 3 and 4.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 Evaluation</title>
      <p>Participants were allowed to submit up to 2 runs
for each task, and a separate official ranking has
been provided.</p>
      <p>7In order to meet the GDPR requirements, texts have been
pseudonymized by replacing all original IDs in both datasets
with newly-generated ones.</p>
      <sec id="sec-5-2">
        <title />
        <p>The evaluation has been performed according to
the standard metrics known in the literature, i.e.
Precision, Recall and F1-score. However, given the
imbalanced distribution of hateful vs. not hateful
messages, and in order to get more useful insights
into a system's performance on a given class,
the scores have been computed for each class
separately; finally, the F1-score has been
macro-averaged, so as to get the overall results.</p>
        <p>For all tasks, the baseline score has been
computed as the performance of a classifier based on
the most frequent class.</p>
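        <p>The evaluation protocol above is straightforward to reproduce. The sketch below (our own code, not the official evaluation script) computes per-class Precision, Recall and F1, the macro-averaged F1, and the most-frequent-class baseline.</p>
        <p>
```python
from collections import Counter

def per_class_prf(gold, pred, label):
    """Precision, recall and F1 for one class of a binary task."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_f1(gold, pred, labels=(0, 1)):
    """F1 computed per class, then averaged with equal class weights."""
    return sum(per_class_prf(gold, pred, label)[2] for label in labels) / len(labels)

def most_frequent_class_baseline(train_labels, n_test):
    """Predict the majority class of the training data for every test item."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test
```
        </p>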
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Overview of the Task: Participation and Results</title>
      <sec id="sec-6-1">
        <title>5.1 Task Participants and Submissions</title>
        <p>A total of 9 teams8 participated in at least
one of the three HaSpeeDe main tasks. Table 5
provides an overview of the teams and their
affiliations.</p>
        <p>Except for one case, where one run was sent for
HaSpeeDe-TW only, all teams submitted at least
one run for all the tasks.</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2 Systems</title>
        <p>As participants were allowed to submit up to 2
runs for each task, several training options were
adopted in order to properly classify the texts.
Furthermore, unlike other tasks, we chose not
to establish any distinction between constrained
and unconstrained runs, and to allow participants
to use all the additional resources that they deemed
useful for the task (other annotated resources,
lexicons, pre-trained word embeddings, etc.), on the
sole condition that these were explicitly mentioned
in their final report.</p>
        <p>8In fact, 11 teams submitted their results, but one team
withdrew its submissions, and another one's submissions
have been removed from the official rankings by the task
organizers.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Table 5: Teams and Their Affiliations</title>
        <p>GRCP: Univ. Politècnica de València + CERPAMID, Cuba.
InriaFBK: Univ. Côte d'Azur, CNRS, Inria + FBK, Trento.
ItaliaNLP: ILC-CNR, Pisa + Univ. of Pisa.
Perugia: Univ. for Foreigners of Perugia + Univ. of Perugia + Univ. of Florence.
RuG: University of Groningen + Univ. degli Studi di Salerno.
sbMMP: Zurich Univ. of Applied Sciences.
StopPropagHate: INESC TEC + Univ. of Porto + Eurecat, Centre Tecn. de Catalunya.
HanSEL: University of Bari Aldo Moro.
VulpeculaTeam: University of Perugia.</p>
        <p>
          Table 6 summarizes the external resources (if
any) used by participants to enhance their systems'
performance, while the remainder of this section
offers a brief overview of the teams' systems and
the core methods adopted to participate in the task.
GRCP
          <xref ref-type="bibr" rid="ref12 ref6 ref9">(De la Peña Sarracén et al., 2018)</xref>
          The
authors proposed a bidirectional Long Short-Term
Memory Recurrent Neural Network with an
attention-based mechanism that estimates the
importance of each word; this context
vector is then used with another LSTM model to
estimate whether a text is hateful or not.
HanSEL
          <xref ref-type="bibr" rid="ref15 ref16 ref17 ref26 ref3 ref32 ref7 ref8 ref9">(Polignano and Basile, 2018)</xref>
          The
proposed system is based on an ensemble of three
classification strategies, mediated by a majority
vote algorithm: a Support Vector Machine with
an RBF kernel, a Random Forest and a Deep Multilayer
Perceptron. The input social media text is
represented as a concatenation of word2vec sentence
vectors and a TF-IDF bag of words.
        </p>
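        <p>The majority-vote step of an ensemble such as HanSEL's can be sketched in a few lines; the code below is a generic illustration of combining three classifiers' binary outputs, not the team's actual implementation.</p>
        <p>
```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label sequences by majority vote.
    `predictions` is a list of equally long label lists, one per classifier."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Toy outputs standing in for an SVM, a Random Forest and an MLP:
svm = [1, 0, 1, 0]
rf = [1, 1, 0, 0]
mlp = [1, 0, 0, 1]
combined = majority_vote([svm, rf, mlp])
```
        </p>
With three voters and binary labels, no position can tie, so the vote is always decisive.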
        <p>
          InriaFBK
          <xref ref-type="bibr" rid="ref10">(Corazza et al., 2018)</xref>
          The authors
implemented three different classifier models,
based on recurrent neural networks, n-gram based
models and linear SVC.
        </p>
        <p>
          ItaliaNLP
          <xref ref-type="bibr" rid="ref9">(Cimino et al., 2018)</xref>
          Participants
tested three different classification models: one
based on a linear SVM, another based on a
1-layer BiLSTM, and a newly-introduced one based
on a 2-layer BiLSTM which exploits multi-task
learning with additional data from the 2016
SENTIPOLC task
          <xref ref-type="bibr" rid="ref2">(Barbieri et al., 2016)</xref>
          .
        </p>
        <p>
          Perugia
          <xref ref-type="bibr" rid="ref28">(Santucci et al., 2018)</xref>
          The participants'
system uses a document classifier based on an SVM
algorithm. The features used by the system are
a combination of features extracted through
mathematical operations on FastText word embeddings
and 20 other features extracted from the raw text.
RuG
          <xref ref-type="bibr" rid="ref1">(Bai et al., 2018)</xref>
          The authors proposed
two different classifiers: an SVM with a linear
kernel and an ensemble system
composed of an SVM classifier and a Convolutional
Neural Network combined by a logistic
regression meta-classifier. The features of each
classifier are algorithm-dependent and exploit word
embeddings, raw text features and lexical resource
features.
sbMMMP The authors tested two different
systems, in a similar fashion to what is described in von
Grüningen et al. (2018). The first one is based
on an ensemble of Convolutional Neural Networks
(CNNs), whose outputs are then used as features
by a meta-classifier for the final prediction. The
second system uses a combination of a CNN and
a Gated Recurrent Unit (GRU), together with a
transfer-learning approach based on pre-training
with a large, automatically-translated dataset.
StopPropagHate
          <xref ref-type="bibr" rid="ref17">(Fortuna et al., 2018)</xref>
          The
authors use a classifier based on Recurrent Neural
Networks with binary cross-entropy as the loss
function. In their system, each input word is
represented by a 10,000-dimensional one-hot
encoding vector.
        </p>
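        <p>Several of the systems above rely on features derived from pre-trained word embeddings. A common minimal recipe, averaging the vectors of in-vocabulary tokens into one document vector, can be sketched as follows; the toy embedding table is our own, standing in for FastText or word2vec vectors, and is not any team's actual feature set.</p>
        <p>
```python
def document_vector(tokens, embeddings, dim=3):
    """Average the word vectors of in-vocabulary tokens;
    return a zero vector when no token is covered."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy 3-dimensional embedding table:
emb = {"odio": [1.0, 2.0, 3.0], "social": [3.0, 2.0, 1.0]}
doc = document_vector(["odio", "social", "xyz"], emb)
```
        </p>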
        <p>
          VulpeculaTeam
          <xref ref-type="bibr" rid="ref3">(Bianchini et al., 2018)</xref>
          According to the description provided by the
participants, a neural network with three hidden layers
was used, with word embeddings trained on a set
of previously extracted Facebook comments.
        </p>
      </sec>
      <sec id="sec-6-4">
        <title>5.3 Results and Discussion</title>
        <p>In Tables 7, 8, 9 and 10, we report the final results
of HaSpeeDe, separated according to the
respective sub-task and ranked by the macro F1-score (as
described in Section 4)9.</p>
        <p>9Due to space constraints, the complete evaluation for all
classes has been made available here: https://goo.gl/
xPyPRW</p>
      </sec>
      <sec id="sec-6-5">
        <title>Table 6: External Resources Used by the Teams</title>
        <p>GRCP: pre-trained word embeddings.
InriaFBK: emotion lexicon.
ItaliaNLP Lab: polarity and subjectivity lexicons + 2 word-embedding lexicons.
Perugia: Twitter corpus + hate speech lexicon + polarity lexicon.
RuG: pre-trained word embeddings + bad/offensive word lists.
sbMMP: pre-trained word embeddings.
StopPropagHate: none.
HanSEL: pre-trained word embeddings.
VulpeculaTeam: polarity lexicon + lists of bad words + pre-trained word embeddings.</p>
        <p>In case of multiple runs, the suffixes "_1" and "_2"
have been appended to each team name, in order
to distinguish the run number of the submitted file.</p>
        <p>Furthermore, some of the runs in the tables have
been marked with *: this means that they were
resubmitted because of file incompatibility with the
evaluation script or other minor issues that did not
affect the evaluation process.</p>
      </sec>
      <sec id="sec-6-7">
        <title>Table 7: HaSpeeDe-FB Results (teams in ranking order)</title>
        <p>baseline
ItaliaNLP 2
ItaliaNLP 1
InriaFBK 1
InriaFBK 2
Perugia 2
RuG 1
HanSEL
VulpeculaTeam*
RuG 2
GRCP 2
GRCP 1
StopPropagHate 2*
StopPropagHate 1*
Perugia 1</p>
        <p>In absolute terms, i.e. based on the score
of the first-ranked team, the best results have
been achieved in the HaSpeeDe-FB task, with
a macro F1 of 0.8288, followed by
HaSpeeDe-TW (0.7993), Cross-HaSpeeDe TW (0.6985) and
Cross-HaSpeeDe FB (0.6541).</p>
        <p>The robustness of an approach benefiting from
a polarity and subjectivity lexicon is confirmed
by the fact that the best-ranking team in both
HaSpeeDe-FB and HaSpeeDe-TW, i.e. ItaliaNLP,
also achieved valuable results in the cross-domain
sub-tasks, ranking at fifth and first position in
Cross-HaSpeeDe FB and Cross-HaSpeeDe TW,
respectively. However, these results may also depend on
the combination of the polarity and subjectivity
lexicon with word embeddings, which alone did not
allow the achievement of particularly high results.</p>
      </sec>
      <sec id="sec-6-8">
        <title>Table 8: HaSpeeDe-TW Results (teams in ranking order)</title>
        <p>baseline
ItaliaNLP 2
ItaliaNLP 1
RuG 1
InriaFBK 2
sbMMMP
InriaFBK 1
VulpeculaTeam*
Perugia 2
RuG 2
StopPropagHate 2*
StopPropagHate 1*
GRCP 1
GRCP 2
HanSEL
Perugia 1</p>
        <p>Furthermore, it is not surprising that the best
results have been obtained on HaSpeeDe-FB,
given that messages posted on this
platform are longer and more correct than those on
Twitter, allowing systems (and humans too) to find
more, and clearer, indications of the presence of
HS.</p>
        <p>The coarse granularity of the annotation scheme,
which is a simplification of the schemes originally
proposed for the datasets, merged specifically
for the purpose of this task, probably influenced
the scores, which are indeed very promising and
high with respect to other tasks in the sentiment
analysis area.</p>
      </sec>
      <sec id="sec-6-9">
        <title>Table 9: Cross-HaSpeeDe FB Results (teams in ranking order)</title>
        <p>baseline
InriaFBK 2
InriaFBK 1
VulpeculaTeam
Perugia 2
ItaliaNLP 1
ItaliaNLP 2
GRCP 2
RuG 1
RuG 2
GRCP 1
HanSEL
StopPropagHate
Perugia 1</p>
        <p>As regards the Cross-HaSpeeDe FB and
Cross-HaSpeeDe TW sub-tasks, the lower results with
respect to the in-domain tasks can be attributed
to several factors, among which, as expected, is
the different distribution of the HS and not-HS
classes in the Facebook and Twitter datasets. As a
matter of fact, the percentage of HS in the Facebook
train and test sets is around 46% and 68%,
respectively, while in the Twitter dataset it is around 32%
in both sets. Such an imbalanced distribution is
reflected in the overall system outputs in the two
sub-tasks: in Cross-HaSpeeDe FB, where systems
have been evaluated against the Twitter test set,
most of the labels predicted as HS were not
classified as such in the gold standard; conversely, in
Cross-HaSpeeDe TW, the majority of the labels
predicted as not HS were actually considered HS
in the gold corpus.</p>
        <p>Another feature that distinguishes the Facebook
dataset from the Twitter one is the wider range of hate
categories in the former, compared to the latter
(see Sections 3.1 and 3.2). Especially in
Cross-HaSpeeDe TW, the identification of hateful
messages may have been made even more difficult by
the reduced number of potential hate targets in
the training set with respect to the test set.</p>
        <p>Overall, the heterogeneous nature of the
datasets provided for the task, both in terms of
class distribution and data composition, together
with their rather small size, made the whole task
even more challenging; nonetheless, this did not
prevent participants from finding appropriate
solutions, thus improving the state of the art for
HS identification in Italian as well.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 Closing Remarks</title>
      <p>This paper describes the HaSpeeDe task for the
detection of HS in Italian texts from Facebook and
Twitter. The novelty of the task mainly consists
in allowing the comparison between the results
obtained on the two platforms, and in experiments
where systems are trained on one typology of texts
and tested on the other. The results confirmed the
difficulty of cross-platform HS detection, but also
produced very promising scores in the tasks where
data from the same social network were exploited
for both training and testing.</p>
      <p>Future work can be devoted to an in-depth
analysis of errors and to the observation of the
contribution that different resources can give to systems
performing this task.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The work of Cristina Bosco and Manuela
Sanguinetti is partially funded by Progetto di
Ateneo/CSP 2016 (Immigrants, Hate and Prejudice
in Social Media, S1618 L2 BOSC 01).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Xiaoyu</given-names>
            <surname>Bai</surname>
          </string-name>
          , Flavio Merenda, Claudia Zaghi, Tommaso Caselli, and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2018</year>
          . RuG @ EVALITA 2018:
          <article-title>Hate Speech Detection In Italian Social Media</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the Evalita 2016 SENTIment POLarity Classification Task</article-title>
          .
          <source>In Proceedings of the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Giulio</given-names>
            <surname>Bianchini</surname>
          </string-name>
          , Lorenzo Ferri, and
          <string-name>
            <given-names>Tommaso</given-names>
            <surname>Giorni</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Text Analysis for Hate Speech Detection in Italian Messages on Twitter and Facebook</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Viviana Patti, Marcello Bogetti, Michelangelo Conoscenti, Giancarlo Ruffo, Rossano Schifanella, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Stranisci</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Tools and Resources for Detecting Hate and Prejudice Against Immigrants in Social Media</article-title>
          .
          <source>In Proceedings of First Symposium on Social Interactions in Complex Intelligent Systems (SICIS)</source>
          ,
          <source>AISB Convention</source>
          <year>2017</year>
          ,
          AI and Society.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Pete</given-names>
            <surname>Burnap</surname>
          </string-name>
          and
          <string-name>
            <given-names>Matthew L.</given-names>
            <surname>Williams</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making</article-title>
          .
          <source>Policy &amp; Internet</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Miguel Ángel</given-names>
            <surname>Álvarez-Carmona</surname>
          </string-name>
          , Estefanía Guzmán-Falcón, Manuel Montes-y-Gómez, Hugo Jair Escalante, Luis Villaseñor-Pineda, Verónica Reyes-Meza, and
          <string-name>
            <given-names>Antonio</given-names>
            <surname>Rico-Sulayes</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of MEX-A3T at IberEval 2018: Authorship and Aggressiveness Analysis in Mexican Spanish Tweets</article-title>
          .
          <source>In IberEval@SEPLN. CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Tommaso</given-names>
            <surname>Caselli</surname>
          </string-name>
          , Nicole Novielli, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>EVALITA 2018: Overview of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra Teresa</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , Simona Frenda, Valerio Basile, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the Evalita 2018 Task on Irony Detection in Italian Tweets (IronITA)</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Cimino</surname>
          </string-name>
          , Lorenzo De Mattei, and Felice Dell'Orletta.
          <year>2018</year>
          .
          <article-title>Multi-task Learning in Deep Neural Networks at EVALITA 2018</article-title>
          .
          <source>In Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy</source>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Michele</given-names>
            <surname>Corazza</surname>
          </string-name>
          , Stefano Menini, Pinar Arslan, Rachele Sprugnoli, Elena Cabrio, Sara Tonelli, and
          <string-name>
            <given-names>Serena</given-names>
            <surname>Villata</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Comparing Different Supervised Approaches to Hate Speech Detection</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Davidson</surname>
          </string-name>
          , Dana Warmsley,
          <string-name>
            <given-names>Michael W.</given-names>
            <surname>Macy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ingmar</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Automated Hate Speech Detection and the Problem of Offensive Language</article-title>
          . CoRR, abs/1703.04009.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Gretel Liz</given-names>
            <surname>De la Peña Sarracén</surname>
          </string-name>
          , Reynaldo Gil Pons, Carlos Enrique Muñiz Cuza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Hate Speech Detection Using Attention-based LSTM</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Del Vigna</surname>
          </string-name>
          , Andrea Cimino, Felice Dell'Orletta, Marinella Petrocchi, and
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Hate Me, Hate Me Not: Hate Speech Detection on Facebook</article-title>
          .
          <source>In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17).</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Karmen</given-names>
            <surname>Erjavec</surname>
          </string-name>
          and Melita Poler Kovačič.
          <year>2012</year>
          .
          <article-title>"You Don't Understand, This is a New War!" Analysis of Hate Speech in News Web Sites' Comments</article-title>
          .
          <source>Mass Communication and Society</source>
          ,
          <volume>15</volume>
          (
          <issue>6</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Debora Nozza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          . 2018a.
          <article-title>Overview of the EVALITA 2018 Task on Automatic Misogyny Identification (AMI)</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Paolo Rosso, and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Anzovino</surname>
          </string-name>
          . 2018b.
          <article-title>Overview of the Task on Automatic Misogyny Identification at IberEval 2018</article-title>
          .
          <source>In IberEval@SEPLN. CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Paula</given-names>
            <surname>Fortuna</surname>
          </string-name>
          , Ilaria Bonavita, and Sérgio Nunes.
          <year>2018</year>
          .
          <article-title>Merging datasets for hate speech classification in Italian</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Björn</given-names>
            <surname>Gambäck</surname>
          </string-name>
          and
          <string-name>
            <given-names>Utpal Kumar</given-names>
            <surname>Sikdar</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Using Convolutional Neural Networks to Classify Hate-Speech</article-title>
          .
          <source>In Proceedings of the First Workshop on Abusive Language.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Njagi Dennis</given-names>
            <surname>Gitari</surname>
          </string-name>
          , Zhang Zuping, Hanyurwimfura Damien, and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Long</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A lexicon-based approach for hate speech detection</article-title>
          .
          <source>International Journal of Multimedia and Ubiquitous Engineering</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Ritesh</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Atul Kr.</given-names>
            <surname>Ojha</surname>
          </string-name>
          , Marcos Zampieri, and Shervin Malmasi, editors.
          <year>2018</year>
          .
          <source>Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)</source>
          . Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Irene</given-names>
            <surname>Kwok</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yuzhou</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Locate the Hate: Detecting Tweets Against Blacks</article-title>
          .
          <source>In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence</source>
          . AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Yashar</given-names>
            <surname>Mehdad</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joel</given-names>
            <surname>Tetreault</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Do Characters Abuse More Than Words?</article-title>
          <source>In 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Cataldo</given-names>
            <surname>Musto</surname>
          </string-name>
          , Giovanni Semeraro, Marco de Gemmis, and
          <string-name>
            <given-names>Pasquale</given-names>
            <surname>Lops</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Modeling Community Behavior through Semantic Analysis of Social Data: The Italian Hate Map Experience</article-title>
          .
          <source>In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization</source>
          ,
          UMAP
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Serena</given-names>
            <surname>Pelosi</surname>
          </string-name>
          , Alessandro Maisto, Pierluigi Vitale, and
          <string-name>
            <given-names>Simonetta</given-names>
            <surname>Vietri</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Mining Offensive Language on Social Media</article-title>
          .
          <source>In Proceedings of the Fourth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Poletto</surname>
          </string-name>
          , Marco Stranisci, Manuela Sanguinetti, Viviana Patti, and
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Hate Speech Annotation: Analysis of an Italian Twitter Corpus</article-title>
          .
          <source>In Proceedings of the Fourth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2017</year>
          ). CEUR.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Polignano</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>HanSEL: Italian Hate Speech Detection through Ensemble Learning and Deep Neural Networks</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Fabio Poletto, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Stranisci</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>An Italian Twitter Corpus of Hate Speech against Immigrants</article-title>
          .
          <source>In Proceedings of the 11th Language Resources and Evaluation Conference</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Valentino</given-names>
            <surname>Santucci</surname>
          </string-name>
          , Stefania Spina, Alfredo Milani, Giulio Biondi, and Gabriele Di Bari.
          <year>2018</year>
          .
          <article-title>Detecting Hate Speech for Italian Language in Social Media</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Anna</given-names>
            <surname>Schmidt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wiegand</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A Survey on Hate Speech Detection using Natural Language Processing</article-title>
          .
          <source>In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>Dirk</given-names>
            <surname>von Grünigen</surname>
          </string-name>
          , Ralf Grubenmann, Fernando Benites, Pius von Däniken, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Cieliebak</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>spMMMP at GermEval 2018 Shared Task: Classification of Offensive Content in Tweets using Convolutional Neural Networks and Gated Recurrent Units</article-title>
          .
          <source>In Proceedings of GermEval 2018, 14th Conference on Natural Language Processing (KONVENS</source>
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>Zeerak</given-names>
            <surname>Waseem</surname>
          </string-name>
          , Wendy Hui Kyong Chung, Dirk Hovy, and Joel Tetreault, editors.
          <year>2017</year>
          .
          <source>Proceedings of the First Workshop on Abusive Language Online</source>
          . Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wiegand</surname>
          </string-name>
          , Melanie Siegel, and
          <string-name>
            <given-names>Josef</given-names>
            <surname>Ruppenhofer</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the GermEval 2018 Shared Task on the Identification of Offensive Language</article-title>
          .
          <source>In Proceedings of GermEval 2018, 14th Conference on Natural Language Processing (KONVENS</source>
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>