<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AMI @ EVALITA2020: Automatic Misogyny Identification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elisabetta Fersini</string-name>
          <email>elisabetta.fersini@unimib.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Debora Nozza</string-name>
          <email>debora.nozza@unibocconi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prosso@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bocconi University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DISCo, University of Milano-Bicocca</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>PRHLT Research Center, Universitat Politècnica de València</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Automatic Misogyny Identification (AMI) is a shared task proposed at the Evalita 2020 evaluation campaign. The AMI challenge, based on Italian tweets, is organized into two subtasks: (1) Subtask A, about misogyny and aggressiveness identification, and (2) Subtask B, about the fairness of the model. At the end of the evaluation phase, we received a total of 20 runs for Subtask A and 11 runs for Subtask B, submitted by 8 teams. In this paper, we present an overview of the AMI shared task, the datasets, the evaluation methodology, the results obtained by the participants, and a discussion of the methodologies adopted by the teams. Finally, we draw some conclusions and discuss future work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Copyright © 2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        People widely share thoughts, emotions, and feelings through posts on social media. Women have a strong presence in these online environments: 75% of women use social media multiple times per day, compared to 64% of men. While new opportunities have emerged for women to express themselves, systematic inequality and discrimination take place in the form of offensive content against the female gender. These manifestations of misogyny, usually directed by a man at a woman in order to dominate or exert power over her, represent a relevant social problem that has been addressed in the scientific literature in recent years. Recent investigations have studied how the misogyny phenomenon takes place, for example as unjustified slurring or as stereotyping of the role or body of a woman (e.g., the hashtag #getbacktokitchen), as described in the book by Poland
        <xref ref-type="bibr" rid="ref25">(Poland, 2016)</xref>
        . Preliminary research
work was conducted in
        <xref ref-type="bibr" rid="ref17">(Hewitt et al., 2016)</xref>
        as the first attempt at manual classification of misogynous tweets, while automatic misogyny identification in social media was first investigated in
        <xref ref-type="bibr" rid="ref15 ref2">(Anzovino et al., 2018)</xref>
        . Since 2018, several initiatives have served as a call to action to stop hate against women, from both a machine learning and a computational linguistics perspective, such as AMI@Evalita 2018
        <xref ref-type="bibr" rid="ref14 ref15 ref2">(Fersini et al., 2018a)</xref>
        ,
AMI@IberEval2018
        <xref ref-type="bibr" rid="ref14 ref15 ref2">(Fersini et al., 2018b)</xref>
        and
HatEval@SemEval2019
        <xref ref-type="bibr" rid="ref5">(Basile et al., 2019)</xref>
        .
Several relevant research directions have been investigated to address the misogyny identification challenge, including approaches focused on effective text representation
        <xref ref-type="bibr" rid="ref14 ref15 ref2 ref23 ref4">(Bakarov, 2018;
Basile and Rubagotti, 2018)</xref>
        , machine learning
models
        <xref ref-type="bibr" rid="ref1 ref8">(Buscaldi, 2018; Ahluwalia et al., 2018)</xref>
        and domain-specific lexical resources
        <xref ref-type="bibr" rid="ref16 ref23">(Pamungkas
et al., 2018; Frenda et al., 2018)</xref>
        .
      </p>
      <p>
        During the AMI shared task organized at the
Evalita 2020 evaluation campaign
        <xref ref-type="bibr" rid="ref6">(Basile et al.,
2020)</xref>
        , the focus is not only on misogyny identification but also on aggressiveness recognition, as well as on the definition of models able to guarantee fair predictions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2 Task Description</title>
      <p>The AMI shared task, which is a re-run of a
previous challenge at Evalita 2018, proposes the
automatic identification of misogynous content in the
Italian language on Twitter. More specifically, it is
organized according to two main subtasks:
• Subtask A - Misogyny &amp; Aggressive
Behaviour Identification: a system must
recognize if a text is misogynous or not, and in
case of misogyny, if it expresses an
aggressive attitude. In order to provide an annotated
corpus for Subtask A, the following
definitions have been adopted to label the collected
dataset:
– Misogynous: a text that expresses hate specifically towards women (in the form of insults, sexual harassment, threats of violence, stereotyping, objectification, and denial of male responsibility).
– Not Misogynous: a text that does not
express any form of hate towards women.
– Aggressive: a message is considered
aggressive if it (implicitly or explicitly)
presents, incites, threatens, implies,
suggests, or alludes to:
* attitudes, violent actions, hostility, or commission of offenses against women;
* social isolation of women for physical or psychological characteristics;
* justification or legitimation of aggressive actions against women.</p>
      <p>
– Not Aggressive: if none of the previous conditions holds.
• Subtask B - Unbiased Misogyny
Identification: a system must discriminate
misogynistic contents from the non-misogynistic ones,
while guaranteeing the fairness of the model
(in terms of unintended bias) on a synthetic
dataset
        <xref ref-type="bibr" rid="ref21 ref5">(Nozza et al., 2019)</xref>
        . To this end, Subtask B measures the ability of a model to remain fair when processing sentences containing specific identity terms that likely conveyed misogyny in the training data, e.g. “girlfriend” and “wife”.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3 Training and Testing Data</title>
      <p>The data provided to the participants for the AMI
shared task comprises a raw dataset and a synthetic
dataset for measuring bias. Each dataset is split into a training set and a test set.</p>
      <sec id="sec-4-1">
        <title>3.1 Raw dataset</title>
        <p>The raw dataset is a balanced dataset of
misogynous and non-misogynous tweets. The raw
training set (6,000 tweets) is derived from the data
collected for the 2018 edition of the AMI shared
task, where the misogynistic posts have been
enriched by labelling aggressive expressions
according to the definition provided in Section 2. The
raw test dataset (approximately 1,000 tweets) has
been collected from Twitter using a similar
approach to the 2018 edition of the shared task. This
is intentionally done to evaluate the generalization
abilities of the systems on test data collected in a
different time period and therefore characterized
by higher language variability with respect to the
training data. Examples of tweets belonging to the
raw dataset are shown in Table 1.</p>
        <p>The raw training data are provided as TSV (tab-separated) files with the following fields:
• id denotes a unique identifier of the tweet.
• text represents the tweet text.
• misogynous defines whether a tweet is misogynous or not; it takes the following values:
– 0 if the tweet is not misogynous;
– 1 if the tweet is misogynous.
• aggressiveness denotes whether a misogynous tweet is aggressive or not; it takes the following values:
– 0 if the tweet is not aggressive (non-misogynous tweets are labelled 0 by default);
– 1 if the tweet is aggressive.</p>
        <p>The raw testing data are provided as TSV files
reporting only id and text.</p>
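        <p>As a minimal sketch of how such a file can be parsed (the sample rows below are invented placeholders, not actual AMI data), the fields above map naturally onto Python's csv module:</p>
        <preformat>
```python
import csv
import io

# Invented sample in the described format (id, text, misogynous,
# aggressiveness); the real AMI training file is a TSV distributed
# by the organizers.
SAMPLE_TSV = (
    "id\ttext\tmisogynous\taggressiveness\n"
    "1\tprimo tweet di esempio\t0\t0\n"
    "2\tsecondo tweet di esempio\t1\t1\n"
)

def load_raw_training(handle):
    """Parse the raw training TSV into a list of dicts with integer labels."""
    # QUOTE_NONE: tweet texts may contain stray quote characters, so no
    # field should be treated as CSV-quoted.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        row["misogynous"] = int(row["misogynous"])
        row["aggressiveness"] = int(row["aggressiveness"])
        rows.append(row)
    return rows

rows = load_raw_training(io.StringIO(SAMPLE_TSV))
```
        </preformat>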
      </sec>
      <sec id="sec-4-2">
        <title>3.2 Synthetic dataset</title>
        <p>
          The synthetic test dataset for measuring the
presence of unintended bias has been created
following the procedure adopted in
          <xref ref-type="bibr" rid="ref11 ref21 ref5">(Dixon et al.,
2018; Nozza et al., 2019)</xref>
          : a list of identity terms
has been constructed by taking into consideration
some concepts related to the term “donna” (e.g.
“moglie”, “fidanzata”). Given the identity terms,
several templates have been created including
positive/negative verbs and adjectives (e.g.
negative: hate, inferior; positive: love, awesome), conveying either a misogynistic or a non-misogynistic message. Some examples of such templates, used to create the synthetic dataset, are reported in Table 2.
        </p>
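        <p>The template-based construction can be illustrated with a small sketch; the identity terms come from the example above, while the two templates are invented stand-ins for the organizers' actual lists:</p>
        <preformat>
```python
# Invented templates paired with the label they convey (1 = misogynous,
# 0 = non-misogynous); "{}" is filled with each identity term in turn.
identity_terms = ["donna", "moglie", "fidanzata"]
templates = [
    ("odio ogni {}", 1),   # negative verb, misogynistic message
    ("adoro ogni {}", 0),  # positive verb, non-misogynistic message
]

synthetic = [
    (template.format(term), label)
    for term in identity_terms
    for template, label in templates
]
```
        </preformat>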
        <p>The synthetic dataset, created for measuring the presence of unintended bias, contains template-generated text labelled according to:
• Misogyny: Misogyny (1) vs. Not Misogyny (0)</p>
        <p>The training data for the synthetic dataset are provided as TSV (tab-separated) files with the following fields:
• id denotes a unique identifier of the template-generated text.
• text represents the template-generated text.
• misogynous defines whether the template-generated text is misogynous or non-misogynous; it takes value 1 if the text is misogynous and 0 if it is non-misogynous.</p>
        <p>The synthetic testing data are provided as TSV
files (tab-separated files) reporting only id and
text.</p>
        <p>The statistics about the raw and synthetic
datasets, both for the training and testing sets, are
reported in Table 3.
Considering the label distribution of the datasets, we have chosen different evaluation metrics for the two subtasks, as follows:</p>
        <p>Subtask A. Each class to be predicted (i.e., “Misogyny” and “Aggressiveness”) has been evaluated independently of the other using the Macro F1-score. The final ranking of the systems participating in Subtask A was based on the Average Macro F1-score, computed as follows:</p>
        <p>ScoreA = [F1(Misogyny) + F1(Aggressiveness)] / 2    (1)</p>
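        <p>The Average Macro F1 used for Subtask A can be sketched in plain Python (a hand-rolled macro F1 for illustration, not the official evaluation script):</p>
        <preformat>
```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the labels occurring in y_true (0/1 here)."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

def score_a(mis_true, mis_pred, aggr_true, aggr_pred):
    """Average Macro F1 over the two fields, as in Equation (1)."""
    return (macro_f1(mis_true, mis_pred) + macro_f1(aggr_true, aggr_pred)) / 2
```
        </preformat>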
        <p>Subtask B. The ranking for Subtask B is computed as a weighted combination of the AUC estimated on the raw test dataset (AUCraw) and three per-term AUC-based bias scores computed on the synthetic dataset (AUCSubgroup, AUCBPSN, AUCBNSP). Let s be an identity term (e.g., “girlfriend”, “wife”) and N the total number of identity terms; the score of each run is estimated according to the following metric:</p>
        <p>ScoreB = (1/2) · AUCraw + (1/2) · (1/(3N)) · [ Σs AUCSubgroup(s) + Σs AUCBPSN(s) + Σs AUCBNSP(s) ]    (2)</p>
        <p>Unintended bias can be uncovered by looking at differences in the score distributions between data mentioning a specific identity term (subgroup distribution) and the rest (background distribution). The three per-term AUC-based bias scores are related to specific subgroups as follows:
• AUCSubgroup(s): calculates AUC only on the data within the subgroup related to a given identity term. This represents model understanding and separability within the subgroup itself. A low value in this metric means the model does a poor job of distinguishing between misogynous and non-misogynous comments that mention the identity term.
• AUCBPSN(s): Background Positive, Subgroup Negative (BPSN) calculates AUC on the misogynous examples from the background and the non-misogynous examples from the subgroup. A low value in this metric means that the model confuses non-misogynous examples that mention the identity term with misogynous examples that do not, likely meaning that the model predicts higher misogyny scores than it should for non-misogynous examples mentioning the identity term.
• AUCBNSP(s): Background Negative, Subgroup Positive (BNSP) calculates AUC on the non-misogynous examples from the background and the misogynous examples from the subgroup. A low value here means that the model confuses misogynous examples that mention the identity term with non-misogynous examples that do not, likely meaning that the model predicts lower misogyny scores than it should for misogynous examples mentioning the identity term.</p>
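        <p>These three scores can be sketched with a pure-Python ROC AUC (the Mann-Whitney formulation); this is an illustration of the definitions above, not the official scorer. Each example is a triple (model score, gold label, mentions-identity-term flag):</p>
        <preformat>
```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

def per_term_aucs(examples):
    """AUC_Subgroup, AUC_BPSN and AUC_BNSP for one identity term."""
    subgroup = [e for e in examples if e[2]]
    background = [e for e in examples if not e[2]]
    pos = lambda part: [s for s, y, _ in part if y == 1]
    neg = lambda part: [s for s, y, _ in part if y == 0]
    return {
        "subgroup": roc_auc(pos(subgroup), neg(subgroup)),
        # background positives vs. subgroup negatives
        "bpsn": roc_auc(pos(background), neg(subgroup)),
        # subgroup positives vs. background negatives
        "bnsp": roc_auc(pos(subgroup), neg(background)),
    }
```
        </preformat>
        <p>A model with no unintended bias scores close to 1.0 on all three, while a model that over-predicts misogyny for sentences mentioning the term drags BPSN down.</p>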
        <p>In order to compare the submitted runs with a baseline model, we provided a benchmark (AMI-BASELINE) based on a Support Vector Machine trained on a unigram representation of tweets with a TF-IDF weighting scheme. In particular, we created one training set for each field to be predicted, i.e., “misogynous” and “aggressiveness”, where each tweet has been represented as a bag of words (composed of 1,000 terms) coupled with the corresponding label. Once the representations had been obtained, Support Vector Machines with a linear kernel were trained and provided as AMI-BASELINE.</p>
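        <p>A minimal re-creation of this baseline with scikit-learn (assuming its standard TfidfVectorizer and LinearSVC APIs; the toy texts and labels are placeholders, not AMI data) could look as follows:</p>
        <preformat>
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def train_baseline(texts, labels):
    """Unigram TF-IDF (up to 1,000 terms) + linear SVM, one field at a time."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 1), max_features=1000)
    features = vectorizer.fit_transform(texts)
    classifier = LinearSVC()  # linear kernel, as in AMI-BASELINE
    classifier.fit(features, labels)
    return vectorizer, classifier

# Placeholder training data for the "misogynous" field.
texts = ["testo offensivo contro le donne", "una bella giornata di sole",
         "le donne sono inferiori", "mi piace il calcio"]
labels = [1, 0, 1, 0]
vectorizer, classifier = train_baseline(texts, labels)
```
        </preformat>
        <p>In the shared task setting, one such classifier would be trained per field (“misogynous”, “aggressiveness”) on its own labels.</p>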
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Participants and Results</title>
      <p>
        A total of 8 teams from 6 different countries
participated in at least one of the two subtasks of
AMI. Two teams participated with the same
approach also in the HaSpeeDe shared task
        <xref ref-type="bibr" rid="ref28">(Sanguinetti et al., 2020)</xref>
        , addressing misogyny
identification with generic models for detecting hate
speech. Each team had the chance to submit up
to three runs that could be constrained (c), where
only the provided training data and lexicons were
admitted, and unconstrained (u), where additional
data for training were allowed. Table 4 reports
an overview of the teams illustrating their
affiliation, their country, the number and type (c for
constrained, u for unconstrained) of submissions, and
the subtasks they addressed.
      </p>
      <sec id="sec-5-1">
        <title>5.1 Subtask A: Misogyny &amp; Aggressive Behaviour Identification</title>
        <p>Table 5 reports the results for the Misogyny &amp; Aggressive Behaviour Identification task, which received 20 runs submitted by 8 teams. The highest result has been achieved by jigsaw at 0.7406 in an unconstrained setting and by fabsam at 0.7342 in a constrained run. While the best unconstrained result is based on ensembles of fine-tuned custom BERT models, the best constrained system is grounded on a convolutional neural network that exploits pre-trained word embeddings.</p>
        <p>By analysing the detailed results, it emerged that while the identification of misogynous text can be considered a fairly simple problem, the recognition of aggressiveness needs to be properly addressed. In fact, the scores reported in Table 5 are strongly affected by the prediction capabilities on aggressive posts. This is likely due to the subjective perception of aggressiveness, captured by the variance of the data available in the ground truth.</p>
        <p>After the deadline, the UniBO team submitted an amended run (**), which has not been ranked in the official results of the AMI shared task. However, we believe it is interesting to mention their achievement: an Average Macro F1-score equal to 0.744.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.2 Subtask B: Unbiased Misogyny Identification</title>
        <p>Table 6 reports the results for the Unbiased Misogyny Identification task, which received 11 runs from 4 teams, of which 4 unconstrained and 7 constrained. The highest score has been achieved by jigsaw at 0.8825 with an unconstrained run and by PoliTeam at 0.8180 with a constrained submission.</p>
        <p>Similarly to the previous task, most of the systems have shown better performance than the AMI-BASELINE. By analysing the runs, we can highlight that the two best results on Subtask B have been obtained by the unconstrained run submitted by jigsaw, where a simple debiasing technique based on data augmentation has been adopted, and by the constrained run provided by PoliTeam, where the problem of biased predictions has been partially mitigated by introducing a misogynous lexicon.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Discussion</title>
      <p>The submitted systems can be compared by taking into consideration the kind of input features used to represent tweets and the machine learning model used for classification.</p>
      <p>
        Textual Feature Representation. The systems submitted by the challenge participants consider various techniques for representing the tweet contents. Most of the teams experimented with high-level text representations based on deep learning solutions. While a few teams, such as fabsam and MDD, adopted a text representation based on traditional word embeddings such as Word2Vec
        <xref ref-type="bibr" rid="ref19">(Mikolov et
al., 2013)</xref>
        , GloVe
        <xref ref-type="bibr" rid="ref24">(Pennington et al., 2014)</xref>
        and
FastText
        <xref ref-type="bibr" rid="ref7">(Bojanowski et al., 2017)</xref>
        , most of the systems, i.e., NoPlaceForHateSpeech, jigsaw, PoliTeam, YNU OXZ and UniBO, exploited richer sentence embeddings such as BERT
        <xref ref-type="bibr" rid="ref10">(Devlin et
al., 2019)</xref>
        or XLM-RoBERTa
        <xref ref-type="bibr" rid="ref26">(Ruder et al., 2019)</xref>
        .
To enrich the feature space for training the subsequent models to recognize misogyny and aggressiveness, PoliTeam experimented with additional lexical resources such as a misogyny lexicon and a sentiment lexicon.
      </p>
      <sec id="sec-6-1">
        <title>Machine Learning Models</title>
        <p>
          Concerning the machine learning models, we can distinguish between approaches trained from scratch and those based on fine-tuning of existing pre-trained models. We report in the following the strategy adopted by each system that participated in the AMI shared task, according to the type of machine learning model adopted:
• Shallow models have been experimented with by MDD, where logistic regression classifiers have been trained on different hand-crafted features;
• Convolutional Neural Networks have been exploited by NoPlaceForHateSpeech, using two distinct models for misogyny detection and aggressiveness identification, by fabsam, investigating the optimal hyperparameters of the model, and by YNU OXZ, where on top of the CNN architecture a Capsule Network
          <xref ref-type="bibr" rid="ref27">(Sabour et al., 2017)</xref>
          has been introduced for
taking advantage of spatial patterns available
in short texts;
• Fine-tuning of pre-trained models has been exploited by jigsaw, adapting BERT to the challenge domain and using a multilingual transfer strategy with ensemble learning, by UniBO, which accommodated the BERT model using a multi-label output neuron, and by PoliTeam, where the prediction of the fine-tuned sentence-BERT model is coupled with a prediction based on lexicons.
        </p>
        <p>Concerning the results achieved on the two subtasks, a few considerations can be drawn by looking both at the errors made by the systems and at the mitigation strategies adopted for reducing bias.</p>
        <p>Error Analysis. When testing the developed systems on the raw test data, the majority of the errors can be summarized by the following patterns:
• Under-representation of subjective expressions: posts written with erroneous lower casing and missing spaces between adjoining words lead the models based on raw text to make errors on test predictions. An example of such common errors is reported in the following tweet:
“Odio Sakura per il semplice
motivo che qualunque cosa faccia
o dica Naruto lei lo prende a
schiaffi o a pugniHA CHIESTO
COME STA SAI DIOSANTO
BRUTTA STRONZA MA
CONTRALLI MADONNA SPERO CHE
TI UCCIDANOscusami Sarada”
• Undefined subject, but presence of aggressive terms: for tweets where the target is not clearly mentioned but several aggressive terms are present, the models tend to be biased and to erroneously predict the post as misogynous and aggressive. An example of this type of misclassified post is reported here:
“Vero...ma c’e chi ti cerca, che
ti vuole, piu` di ogni cosa al
mondo......ma non sa se viene
capito..... potrebbe esser mal
interpretato e di conseguenza
all’abbraccio esser denunciato per
molestie sessuali e/o stupro”
• Short hate speech sentences targeting someone other than women: when the target is mentioned using an actual account handle, but refers to a man, and there are no additional indications about the gender of the target, most of the models tend to misclassify the tweet. In the following example, the target is a male football player:
“@bonucci leo19 Cagati in mano
e prenditi a schiaffi. Sti post te li
infili nel c*lo!”
Concerning the errors on the synthetic test set used for estimating the bias of the models, two main error types made by the majority of the systems can be identified:
• Presence of an unusual target: in most of the submissions, sentences containing offensive expressions towards specific uncommon targets are misclassified. For instance, around 39% of the predictions related to the target nonna (i.e., grandmother) are wrong. An example of the most misclassified target is reported in the following example:</p>
        <p>“nonne belle”
• Presence of unusual verbs: analogously to what has been observed for the target, sentences containing rare aggressive verbs tend to be misclassified. For instance, around 48% of the instances related to the verbs malmenare and seviziare (i.e., beat up and torture) are wrongly classified. An example of a mistaken sentence is reported here:
“femmina dovrebbe essere
seviziata” (wrongly classified as
non-misogynous)</p>
      </sec>
      <sec id="sec-6-2">
        <title>Bias Mitigation Strategies</title>
        <p>Concerning Subtask B, only one team (jigsaw) explicitly addressed the problem of unintended bias. The authors used sentences sampled from Italian Wikipedia articles containing some of the identity terms provided with the test set. These sentences, labelled as both non-misogynous and non-aggressive, have been used to further fine-tune the model and reduce the bias induced by the data. The results achieved by the jigsaw team highlight that a debiasing method can yield fair predictions even when using pre-trained models.</p>
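        <p>The augmentation step described here can be sketched as follows; the sentences and terms below are invented stand-ins for the Wikipedia samples and the official identity-term list:</p>
        <preformat>
```python
identity_terms = ["moglie", "fidanzata"]

# Biased toy training set: the identity term only co-occurs with label 1.
train = [("la moglie è stupida", 1), ("bella giornata", 0)]

# Neutral sentences mentioning the identity terms (stand-ins for sentences
# sampled from Italian Wikipedia), added as non-misogynous examples.
neutral_pool = [
    "la moglie del presidente ha parlato alla stampa",
    "la fidanzata ha vinto una borsa di studio",
]

augmented = train + [
    (sentence, 0)
    for sentence in neutral_pool
    if any(term in sentence for term in identity_terms)
]
```
        </preformat>
        <p>Fine-tuning on the augmented set weakens the spurious association between the identity terms and the misogynous label.</p>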
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7 Conclusions and Future Work</title>
      <p>This paper presents the AMI shared task, focused
not only on identifying misogynous and
aggressive expressions but also on ensuring fair
predictions. By analysing the runs submitted by the
participants, we can conclude that while the
problem of misogyny identification has reached
satisfactory results, the recognition of aggressiveness
is still in its infancy. Concerning the
capabilities of the systems with respect to the unintended
bias problem, we can highlight that a domain-dependent mitigation strategy is a necessary step
towards fair models.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The work of the last author was partially funded by
the Spanish MICINN under the research project
MISMISFAKEnHATE on MISinformation and
MIScommunication in social media: FAKE news
and HATE speech (PGC2018-096212-B-C31) and
by the COST Action 17124 DigForAsp supported
by the European Cooperation in Science and
Technology.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Resham</given-names>
            <surname>Ahluwalia</surname>
          </string-name>
          , Himani Soni, Edward Callow,
          <string-name>
            <surname>Anderson Nascimento</surname>
          </string-name>
          , and Martine De Cock.
          <year>2018</year>
          .
          <article-title>Detecting Hate Speech Against Women in English Tweets</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Maria</given-names>
            <surname>Anzovino</surname>
          </string-name>
          , Elisabetta Fersini, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Automatic Identification and Classification of Misogynistic Language on Twitter</article-title>
          .
          <source>In Proceedings of 23rd International Conference on Applications of Natural Language to Information Systems (NLDB)</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Attanasio</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eliana</given-names>
            <surname>Pastor</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>PoliTeam @ AMI: Improving Sentence Embedding Similarity with Misogyny Lexicons for Automatic Misogyny Identification in Italian Tweets</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Angelo</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chiara</given-names>
            <surname>Rubagotti</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Automatic Identification of Misogyny in English and Italian Tweets at EVALITA 2018 with a Multilingual Hate Lexicon</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Cristina Bosco, Elisabetta Fersini, Nozza Debora, Viviana Patti, Francisco Manuel Rangel Pardo, Paolo Rosso,
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter</article-title>
          .
          <source>In Proceedings of 13th International Workshop on Semantic Evaluation</source>
          , pages
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>5</volume>
          :
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Davide</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Tweetaneuse AMI EVALITA2018: Character-based Models for the Automatic Misogyny Identification Task</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Adriano dos S. R.</given-names>
            <surname>da Silva</surname>
          </string-name>
          and Norton T. Roman.
          <year>2020</year>
          .
          <article-title>No Place For Hate Speech @ AMI: Convolutional Neural Network and Word Embedding for the Identification of Misogyny in Italian</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)</source>
          , pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Lucas</given-names>
            <surname>Dixon</surname>
          </string-name>
          , John Li, Jeffrey Sorensen, Nithum Thain, and
          <string-name>
            <given-names>Lucy</given-names>
            <surname>Vasserman</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Measuring and mitigating unintended bias in text classification</article-title>
          .
          <source>In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society</source>
          , pages
          <fpage>67</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Samer</given-names>
            <surname>El Abassi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sergiu</given-names>
            <surname>Nisioi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>MDD@AMI: Vanilla Classifiers for Misogyny Identification</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Fabrizi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>fabsam @ AMI: a Convolutional Neural Network approach</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Debora Nozza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          . 2018a.
          <article-title>Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI)</article-title>
          .
          In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the Sixth evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA</source>
          <year>2018</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Paolo Rosso, and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Anzovino</surname>
          </string-name>
          . 2018b.
          <article-title>Overview of the Task on Automatic Misogyny Identification at IberEval 2018</article-title>
          . In IberEval@ SEPLN, pages
          <fpage>214</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Simona</given-names>
            <surname>Frenda</surname>
          </string-name>
          , Bilal Ghanem, Estefanía Guzmán-Falcón, Manuel Montes-y-Gómez, and
          <string-name>
            <given-names>Luis</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Automatic Lexicons Expansion for Multilingual Misogyny Detection</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Sarah</given-names>
            <surname>Hewitt</surname>
          </string-name>
          , Thanassis Tiropanis, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bokhove</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The Problem of identifying Misogynist Language on Twitter (and other online social spaces)</article-title>
          .
          <source>In Proceedings of the 8th ACM Conference on Web Science</source>
          , pages
          <fpage>333</fpage>
          -
          <lpage>335</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Alyssa</given-names>
            <surname>Lees</surname>
          </string-name>
          , Jeffrey Sorensen, and
          <string-name>
            <given-names>Ian</given-names>
            <surname>Kivlichan</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Jigsaw @ AMI and HaSpeeDe2: Fine-Tuning a Pre-Trained Comment-Domain BERT Model</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Arianna</given-names>
            <surname>Muti</surname>
          </string-name>
          and Alberto Barrón-Cedeño.
          <year>2020</year>
          .
          <article-title>UniBO@AMI: A Multi-Class Approach to Misogyny and Aggressiveness Identification on Twitter Posts Using AlBERTo</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Debora</given-names>
            <surname>Nozza</surname>
          </string-name>
          , Claudia Volpetti, and
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Unintended bias in misogyny detection</article-title>
          .
          <source>In IEEE/WIC/ACM International Conference on Web Intelligence</source>
          , pages
          <fpage>149</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Xiaozhi</given-names>
            <surname>Ou</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hongling</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>YNU OXZ @ HaSpeeDe 2 and AMI : XLM-RoBERTa with Ordered Neurons LSTM for classification task at EVALITA 2020</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Endang Wahyu</given-names>
            <surname>Pamungkas</surname>
          </string-name>
          , Alessandra Teresa Cignarella, Valerio Basile, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Automatic Identification of Misogyny in English and Italian Tweets at EVALITA 2018 with a Multilingual Hate Lexicon</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Bailey</given-names>
            <surname>Poland</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <source>Haters: Harassment, Abuse, and Violence Online</source>
          . Potomac Books, Incorporated.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Ruder</surname>
          </string-name>
          , Anders Søgaard, and Ivan Vulić.
          <year>2019</year>
          .
          <article-title>Unsupervised cross-lingual representation learning</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts</source>
          , pages
          <fpage>31</fpage>
          -
          <lpage>38</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Sara</given-names>
            <surname>Sabour</surname>
          </string-name>
          , Nicholas Frosst, and
          <string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Dynamic routing between capsules</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3856</fpage>
          -
          <lpage>3866</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Gloria Comandini, Elisa Di Nuovo, Simona Frenda, Marco Stranisci, Cristina Bosco, Tommaso Caselli, Viviana Patti, and
          <string-name>
            <given-names>Irene</given-names>
            <surname>Russo</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>HaSpeeDe 2@EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>