<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UniBO @ AMI: A Multi-Class Approach to Misogyny and Aggressiveness Identification on Twitter Posts Using AlBERTo</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arianna Muti</string-name>
          <email>arianna.muti@studio.unibo.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Barrón-Cedeño</string-name>
          <email>a.barron@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIT - Università di Bologna</institution>
          ,
          <addr-line>Forlì</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Modern Languages, Literatures and Cultures - LILEC, Università di Bologna</institution>
          ,
          <addr-line>Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe our participation in the EVALITA 2020 (Basile et al., 2020) shared task on Automatic Misogyny Identification. We focus on task A -Misogyny and Aggressive Behaviour Identification- which aims at detecting whether a tweet in Italian is misogynous and, if so, whether it is aggressive. Rather than building two different models, one for misogyny and one for aggressiveness identification, we handle the problem as one single multi-class classification task, considering three classes: non-misogynous, non-aggressive misogynous, and aggressive misogynous. Our three-class supervised model, built on top of AlBERTo, obtains an overall F1 score of 0.7438 on the task test set (F1 = 0.8102 for the misogyny and F1 = 0.6774 for the aggressiveness task), which outperforms the top submitted model (F1 = 0.7406).1</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In 2020, Twitter users in Italy amount to approximately 3.7 million, and that number is expected to keep growing through 2026.2 Although Twitter is conceived to express personal opinions, share the day’s biggest news, follow people, or simply communicate with friends, there has been an increasing number of users who misuse the platform by engaging in trolling, cyberbullying, or by posting aggressive and misogynous content
        <xref ref-type="bibr" rid="ref13">(Samghabadi et al., 2020)</xref>
        . Due to the sheer amount of user-generated content on social media, providers struggle to control inappropriate content. Twitter relies on the community’s reports to identify and remove abusive posts from the platform, while upholding the users’ right to freedom of expression. However, it is a tricky task to determine where to draw the line between free expression and the production of harmful content, due to the subjective nature of what different users perceive as offensive. Twitter has committed to tackling this issue by releasing a policy containing a clear definition of abusive speech, according to which a user cannot promote violence against, or directly attack or threaten, people on the basis of race, ethnicity, national origin, caste, sexual orientation, gender, gender identity, religious affiliation, age, disability, or serious disease.3
      </p>
      <p>Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>1Our official submission to the task obtained F1 = 0.6343 (F1 = 0.7263 for the misogyny and F1 = 0.5423 for the aggressiveness task). The reason behind this poor performance was the unintended use of a mistaken transformer. See Appendix A for further details.</p>
      <p>2https://www.statista.com/forecasts/1146708/twitter-users-in-italy; last visit: 6 November, 2020.</p>
      <p>
        However, two main issues exist. Since Twitter mostly relies on the community’s subjective perception of hate speech, many posts are never reported, reviewed, or removed. Moreover, abusive posts significantly outnumber the people available to manually control harmful content. Therefore, there is a need to improve the quality of algorithms that spot potential instances of hate speech, in particular towards women, since research shows that women are subjected to more bullying, abuse, hateful language, and threats than men on social media
        <xref ref-type="bibr" rid="ref5">(Fallows, 2005)</xref>
        .
      </p>
      <p>
        AMI 2020 consists of two tasks
        <xref ref-type="bibr" rid="ref8">(Fersini et al., 2020)</xref>
        . Task A —Misogyny and Aggressive Behaviour Identification— aims at detecting whether a Twitter post is misogynous and, if so, whether it is aggressive
        <xref ref-type="bibr" rid="ref1 ref7">(Anzovino et al., 2018)</xref>
        . Task B —Unbiased Misogyny Identification— aims at discriminating misogynistic content from non-misogynistic content, while guaranteeing the fairness of the model (in terms of unintended bias) on a synthetic dataset
        <xref ref-type="bibr" rid="ref11">(Nozza et al., 2019)</xref>
        .
      </p>
      <p>3https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy</p>
      <p>
        We undertook task A, and we present a system to flag misogynous and aggressive posts addressed to women on Twitter in the Italian language. Even if task A involves two sub-problems, we address it as a three-class supervised problem using AlBERTo
        <xref ref-type="bibr" rid="ref12">(Polignano et al., 2019)</xref>
        , a BERT language-understanding model for Italian focused on the language used in social networks, specifically on Twitter. We built only one model to identify the three possible classes: non-misogynous, non-aggressive misogynous, and aggressive misogynous. This multi-class setting has proven effective: our approach obtains an F1 score of 0.7438, outperforming the top-ranked official submission (although our own official submission obtained only F1 = 0.6343; cf. Appendix A).
      </p>
      <p>The rest of the contribution is organized as follows. Section 2 includes some background and a brief overview of research in automatic misogyny identification. Section 3 describes the employed dataset. Section 4 describes our model. Section 5 summarizes the experiments performed and discusses the obtained results, including an error analysis to show the error trends of the model. Section 6 draws some conclusions and discusses further possible research lines.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Background</title>
      <p>
        Due to the subjective perception of misogyny and aggressiveness, a definition of what can be considered misogynous and aggressive is necessary. Misogynous content expresses hatred towards women, in the form of insults, sexual harassment, male privilege, patriarchy, gender discrimination, belittling of women, violence against women, body shaming, and sexual objectification
        <xref ref-type="bibr" rid="ref14">(Srivastava et al., 2017)</xref>
        . Misogynous content expresses an aggressive attitude when it overtly or covertly encourages or legitimizes violent actions against women.
      </p>
      <p>
        From a computational point of view,
misogyny detection is a text classification task. Text
classification in Natural Language Processing has been widely explored, and it is typically addressed using supervised models
        <xref ref-type="bibr" rid="ref1 ref10 ref2 ref6 ref7">(Miron´czuk and
Protasiewicz, 2018)</xref>
        . Past research shows the
effectiveness of diverse neural-network architectures to
learn text representations, such as convolutional
models, recurrent networks and attention
mechanisms
        <xref ref-type="bibr" rid="ref15">(Sun et al., 2019)</xref>
        . Recent work shows that pre-trained models such as BERT achieve state-of-the-art results in text classification tasks and save time, since they avoid training models from scratch
        <xref ref-type="bibr" rid="ref15">(Sun et al., 2019)</xref>
        .
      </p>
      <p>
        For what concerns misogyny identification, a
shared task took place at IberEval 2018,
focusing on English and Spanish tweets
        <xref ref-type="bibr" rid="ref1 ref6 ref7">(Fersini et
al., 2018b)</xref>
        . Whereas task A concerned
misogyny identification, task B proposed a multi-class
problem to classify misogynous sentences into
seven categories: discredit, stereotype,
objectification, sexual harassment, threats of violence,
dominance, and derailing. The most commonly used supervised models were support vector machines, ensembles of classifiers, and deep-learning models. Participants mostly used n-grams and word embeddings to represent the tweets.
      </p>
      <p>
        As for misogyny identification in Italian, the
first edition of the AMI shared task took place
in 2018
        <xref ref-type="bibr" rid="ref1 ref7">(Anzovino et al., 2018)</xref>
        . Task A was again misogyny identification, while task B aimed at recognizing whether misogynous content is person-specific or generally addressed towards a group of women, and at classifying the positive instances into the aforementioned categories. The best-performing approach obtained an F1 score of 0.844, using TF-IDF weighting combined with singular value decomposition for language representation and an ensemble of supervised models
        <xref ref-type="bibr" rid="ref1 ref6 ref7">(Fersini et al., 2018a)</xref>
        .
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Dataset</title>
      <p>As mentioned above, the aim of our model is to
flag misogynous contents and aggressive attitudes
towards women in Italian tweets. To address this
task, a dataset was provided by the task organizers: 5,000 tweets, manually labelled according to two classes, misogyny and aggressiveness. The
first one defines whether a tweet has been flagged
as misogynous (positive class) or not (negative
class). If a tweet has been flagged as misogynous,
it is further determined whether it is considered as
aggressive (positive class) or not (negative class).</p>
      <p>The training dataset is fairly balanced in terms of misogyny. It contains 2,337 misogynous and 2,663 non-misogynous instances. A total of 1,783 of the former are also considered aggressive, whereas only 554 are not. The test set was composed of 1,000 tweets.</p>
      <p>Since we opted for a constrained approach, we only used the data provided by the organizers. We randomly split the supervised data into training and validation sets: 4,700 instances for the former and 300 for the latter.</p>
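      <p>The mapping from the two binary annotations to the single three-class label used by our model, together with the random training/validation split, can be sketched as follows (a minimal illustration with toy data; the function and variable names are our own, not part of the official task code):</p>
      <preformat>
```python
import random

def to_three_class(misogynous: int, aggressive: int) -> int:
    """Collapse the two binary AMI labels into one three-class label:
    0 = non-misogynous, 1 = non-aggressive misogynous, 2 = aggressive misogynous.
    Aggressiveness is only defined for misogynous tweets."""
    if not misogynous:
        return 0
    return 2 if aggressive else 1

# Toy stand-ins for the 5,000 labelled tweets provided by the organizers.
data = [("tweet a", 0, 0), ("tweet b", 1, 0), ("tweet c", 1, 1), ("tweet d", 0, 0)]
labelled = [(text, to_three_class(m, a)) for text, m, a in data]

# Random split into training and validation sets (4,700 / 300 in the paper;
# scaled down here to the toy data).
random.seed(0)
random.shuffle(labelled)
train, valid = labelled[:3], labelled[3:]
```
      </preformat>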
    </sec>
    <sec id="sec-4">
      <title>4 Description of the System</title>
      <p>Since the identification of aggressiveness is tied to the identification of misogynous tweets, we opt for a three-class setting based on one single model. The three classes are hence non-misogynist, aggressive misogynist, and non-aggressive misogynist. The idea is to determine how well a multi-class classifier can perform when addressing these two related problems, handling aggressiveness as a class conditioned on the misogyny one.</p>
      <p>
        We decided to base our model on BERT
(Bidirectional Encoder Representations from
Transformers), a task-independent language
representation model based on the transformers
architecture
        <xref ref-type="bibr" rid="ref4">(Devlin et al., 2019)</xref>
        . BERT uses a masking approach that randomly masks some input tokens within a sentence and then predicts the masked tokens from their context. It is bidirectional because its Transformer layers consider both the left and the right context of the hidden word at once when making the prediction. We decided to use AlBERTo, an Italian variant of BERT trained on Twitter posts
        <xref ref-type="bibr" rid="ref12">(Polignano et al., 2019)</xref>
        , which includes emojis, links,
hashtags, and mentions. AlBERTo was trained on
200M tweets randomly sampled from the TWITA
corpus
        <xref ref-type="bibr" rid="ref2">(Basile et al., 2018)</xref>
        .
      </p>
      <p>As for the pre-processing, we used the pre-trained AlBERTo tokenizer for text tokenization, and then we encoded the data. We set the maximum length to 256 characters, since that was the length of the longest instance in the training material (even though Twitter allows up to 280 characters).</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption><p>AMI shared task A leaderboard.</p></caption>
        <table>
          <thead>
            <tr><th>team</th><th>run</th><th>constrained</th><th>score</th></tr>
          </thead>
          <tbody>
            <tr><td>UniBO<sup>a</sup></td><td>2</td><td>yes</td><td>0.7438</td></tr>
            <tr><td>jigsaw</td><td>2</td><td>no</td><td>0.7406</td></tr>
            <tr><td>jigsaw</td><td>1</td><td>no</td><td>0.7380</td></tr>
            <tr><td>fabsam</td><td>1</td><td>yes</td><td>0.7343</td></tr>
            <tr><td>YNU OXZ</td><td>1</td><td>no</td><td>0.7314</td></tr>
            <tr><td>fabsam</td><td>2</td><td>yes</td><td>0.7309</td></tr>
            <tr><td>NoPlaceForHateSpeech</td><td>2</td><td>yes</td><td>0.7167</td></tr>
            <tr><td>YNU OXZ</td><td>2</td><td>no</td><td>0.7015</td></tr>
            <tr><td>fabsam</td><td>3</td><td>yes</td><td>0.6948</td></tr>
            <tr><td>NoPlaceForHateSpeech</td><td>1</td><td>yes</td><td>0.6934</td></tr>
            <tr><td>AMI the winner</td><td>2</td><td>yes</td><td>0.6869</td></tr>
            <tr><td>MDD</td><td>3</td><td>no</td><td>0.6844</td></tr>
            <tr><td>PoliTeam</td><td>3</td><td>yes</td><td>0.6835</td></tr>
            <tr><td>MDD</td><td>1</td><td>yes</td><td>0.6820</td></tr>
            <tr><td>PoliTeam</td><td>1</td><td>yes</td><td>0.6810</td></tr>
            <tr><td>MDD</td><td>2</td><td>no</td><td>0.6679</td></tr>
            <tr><td>AMI the winner</td><td>1</td><td>yes</td><td>0.6653</td></tr>
            <tr><td>PoliTeam</td><td>2</td><td>yes</td><td>0.6473</td></tr>
            <tr><td>UniBO<sup>b</sup></td><td>1</td><td>yes</td><td>0.6343</td></tr>
            <tr><td>AMI the winner</td><td>3</td><td>yes</td><td>0.6259</td></tr>
            <tr><td>NoPlaceForHateSpeech</td><td>3</td><td>yes</td><td>0.4902</td></tr>
          </tbody>
        </table>
        <table-wrap-foot>
          <fn><p><sup>a</sup> Run submitted after the deadline.</p></fn>
          <fn><p><sup>b</sup> Buggy run submitted on the deadline (cf. Appendix A).</p></fn>
        </table-wrap-foot>
      </table-wrap>
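      <p>The fixed-length encoding step described above can be sketched with a simplified stand-in for the AlBERTo tokenizer (the vocabulary, padding id, and unknown-token id below are illustrative assumptions, not the tokenizer’s actual values):</p>
      <preformat>
```python
def encode(tokens, vocab, max_len=256, pad_id=0, unk_id=1):
    """Map tokens to ids and pad or truncate to the fixed maximum length of 256,
    the length of the longest training instance. Returns the id sequence and an
    attention mask marking the non-padding positions."""
    ids = [vocab.get(tok, unk_id) for tok in tokens[:max_len]]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return ids, mask
```
      </preformat>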
      <p>We used the PyTorch implementation of AlBERTo-Base, Italian Twitter lower cased4 and fine-tuned it on the downstream task. We used a softmax output layer with three neurons to produce the classification.</p>
      <p>
        In order to tune the network, we used the AdamW optimizer, which decouples weight decay from the gradient-based update, with a learning rate of 1e-5
        <xref ref-type="bibr" rid="ref9">(Loshchilov and Hutter, 2017)</xref>
        .5
      </p>
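      <p>The decoupled weight-decay update that distinguishes AdamW from standard Adam can be sketched for a single scalar parameter; all hyper-parameters other than the learning rate of 1e-5 are the usual illustrative defaults, not values reported in this paper:</p>
      <preformat>
```python
import math

def adamw_step(w, g, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a scalar parameter w with gradient g at step t.
    The weight-decay term wd * w is applied directly to the parameter,
    decoupled from the gradient-based Adam step (Loshchilov and Hutter, 2017)."""
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)           # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```
      </preformat>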
    </sec>
    <sec id="sec-5">
      <title>5 Results</title>
      <p>We explored different batch sizes over an
increasing number of learning epochs. Table 1 shows the
performance evolution on the validation set. The
best combination was to train the model over 8
epochs with a batch size of 16. This combination
leads to an F1 score of 0.8491 on the three-class
problem. It is worth noting that these scores are
not comparable to those for the actual task,
which consists of two independent binary
decisions: whether a tweet is considered misogynist
and, if the answer is yes, whether it is aggressive.6</p>
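      <p>The tuning procedure amounts to a small grid search over batch size and number of epochs, keeping the configuration with the best validation F1. A minimal sketch (the candidate grids are assumptions; the paper only reports that a batch size of 16 over 8 epochs was best):</p>
      <preformat>
```python
from itertools import product

def tune(train_and_score, batch_sizes=(8, 16, 32), epoch_counts=(2, 4, 6, 8)):
    """Exhaustively try every (batch size, epochs) pair; train_and_score is
    assumed to fine-tune the model and return its validation F1."""
    return max(product(batch_sizes, epoch_counts),
               key=lambda cfg: train_and_score(batch_size=cfg[0], n_epochs=cfg[1]))

# Usage with a stub scorer that peaks at the configuration reported in the paper:
best = tune(lambda batch_size, n_epochs:
            0.8491 if (batch_size, n_epochs) == (16, 8) else 0.5)
```
      </preformat>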
      <sec id="sec-5-1">
        <title>4https://github.com/marcopoli/AlBERTo-it</title>
        <p>5The implementation is available at https://github.com/TinfFoil/unibo_ami2020/.</p>
        <p>6Indeed, the official task score is computed as the average of the F1 measures for the misogyny and the aggressiveness classifications.</p>
        <table-wrap id="tab3">
          <label>Table 3</label>
          <caption><p>Examples of the three kinds of classification errors, with actual and predicted labels for misogyny and aggressiveness.</p></caption>
          <table>
            <thead>
              <tr><th/><th>tweet</th><th>misogynist actual</th><th>misogynist pred</th><th>aggressive actual</th><th>aggressive pred</th></tr>
            </thead>
            <tbody>
              <tr><td colspan="6">Type-1 errors</td></tr>
              <tr><td>1.1</td><td>La cosa del filo del telefono attorno alla gola è presa da una lite di Stefanie e Brooke in Beautiful #chilhavisto</td><td>no</td><td>yes</td><td>no</td><td>yes</td></tr>
              <tr><td>1.2</td><td>“Quanti schiaffi, #Malinconico. Li meriteresti tutti, perché anche questa volta sei riuscito a lasciare il segno. E no, non era facile...”</td><td>no</td><td>yes</td><td>no</td><td>yes</td></tr>
              <tr><td>1.3</td><td>Tutti quelli che ho conosciuto mi hanno sempre detto che a primo impatto sembro stronza e psicopatica ma in verità sono una cucciola</td><td>yes</td><td>no</td><td>no</td><td>no</td></tr>
              <tr><td colspan="6">Type-2 errors</td></tr>
              <tr><td>2.1</td><td>@Fedoc5 in ginocchio con te che comandi il ritmo del pompino e decidi quando sborrarmi in gola, perché un pompino deve sempre finire con l’ingoio</td><td>no</td><td>yes</td><td>no</td><td>yes</td></tr>
              <tr><td>2.2</td><td>@Nigagalsen @matteosalvinimi un follower dal 2016... e ti permetti di aprire quella lurida fogna di bocca che hai.</td><td>no</td><td>yes</td><td>no</td><td>no</td></tr>
              <tr><td>2.3</td><td>Antonio ti prenderei a schiaffi, come fa a dire mi manchi, quando con Ilaria fai tutto. Coglione</td><td>no</td><td>no</td><td>no</td><td>no</td></tr>
              <tr><td>2.4</td><td>Posso volerlo vedere cagarsi in mano e prendersi a schiaffi finché non diventano dispari o sono nazista?</td><td>no</td><td>no</td><td>no</td><td>no</td></tr>
              <tr><td colspan="6">Type-3 errors</td></tr>
              <tr><td>3.1</td><td>La cena è pronta. Lo squalo balena affonda nei banchi di pesce per saziare la sua immensa mole. Non è un’abitudine frequente visto che filtra dall’acqua i microorganismi come le balene.</td><td>no</td><td>yes</td><td>no</td><td>no</td></tr>
              <tr><td>3.2</td><td>Comunque le pringles più buone sono quelle alla panna acida e cipolla</td><td>no</td><td>yes</td><td>no</td><td>no</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Given these results, we trained a new model on the full training and development sets for 8 epochs, using a batch size of 16, and predicted on the test set. This model obtains F1 = 0.7438, resulting from 0.8102 on the misogyny task and 0.6774 on the aggressiveness one.</p>
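        <p>The overall score just reported (cf. footnote 6) can be recovered from the three-class predictions by decomposing them back into the two binary sub-tasks and averaging their F1 measures. A simplified sketch, using positive-class F1 (the official scorer’s exact averaging may differ):</p>
        <preformat>
```python
def f1(y_true, y_pred):
    """Positive-class F1 for binary labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def task_a_score(gold, pred):
    """gold/pred are three-class labels (0 = non-misogynous,
    1 = non-aggressive misogynous, 2 = aggressive misogynous).
    The task score averages the F1 of the two binary sub-tasks."""
    mis = f1([g != 0 for g in gold], [p != 0 for p in pred])
    agg = f1([g == 2 for g in gold], [p == 2 for p in pred])
    return (mis + agg) / 2
```
        </preformat>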
        <p>Table 2 shows the AMI shared task leaderboard.</p>
        <p>It highlights both our official submission UniBO
run 1 (cf. Appendix A) and our post-deadline
submission UniBO run 2. Run 2 tops all the
systems submitted to the shared task. Indeed,
modelling the two tasks as one single multi-class
problem (and using transformers for the right
language) helps the algorithm significantly.</p>
        <p>Error Analysis After the release of the gold labels, we performed an analysis of the classification errors. We analyzed 300 instances, taken randomly from the test set (100 at the beginning, 100 in the middle, and 100 at the end). As observed from the reported performance, our model struggled mostly with the identification of aggressive instances. As a result, there are relatively few cases in which our model correctly labels non-aggressive misogynous instances. We noticed that most of the time, when our model labels an instance as misogynist, it also labels it as aggressive. On the contrary, the system performs very well in identifying non-misogynous instances and aggressive-misogynous instances. The most common mistakes fall into three categories: 1. The system identifies as aggressive instances that contain verbs expressing an aggressive attitude.7 2. The system identifies as misogynous (and most of the time also aggressive) instances that are neither misogynous nor aggressive, but contain typical misogynous sentences. 3. The system identifies as misogynous instances that are neither misogynous nor aggressive, but contain double-entendre words typically used to insult women.</p>
        <p>7One potential reason behind this confusion is that we suspect there are aggressive tweets in the dataset which, not having been identified as misogynist in the first place, are mislabeled as non-aggressive. This hypothesis should be further explored.</p>
        <p>Table 3 shows some examples for all three kinds of errors. Regarding the errors of type 1, in instance 1.1 the action of winding a telephone cable around the neck was perceived as aggressive, even though the speaker did not express a misogynous or aggressive attitude towards a woman; indeed, she is just commenting on something watched on TV. In instance 1.2, the phrase meritare gli schiaffi (to deserve slaps) denotes violence, but it is not addressed towards a woman.</p>
        <p>This kind of mistake might be overcome by implementing a model trained on the misogynist partition of the data only. Finally, instance 1.3 illustrates the bias related to the subjective nature of what is perceived to be misogynous. According to the annotation guidelines, a tweet should be flagged as misogynous if it expresses hatred towards women. In this case, the poster of the tweet is not expressing any misogynous attitude, but is reporting what she has been told by men.</p>
        <p>Therefore, our system flagged the instance as non-misogynous, and we could agree.</p>
        <p>As for the errors of type 2, if we look at the text only, the instances could seem misogynous. However, in instances 2.1 and 2.2 the hashtag tells us that the tweet refers to a man, and the system fails to understand that. On the contrary, the system performs well when a masculine name or a masculine pronoun is specified instead of a hashtag, as we can observe in instances 2.3 and 2.4. In these cases our system could understand that the aggressive actions, which usually tend to be classified as aggressive-misogynous, do not refer to a woman.</p>
        <p>For the type-3 errors, in instance 3.1 balena (whale / fat woman) and in 3.2 acida (sour / peevish) could confuse the model, causing it to flag such instances as misogynous.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusions and Further Work</title>
      <p>In this paper we described our approach to the
EVALITA 2020 task on misogyny and
aggressiveness identification in Italian tweets —AMI.
The purpose of our participation was to
determine whether a multi-class classifier is a good
way to address this two-step task. Although the
task seems to be conceived to be addressed with
two different models, one for the identification of
misogyny and the other for aggressiveness, we
decided to try a different approach and build a
single model that could identify three cases:
non-misogynous, non-aggressive misogynous, and aggressive misogynous tweets.</p>
      <p>We built our model on top of AlBERTo, an
Italian version of BERT, and we trained the model
using only the dataset provided by the task
organizers. We experimented by setting different batch
sizes over an increasing number of epochs. The
highest F1 score on the validation set was reached with a batch size of 16 over 8 epochs. When evaluated on the test set, our model obtained an overall F1 score of 0.7438: 0.8102 for the misogyny and 0.6774 for the aggressiveness task. We
hypothesize that the model struggles to identify misogynist
aggressive instances partly because it gets
confused by non-misogynist aggressive tweets which
are labeled simply as non-misogynous. The
implementation is publicly available for research
purposes.</p>
      <p>For what concerns further experiments, we plan
to build two separate models: one to detect
misogyny and the other trained only on already-flagged
misogynous tweets to identify instances of
aggressiveness. Another step to undertake would be to
use an unconstrained approach and increase the
number of instances for the training set, so that
the model will have more data to learn from.</p>
    </sec>
    <sec id="sec-7">
      <title>A Official English-BERT-based Submission</title>
      <p>Our official submission used a pre-trained BERT model trained only on the English language. The experimentation and tuning were identical to those applied when using AlBERTo (cf. Section 5). Table 4 shows the tuning evolution. The best configuration of this model, derived from the English BERT, obtains an F1 score of 0.8222 on the validation set on our three-class problem. Nevertheless, the performance dropped to F1 = 0.6343 on the test set.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Maria</given-names>
            <surname>Anzovino</surname>
          </string-name>
          , Elisabetta Fersini, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Automatic identification and classification of misogynistic language on twitter</article-title>
          .
          <source>In International Conference on Applications of Natural Language to Information Systems</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Mirko Lai, and
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Long-term social media data collection at the university of turin</article-title>
          .
          <source>In Fifth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2018</year>
          ), Turin, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian</article-title>
          .
          <source>In Valerio Basile</source>
          , Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ),
          <article-title>Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, MN, June. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Deborah</given-names>
            <surname>Fallows</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>How women and men use the internet</article-title>
          .
          <source>Technical report</source>
          , Pew Internet &amp; American Life Project, December.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Debora Nozza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          . 2018a.
          <article-title>Overview of the evalita 2018 task on automatic misogyny identification (ami)</article-title>
          .
          <source>In EVALITA Evaluation of NLP and Speech Tools for Italian: Proceedings of the Final Workshop 12-13 December</source>
          <year>2018</year>
          , Naples, pages
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          . Torino: Accademia University Press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Paolo Rosso, and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Anzovino</surname>
          </string-name>
          . 2018b.
          <article-title>Overview of the task on automatic misogyny identification at ibereval 2018</article-title>
          . In Workshop on Evaluation of
          <article-title>Human Language Technologies for Iberian Languages (IberEval</article-title>
          <year>2018</year>
          ), Sevilla, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Debora Nozza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Ami @ evalita2020: Automatic misogyny identification</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing</source>
          and
          <article-title>Speech tools for Italian (EVALITA 2020), Online</article-title>
          . CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Hutter</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Fixing weight decay regularization in Adam</article-title>
          .
          <source>CoRR, abs/1711.05101</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Marcin M.</given-names>
            <surname>Mirończuk</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jarosław</given-names>
            <surname>Protasiewicz</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A recent overview of the state-of-the-art elements of text classification</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>106</volume>
          :
          <fpage>36</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Debora</given-names>
            <surname>Nozza</surname>
          </string-name>
          , Claudia Volpetti, and
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Unintended bias in misogyny detection</article-title>
          .
          <source>In IEEE/WIC/ACM International Conference on Web Intelligence</source>
          , pages
          <fpage>149</fpage>
          -
          <lpage>155</lpage>
          , Thessaloniki, Greece.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Polignano</surname>
          </string-name>
          , Pierpaolo Basile, Marco de Gemmis, Giovanni Semeraro, and
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets</article-title>
          .
          <source>In Proceedings of the Sixth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2019</year>
          ), volume
          <volume>2481</volume>
          , Bari, Italy. CEUR.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Niloofar S.</given-names>
            <surname>Samghabadi</surname>
          </string-name>
          , Parth Patwa, Srinivas PYKL, Prerana Mukherjee, Amitava Das, and
          <string-name>
            <given-names>Thamar</given-names>
            <surname>Solorio</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Aggression and misogyny detection using BERT: A multi-task approach</article-title>
          .
          <source>In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020).</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Kalpana</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , Suprakash Chaudhury,
          <string-name>
            <given-names>P.S.</given-names>
            <surname>Bhat</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Samiksha</given-names>
            <surname>Sahu</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Misogyny, feminism, and sexual harassment</article-title>
          .
          <source>Industrial psychiatry journal</source>
          ,
          <volume>26</volume>
          (
          <issue>2</issue>
          ):
          <fpage>111</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Chi</given-names>
            <surname>Sun</surname>
          </string-name>
          , Xipeng Qiu, Yige Xu, and
          <string-name>
            <given-names>Xuanjing</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>How to fine-tune BERT for text classification?</article-title>
          .
          <source>CoRR, abs/1905.05583</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>