    Automatic Identification of Misogyny in English and Italian Tweets at
             EVALITA 2018 with a Multilingual Hate Lexicon
       Endang Wahyu Pamungkas1 , Alessandra Teresa Cignarella1,2 , Valerio Basile1
                                    and Viviana Patti1
1 Dipartimento di Informatica, Università degli Studi di Torino
2 PRHLT Research Center, Universitat Politècnica de València
                        {pamungka | cigna | basile | patti}@di.unito.it



Abstract

English. In this paper we describe our submission to the shared task of Automatic Misogyny Identification in English and Italian Tweets (AMI) organized at EVALITA 2018. Our approach is based on SVM classifiers enhanced by stylistic and lexical features. Additionally, we analyze the use of the novel HurtLex multilingual linguistic resource, developed by enriching, from a computational and multilingual perspective, the Italian lexicon of hate words compiled by the linguist Tullio De Mauro, in order to investigate its impact on this task.

Italiano. Nel presente lavoro descriviamo il sistema inviato allo shared task di Automatic Misogyny Identification (AMI) ad EVALITA 2018. Il nostro approccio si basa su classificatori SVM, ottimizzati da feature stilistiche e lessicali. Inoltre, analizziamo il ruolo della nuova risorsa linguistica HurtLex, un'estensione in prospettiva computazionale e multilingue del lessico di parole per ferire in italiano proposto dal linguista Tullio De Mauro, per meglio comprendere il suo impatto in questo tipo di task.

1 Introduction

Hate Speech (HS) can be based on race, skin color, ethnicity, gender, sexual orientation, nationality, or religion; it incites violence and discrimination, and it is abusive, insulting, intimidating, and harassing. Hateful language is becoming a huge problem on social media platforms such as Twitter and Facebook (Poland, 2016). In particular, a type of cyberhate that is increasingly worrying nowadays is hateful language that specifically targets women, normally referred to as MISOGYNY (Bartlett et al., 2014).

Misogyny can be linguistically manifested in numerous ways, including social exclusion, discrimination, hostility, threats of violence and sexual objectification (Anzovino et al., 2018). Many Internet companies and micro-blogging platforms have already tried to block this kind of online content but, unfortunately, the issue is far from solved because of the complexity of natural language¹ (Schmidt and Wiegand, 2017). For the above-mentioned reasons, it has become necessary to implement targeted NLP techniques that can automatically treat online hate speech and misogyny.

The first shared task specifically aimed at Automatic Misogyny Identification (AMI) took place at IberEval 2018² within SEPLN 2018, considering English and Spanish tweets (Fersini et al., 2018a). The aim of the task is to encourage participating teams to propose the best automatic system, firstly to distinguish misogynous from non-misogynous tweets, and secondly to classify the type of misogynistic behaviour and to judge whether its target is a specific woman or a group of women. In this paper, we describe our submission to the second shared task of Automatic Misogyny Identification (AMI)³, organized at EVALITA 2018 in the same manner, but focusing on Italian and English tweets rather than Spanish and English as at IberEval.

¹ https://www.nytimes.com/2013/05/29/business/media/facebook-says-it-failed-to-stop-misogynous-pages.html
² https://sites.google.com/view/ibereval-2018
³ https://amievalita2018.wordpress.com/

2 Task Description

The aim of the AMI task is to detect misogynous tweets written in English and Italian (Task A) (Fersini et al., 2018b).
Furthermore, in Task B, each system should also classify each misogynous tweet into one of five misogyny behaviors (STEREOTYPE, DOMINANCE, DERAILING, SEXUAL HARASSMENT, and DISCREDIT) and one of two target classes (ACTIVE and PASSIVE). Participants are allowed to submit up to three runs for each language. Table 1 shows the dataset label distribution for each class. Accuracy is used as the evaluation metric for Task A, while macro F-score is used for Task B.

The organizers provided the same amount of data for both languages: 4,000 tweets in the training set and 1,000 in the test set. The label distribution for Task A is balanced, while in Task B the distribution is highly unbalanced for both misogyny behaviors and targets.
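For concreteness, both official metrics can be computed with scikit-learn; the following is a minimal sketch, in which the label strings and variable names are our own illustration rather than part of the official evaluation kit.

    # Minimal sketch of the two official metrics, assuming scikit-learn;
    # the labels below are illustrative placeholders.
    from sklearn.metrics import accuracy_score, f1_score

    y_true = ["misogynous", "non-misogynous", "misogynous", "non-misogynous"]
    y_pred = ["misogynous", "misogynous", "misogynous", "non-misogynous"]

    print(accuracy_score(y_true, y_pred))             # Task A: accuracy
    print(f1_score(y_true, y_pred, average="macro"))  # Task B: macro F-score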
3 Description of the System

We used two Support Vector Machine (SVM) classifiers which exploit different kernels: linear and radial basis function (RBF).

SVM with Linear Kernel. The linear kernel was used to find the optimal hyperplane when the SVM was first introduced in 1963 by Vapnik et al., long before Cortes and Vapnik (1995) proposed the kernel trick. Joachims (1998) recommends the linear kernel for text classification, based on the observation that text representation features are frequently linearly separable.

SVM with RBF Kernel. Choosing the kernel is usually a challenging task, because performance is dataset dependent. Therefore, we also experimented with a Radial Basis Function (RBF) kernel, which has already proven effective in text classification problems. The drawback of RBF kernels is that they are computationally expensive and perform worse on big, sparse feature matrices.
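Both classifiers expose the same training and prediction interface; the following minimal sketch assumes scikit-learn's SVC implementation with default hyperparameters.

    from sklearn.svm import SVC

    # Linear kernel: fast, and well suited to sparse text features.
    linear_svm = SVC(kernel="linear")
    # RBF kernel: often strong, but costly on big, sparse feature matrices.
    rbf_svm = SVC(kernel="rbf")

    # Both are used the same way:
    #   linear_svm.fit(X_train, y_train)
    #   predictions = linear_svm.predict(X_test)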
3.1 Features

We employed several lexical features, performing a simple preprocessing step including tokenization and stemming, using the NLTK (Natural Language Toolkit) library⁴. A detailed description of the features employed by our model follows.

⁴ https://www.nltk.org/

Bag of Words (BoW). We used bags of words to build the tweet representation. Before producing the word vector, we lowercased all characters. Our vector space consists of the counts of the unigrams and bigrams of the tweet. In addition, we also employed Bag of Hashtags (BoH) and Bag of Emojis (BoE) features, which are built with the same technique as BoW, focusing on the presence of hashtags and emojis.
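The following sketch illustrates this pipeline; the specific tokenizer and stemmer classes are our assumption, since the paper only states that NLTK was used for tokenization and stemming.

    from nltk.tokenize import TweetTokenizer
    from nltk.stem.snowball import SnowballStemmer
    from sklearn.feature_extraction.text import CountVectorizer

    tokenizer = TweetTokenizer()          # keeps hashtags and emojis as single tokens
    stemmer = SnowballStemmer("english")  # "italian" for the Italian runs

    def preprocess(tweet):
        # Lowercase, tokenize, and stem each token.
        return [stemmer.stem(tok) for tok in tokenizer.tokenize(tweet.lower())]

    # BoW: counts of unigrams and bigrams. BoH and BoE can be built the same
    # way, keeping only the tokens that are hashtags or emojis, respectively.
    bow = CountVectorizer(tokenizer=preprocess, ngram_range=(1, 2), lowercase=False)
    X_bow = bow.fit_transform(["You ain't the only one #justsaying", "Another tweet"])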
Swear Words. This feature takes into account the presence of a swear word and the number of its occurrences in the tweet. For English, we took a list of swear words from www.noswearing.com, while for Italian we gathered the swear word list from several sources⁵, including a translated version of www.noswearing.com's list and a list of swear words from Capuano (2007).

Sexist Slurs. Besides swear words, we also considered sexist words that specifically target women. We used a small set of sexist slurs from previous work by Fasoli et al. (2015). We translated and expanded that list manually for our Italian systems. This feature has a binary value: 1 when at least one sexist slur is present in the tweet and 0 otherwise.

Women Words. We manually built a small set of words containing synonyms and several words related to the word "woman" in English and "donna" in Italian. Based on our previous work (Pamungkas et al., 2018), these words were effective to detect the target of misogyny in a tweet. Similar to the sexist slur feature, this feature also has a binary value indicating the presence of women words in the tweet.

⁵ https://www.parolacce.org/2016/12/20/dati-frequenza-turpiloquio/ and https://it.wikipedia.org/wiki/Turpiloquio_nella_lingua_italiana
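A sketch of these lexicon lookups follows; the word lists shown here are tiny illustrative stand-ins for the actual resources cited above, not the lists we used.

    # Illustrative stand-ins for the real lists (noswearing.com,
    # Capuano (2007), Fasoli et al. (2015), and our women-word set).
    SWEAR_WORDS = {"bitch", "cunt"}
    SEXIST_SLURS = {"skank"}
    WOMEN_WORDS = {"woman", "girl", "wife"}

    def lexicon_features(tokens):
        swear_count = sum(1 for t in tokens if t in SWEAR_WORDS)
        return {
            "swear_count": swear_count,                                  # occurrences
            "swear_presence": int(swear_count > 0),                      # binary
            "sexist_slur": int(any(t in SEXIST_SLURS for t in tokens)),  # binary
            "women_word": int(any(t in WOMEN_WORDS for t in tokens)),    # binary
        }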
Surface Features. We also considered several surface-level features, including: upper-case character count, number of hashtags, number of URLs, and the length of the tweet in characters.

Hate Words Lexicon. HurtLex (Bassignana et al., 2018) is a multilingual lexicon of hate words, built starting from a list of words compiled manually (De Mauro, 2016). The lexicon is semi-automatically translated into 53 languages, and the lexical items are divided into 17 categories (see Table 2). For our system configuration, we exploited the presence of the words in each category as a single feature, thus obtaining 17 single features, one for each HurtLex category.
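A sketch of the surface and HurtLex features follows; here hurtlex is assumed to be a dictionary mapping each of the 17 category codes of Table 2 to its set of words, regardless of how the lexicon is actually stored.

    import re

    def surface_features(tweet):
        return {
            "upper_case_count": sum(c.isupper() for c in tweet),
            "hashtag_count": tweet.count("#"),
            "url_count": len(re.findall(r"https?://\S+", tweet)),
            "text_length": len(tweet),  # length in characters
        }

    def hurtlex_features(tokens, hurtlex):
        # One count per HurtLex category (PS, RCI, ..., SVP): 17 features.
        return {category: sum(1 for t in tokens if t in words)
                for category, words in hurtlex.items()}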
                        Task A
                       English       Italian
    Misogynistic       1,785/460     1,828/512
    Not misogynistic   2,215/540     2,172/488
    Total              4,000/1,000   4,000/1,000

                        Task B
                       English       Italian
    Stereotype           179/140       668/175
    Dominance            148/124         71/61
    Derailing              92/11          24/2
    Sexual Harassment     352/44       431/170
    Discredit          1,014/141       634/104
    Active             1,058/401     1,721/446
    Passive               727/59         96/66
    No class           2,215/540     2,172/488
    Total              4,000/1,000   4,000/1,000

Table 1: Dataset label distribution (training/test).

    Category   Description
    PS         Ethnic Slurs
    RCI        Location and Demonyms
    PA         Profession and Occupation
    DDP        Physical Disabilities and Diversity
    DDF        Cognitive Disabilities and Diversity
    DMC        Moral Behavior and Defect
    IS         Words Related to Social and Economic Disadvantage
    OR         Words Related to Plants
    AN         Words Related to Animals
    ASM        Words Related to Male Genitalia
    ASF        Words Related to Female Genitalia
    PR         Words Related to Prostitution
    OM         Words Related to Homosexuality
    QAS        Descriptive Words with Potential Negative Connotations
    CDS        Derogatory Words
    RE         Felonies and Words Related to Crime and Immoral Behavior
    SVP        Words Related to the Seven Deadly Sins of the Christian Tradition

Table 2: HurtLex Categories.
4 Experimental Setup

We experimented with different sets of features and kernels to find the best configuration of the two SVM classifiers (one for each language of the task). A 10-fold cross-validation was carried out to tune our systems based on accuracy. The configuration of our submitted systems can be seen in Table 3.

Run #3 for both languages uses the same configuration as our best system at the IberEval task (Fersini et al., 2018a).

The best result on the English training set was obtained by run #1, where we used the RBF kernel (0.765 accuracy), while for Italian the best result was obtained by runs #2 and #3 with the linear kernel (0.893 accuracy). Different sets of categories from HurtLex were able to improve the classifier performance, depending on the language.

In order to classify the category and target of misogyny (Task B), we adopted the same set of features as in Task A. Therefore, we did not build new systems specifically for Task B.

We experimented with different selections of categories from the HurtLex lexicon, and identified the most useful ones for the purpose of misogyny identification. As can be seen in Table 3, the main categories are: physical disabilities and diversity (DDP), words related to prostitution (PR), and words referring to male genitalia (ASM) and female genitalia (ASF), but also derogatory words (CDS) and words related to felonies, crime, and immoral behavior (RE).

    Language            English              Italian
    System           run1   run2   run3   run1   run2   run3
    Accuracy         0.765  0.72   0.744  0.786  0.893  0.893
    Bag of Words      -      X      -      -      X      X
    Bag of Hashtags   -      -      -      -      -      X
    Bag of Emojis     -      -      -      -      -      X
    S.W. Count        X      -      X      X      -      -
    S.W. Presence     X      -      X      X      -      -
    Sexist Slurs      X      X      X      X      X      -
    Woman Words       X      X      X      X      X      -
    Hashtag           -      -      X      -      X      -
    Link Presence     X      X      X      -      -      -
    Upper Case Count  X      -      -      X      X      -
    Text Length       -      X      -      X      -      -
    ASF Count         X      X      -      X      X      X
    PR Count          -      -      -      X      X      X
    OM Count          X      X      -      -      -      -
    DDF Count         -      -      -      -      -      -
    CDS Count         X      X      -      X      X      -
    DDP Count         X      X      -      -      -      X
    AN Count          X      X      -      -      -      -
    ASM Count         -      -      -      X      X      -
    DMC Count         -      -      -      -      -      -
    IS Count          X      X      -      -      -      -
    OR Count          -      -      -      -      -      -
    PA Count          X      X      -      -      -      -
    PS Count          -      -      -      -      -      -
    QAS Count         -      -      -      -      -      -
    RCI Count         -      -      -      -      -      -
    RE Count          -      -      -      X      X      -
    SVP Count         -      -      -      -      -      -
    Kernel           RBF    Linear RBF    RBF    Linear Linear

Table 3: Feature selection for all the submitted systems.
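A minimal sketch of this tuning loop, assuming scikit-learn, is shown below; the feature matrix is a random placeholder standing in for the vectors assembled from the features of Section 3.1.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Placeholder data standing in for the real feature matrix and labels.
    X_train = np.random.rand(40, 5)
    y_train = np.array([0, 1] * 20)

    # 10-fold cross-validation, scored on accuracy, for each kernel.
    for kernel in ("linear", "rbf"):
        scores = cross_val_score(SVC(kernel=kernel), X_train, y_train,
                                 cv=10, scoring="accuracy")
        print(kernel, scores.mean())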
5 Results

Table 4 shows our systems' performance on the test sets. Our best system in Task A ranked 3rd in Italian (0.839 accuracy for run3) and 13th in English (0.621 accuracy for run3). Interestingly, our best results in both languages were obtained by the best configuration submitted at the IberEval campaign. However, our English system's performance was considerably worse than its result at IberEval (accuracy = 0.814). We analyze this problem in Section 6.
                 ITALIAN
    Rank  Team                        Accuracy
    1     bakarov.c.run2              0.844
    2     bakarov.c.run1              0.842
    3     14-exlab.c.run3             0.839
    4     bakarov.c.run3              0.836
    5     14-exlab.c.run2             0.835
    6     StopPropagHate.c.run1       0.835
    7     AMI-BASELINE                0.830
    8     StopPropagHate.u.run2       0.829
    9     SB.c.run1                   0.824
    10    RCLN.c.run1                 0.824
    11    SB.c.run3                   0.823
    12    SB.c.run                    0.822

                 ENGLISH
    Rank  Team                        Accuracy
    1     hateminers.c.run1           0.704
    2     hateminers.c.run3           0.681
    3     hateminers.c.run2           0.673
    4     resham.c.run3               0.651
    5     bakarov.c.run3              0.649
    6     resham.c.run1               0.648
    7     resham.c.run2               0.647
    8     ITT.c.run2.tsv              0.638
    9     ITT.c.run1.tsv              0.636
    10    ITT.c.run3.tsv              0.636
    11    himani.c.run2.tsv           0.628
    12    bakarov.c.run2              0.628
    13    14-exlab.c.run3             0.621
    14    himani.c.run1.tsv           0.619
    15    himani.c.run3.tsv           0.614
    16    14-exlab.c.run1             0.614
    17    SB.c.run2.tsv               0.613
    18    bakarov.c.run1              0.605
    19    AMI-BASELINE                0.605
    20    StopPropagHate.c.run1.tsv   0.593
    21    SB.c.run1.tsv               0.592
    22    StopPropagHate.u.run3.tsv   0.591
    23    StopPropagHate.u.run2.tsv   0.590
    24    RCLN.c.run1                 0.586
    25    SB.c.run3.tsv               0.584
    26    14-exlab.c.run2             0.500

Table 4: Official Results for Subtask A.

                 ITALIAN
    Rank  Team                 Avg.   Cat.   Targ.
    1     bakarov.c.run1       0.493  0.555  0.432
    2     AMI-BASELINE         0.487  0.534  0.440
    3     14-exlab.c.run3      0.485  0.552  0.418
    4     14-exlab.c.run2      0.482  0.550  0.415
    5     bakarov.c.run3       0.478  0.536  0.421
    6     bakarov.c.run2       0.463  0.499  0.426
    7     SB.c.run.tsv         0.449  0.485  0.414
    8     SB.c.run1.tsv        0.448  0.483  0.414
    9     RCLN.c.run1          0.448  0.473  0.422
    10    SB.c.run2.tsv        0.446  0.480  0.411
    11    14-exlab.c.run1      0.292  0.164  0.420

                 ENGLISH
    Rank  Team                 Avg.   Cat.   Targ.
    1     himani.c.run3.tsv    0.406  0.361  0.451
    2     himani.c.run2.tsv    0.377  0.323  0.431
    3     AMI-BASELINE         0.370  0.342  0.399
    4     hateminers.c.run3    0.369  0.302  0.435
    5     hateminers.c.run1    0.348  0.264  0.431
    6     SB.c.run2.tsv        0.344  0.282  0.407
    7     himani.c.run1.tsv    0.342  0.280  0.403
    8     SB.c.run1.tsv        0.335  0.282  0.389
    9     hateminers.c.run2    0.329  0.229  0.430
    10    SB.c.run3.tsv        0.328  0.269  0.387
    11    resham.c.run2        0.322  0.246  0.399
    12    resham.c.run1        0.316  0.235  0.397
    13    bakarov.c.run1       0.309  0.260  0.357
    14    resham.c.run3        0.283  0.214  0.353
    15    RCLN.c.run1          0.280  0.165  0.395
    16    ITT.c.run2.tsv       0.276  0.173  0.379
    17    bakarov.c.run2       0.275  0.176  0.374
    18    14-exlab.c.run1      0.260  0.124  0.395
    19    bakarov.c.run3       0.254  0.151  0.356
    20    14-exlab.c.run3      0.239  0.107  0.371
    21    ITT.c.run1.tsv       0.238  0.140  0.335
    22    ITT.c.run3.tsv       0.237  0.138  0.335
    23    14-exlab.c.run2      0.232  0.205  0.258

Table 5: Official Results for Subtask B.
In Task B, most of the submitted systems struggled to classify the misogynous tweets into the five categories and to discriminate whether the target is active or passive. Both subtasks for both languages have very low baselines (below 0.4 for English and around 0.5 for Italian). Several under-represented classes, such as DERAILING and DOMINANCE, are very difficult to detect in category classification (see Table 1 for details). Similarly, the label distribution was very unbalanced for target classification, where most of the misogynous tweets attack a specific target (ACTIVE).

Several features which focus on the use of offensive words proved useful for English. For Italian, a simple tweet representation which involves Bag of Words, Bag of Hashtags, and Bag of Emojis already produced a better result than the baseline. Some of the HurtLex categories that improved the system's performance during training did not help the prediction on the test set (ASF, OM, CDS, DDP, AN, IS, PA for English and CDS, ASM for Italian). However, similarly to the Spanish case, the system configuration which utilized ASF, PR, and DDP obtained the best result for Italian.
6 Discussion

We performed an error analysis on the gold standard test set, and analyzed 160 Italian tweets that our best system configuration mislabelled. The label "misogynistic" was wrongly assigned to 147 instances (false positives, 91.9% of the errors), while the contrary happened only 13 times (false negatives, 8.1% of the errors). The same situation occurred in the English dataset, but with a less striking impact: 228 false positives (60.2% of the errors) and 151 false negatives (39.8% of the errors). In this section we conduct a qualitative error analysis, identifying and discussing several factors that contribute to the misclassification.

Presence of swear words. We encountered many "bad words" in the datasets of this shared task, for both English and Italian. In an abusive context, the presence of swear words can help to spot abusive content such as misogyny. However, swear words can also lead to false positives when they are used in a casual, non-offensive context (Malmasi and Zampieri, 2018; Van Hee et al., 2018; Nobata et al., 2016). Consider the following two examples containing the swear word "bitch" in different contexts:

    1. Im such a fucking cunt bitch and i dont even mean to be goddammit

    2. Bitch you aint the only one who hate me, join the club, stand in the corner, and stfu.

In Example 1, the swear word "bitch" is used just to arouse interest/show off, not to directly insult another person: a case of idiomatic swearing (Pinker, 2007). In Example 2, the swear word "bitch" is used to insult a specific target in an abusive context, an instance of abusive swearing (Pinker, 2007). Resolving the swearing context is still challenging for automatic systems, which contributes to the difficulty of this task.

Reported speech. Tweets may contain misogynistic content as an indirect quote of someone else's words, such as in the following example:

    3. Quella volta che mia madre mi ha detto quella cosa le ho risposto "Mannaggia! Non sarò mai una brava donna schiava zitta e lava! E adesso?!" Potrei morire per il dispiacere.
    → That time when my mom told me that thing and I answered "Holy s**t! I will never be a good slave who shuts up and cleans! What now?"

According to the task guidelines, this should not be labeled as a misogynistic tweet, because it is not the user himself who is misogynistic. Therefore, instances of this type tend to confuse a classifier based on lexical features.

Irony and world knowledge. In Example 3, the sentence "Potrei morire per il dispiacere."⁶ is ironic. Humor is very hard to model for automatic systems; sometimes, the presence of figurative language even baffles human annotators. Moreover, external world knowledge is often required in order to infer whether an utterance is ironic (Wallace et al., 2014).

Preprocessing and tokenization. In computer-mediated communication, and specifically on Twitter, users often resort to a language type that is closer to speech than to written language. This is reflected in less-than-clean orthography, with forms and expressions that imitate verbal face-to-face conversation.

    4. @ XXXXXXXXX @ XXXXXXXXXX @ XXXXXXX @ XXXXXX x me glob prox2aa colpiran tutti incluso nemicinterno.. esterno colpopiúduro saràculogrande che bevetropvodka e inoltre x questiondisoldi progetta farmezzofallirsudfinitestampe: ciò nnvàben xrchèindebolis
    → 4 me glob next2aa will hit everyone included internalenemy.. external harderhit willbebigass who drinkstoomuchvodka and also 4 mattersofmoney isplanning tomakethesouthfailwithprintings: dis notgood causeweaken

In Example 4, preprocessing steps like tokenization and stemming are particularly hard to perform, because of the missing spaces between words and the confused orthography. Consequently, the whole classification pipeline is compromised and error-prone.

Gender of the target. As defined in the Introduction, misogyny is a specific type of hateful language that targets women. However, detecting the gender of the target is a challenging task in itself, especially in Twitter datasets.

    5. @realDonaldTrump shut the FUCK up you infected pussy fungus.

    6. @TomiLahren You're a fucking skank!

Both examples use bad words to abuse their targets. However, the first example is labeled as not misogyny since the target is Donald Trump (a man), while the second is labeled as misogyny, with the target Tomi Lahren (a woman).

⁶ Translation: I could die of heartbreak.
7 Conclusions

Here we draw some considerations based on the results of our participation in the EVALITA 2018 AMI shared task. In order to test the multilingual potential of our model, one of the systems we submitted for Italian at EVALITA (run #3) was based on our best model for Spanish at IberEval. Based on the official results, this system, consisting of features such as BoW, BoE, BoH and several HurtLex categories specifically related to hate against women, performed well for Italian. Concerning English, we obtained lower results at EVALITA than at IberEval with the same system configuration. It is worth mentioning that, even though the training set for the AMI EVALITA task was substantially bigger, in absolute terms all the AMI participants at EVALITA obtained worse scores than the IberEval teams.

Acknowledgments

Valerio Basile and Viviana Patti were partially supported by Progetto di Ateneo/CSP 2016 (Immigrants, Hate and Prejudice in Social Media - IhatePrejudice, S1618_L2_BOSC_01).

References

Maria Anzovino, Elisabetta Fersini, and Paolo Rosso. 2018. Automatic Identification and Classification of Misogynistic Language on Twitter. In Proc. of the 23rd Int. Conf. on Applications of Natural Language & Information Systems, pages 57–64. Springer.

Jamie Bartlett, Richard Norrie, Sofia Patel, Rebekka Rumpel, and Simon Wibberley. 2014. Misogyny on Twitter. Demos.

Elisa Bassignana, Valerio Basile, and Viviana Patti. 2018. Hurtlex: A Multilingual Lexicon of Words to Hurt. In Proc. of the 5th Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy. CEUR.org.

Romolo Giovanni Capuano. 2007. Turpia: sociologia del turpiloquio e della bestemmia. Riscontri (Milano, Italia). Costa & Nolan.

Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20(3):273–297.

Tullio De Mauro. 2016. Le parole per ferire. Internazionale. 27 settembre 2016.

Fabio Fasoli, Andrea Carnaghi, and Maria Paola Paladino. 2015. Social acceptability of sexist derogatory and sexist objectifying slurs across contexts. Language Sciences, 52:98–107.

Elisabetta Fersini, Maria Anzovino, and Paolo Rosso. 2018a. Overview of the Task on Automatic Misogyny Identification at IberEval. In Proceedings of the 3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), pages 57–64. CEUR-WS.org, September.

Elisabetta Fersini, Debora Nozza, and Paolo Rosso. 2018b. Overview of the EVALITA 2018 Task on Automatic Misogyny Identification (AMI). In Proceedings of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.

Thorsten Joachims. 1998. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning, pages 137–142. Springer.

Shervin Malmasi and Marcos Zampieri. 2018. Challenges in discriminating profanity from hate speech. Journal of Experimental & Theoretical Artificial Intelligence, 30(2):187–202.

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web, pages 145–153.

Endang Wahyu Pamungkas, Alessandra Teresa Cignarella, Valerio Basile, and Viviana Patti. 2018. 14-ExLab@UniTo for AMI at IberEval2018: Exploiting Lexical Knowledge for Detecting Misogyny in English and Spanish Tweets. In Proc. of the 3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018).

Steven Pinker. 2007. The Stuff of Thought: Language as a Window into Human Nature. Penguin.

Bailey Poland. 2016. Haters: Harassment, Abuse, and Violence Online. Potomac Press.

Anna Schmidt and Michael Wiegand. 2017. A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1–10.

Cynthia Van Hee, Gilles Jacobs, Chris Emmery, Bart Desmet, Els Lefever, Ben Verhoeven, Guy De Pauw, Walter Daelemans, and Véronique Hoste. 2018. Automatic detection of cyberbullying in social media text. arXiv preprint arXiv:1801.05617.

Byron C. Wallace, Laura Kertz, Eugene Charniak, et al. 2014. Humans require context to infer ironic intent (so computers probably do, too). In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 512–516.