Vector Space Models for Automatic Misogyny Identification (Short Paper)

Amir Bakarov
National Research University Higher School of Economics, Moscow, Russia
amirbakarov at gmail.com

Abstract

English. The problem of hate speech and, especially, of misogynous language is one of the most crucial problems of contemporary Internet communities. Therefore, automatic detection of such language becomes one of the most pressing natural language processing tasks. The most ubiquitous tools for resolving this task are based on vector space models of texts. In this paper we describe our system that exploits such tools and has shown the best performance on the Italian AMI task of EVALITA 2018.

Italiano. The problem of hate speech, and especially of misogynous language, is one of the most crucial problems of today's Internet communities. Therefore, the automatic detection of such language becomes one of the most pressing goals of natural language processing. The most widespread systems addressing this goal exploit the distributional hypothesis. In this paper, we describe a system based on this hypothesis that has shown the best performance on the AMI task of EVALITA 2018 for the Italian language.

1 Introduction

As the Internet community and online discussions grow, the number of manifestations of hate speech on open web resources also increases. Such speech (also called abusive language or textual harassment) can take different forms depending on whether it targets a person's ethnicity, gender identity, religion, or sexual orientation. Probably one of the most destructive forms of hate speech is the one that abuses a person's gender identity. This form of hate speech is called misogynous language, since misogyny is a specific kind of hate whose targets are women. Misogyny on the Internet (cybermisogyny, or online sexual harassment) is one of the crucial problems of contemporary Internet communities, especially from the perspective of the societal impact of this phenomenon.

Thus, the problem of automatic misogyny identification can be considered one of the most important branches of the hate speech detection task. A successful solution to this problem could significantly limit the diffusion of hate speech against women. The problem of automatic misogynous language detection attracted attention from the research community only fairly recently, and the shared task on automatic misogyny identification held as part of the EVALITA 2018 campaign is one of the first efforts to deal with this problem (Fersini et al., 2018b). The aim of this task is to automatically identify misogynous content in tweets for the Italian and English languages.

This paper describes our system, which outperformed all other systems for the Italian language and also showed fairly good results for English. The system uses semantic features of tweets as the input of a supervised classifier; the semantic features are latent vectors produced by a vector space model.

Our work is organized as follows. Section 2 briefly describes related work on the proposed task. Section 3 describes the setup of our system, while Section 4 discusses the results and proposes an analysis of them. Section 5 concludes the paper.

                 Task A (Italian)   Task B (Italian)   Task A (English)   Task B (English)
Baseline         0.830              0.487              0.605              0.370
TFIDF+LR         0.842              0.443              0.649              0.241
TFIDF+XGB        0.836              0.493              0.604              0.309
TFIDF+SVD+LR     0.844              0.478              0.628              0.275
TFIDF+SVD+XGB    0.833              0.463              0.605              0.254

Table 1: Performance of each of the compared vectorizers and supervised classifiers on each of the tasks. Task A reports accuracy, Task B reports macro F1-measure.

2 Related Work
The first notable works on the task of automatic misogyny identification were described in the shared tasks proposed at the IberEval 2018 workshop (Fersini et al., 2018a) (a shared task organized jointly with the SEPLN 2018 conference for Iberian languages) and at SemEval 2019 (https://competitions.codalab.org/competitions/19935). These tasks proposed certain baselines based on ubiquitous text classification techniques (for example, SVM). The automatic misogyny identification task considered in our research is the third shared task on this topic (Anzovino et al., 2018). We are also aware of certain other attempts to computationally resolve the task of automatic misogyny identification, but most of them were published only as exploratory analyses (Hewitt et al., 2016). Most of the state-of-the-art approaches to this problem were described in system reports for the aforementioned IberEval 2018 shared task. As far as we know, there were no other scholarly works trying to resolve or to formalize this task.

In the natural language processing community, very similar tasks were also considered in other online hate speech challenges and scholarly works (Davidson et al., 2017). An extensive overview of all the research related to hate speech detection goes beyond the scope of this work; an interested reader is referred to a survey paper specialized on this topic (Schmidt and Wiegand, 2017).

Apart from computational linguistics and natural language processing, the problem of misogynous speech has also been a focus of some linguistic and social science articles (Fulper et al., 2014). Most of such scholarly works tried to understand the nature of misogynous hate speech and the patterns appearing in this type of language (Poland, 2016). We think that, from the perspective of natural language processing, such papers could be useful for systems that are strongly grounded in linguistic knowledge and manually crafted resources.

3 Experimental Setup

In the shared task we had two datasets (one for English and one for Italian) of 5000 tweets each. 4000 tweets in each dataset were considered as a training sample, and the evaluation of the system was done on the remaining 1000 tweets (their labels were hidden until the end of the competition). The classification task included both binary and multi-label classification.

In our work we used vectors from a term-document matrix with TF-IDF values. We propose text classification based on semantic features obtained from vector space models of texts. We considered the terms to be word n-grams, and used a factorization of the term-document matrix (by means of singular value decomposition, SVD) followed by a normalization of the factorized values (in the table with the results we call this setting TFIDF+SVD). From this perspective, our approach is very close to the method of Latent Semantic Analysis (Landauer et al., 1998); we have also tried to resolve this task using the non-factorized TF-IDF matrix, called TFIDF in the table. As a supervised classifier we used Logistic Regression; therefore, our system is based on TF-IDF word n-gram features and a Logistic Regression classifier (LR).

For all the vectorization methods we used a basic text pre-processing pipeline (tokenization, lemmatization, and stop-word removal based on NLTK built-in tools and resources).

We have also compared this classifier with others (for instance, a Gradient Boosting classifier, XGB in the table) and got worse results on certain tasks. All in all, we compared four different models.
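The TFIDF+SVD+LR setting described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' actual implementation: the toy tweets, the n-gram range, and the number of SVD components are arbitrary assumptions, and the NLTK pre-processing step is omitted for brevity.

```python
# Minimal sketch of a TFIDF+SVD+LR text classifier (illustrative toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy texts standing in for the AMI tweets (1 = misogynous).
train_texts = [
    "you are a terrible woman",
    "what a lovely day today",
    "women should stay silent",
    "great match last night",
]
train_labels = [1, 0, 1, 0]

# TF-IDF over word n-grams, SVD factorization of the term-document matrix,
# normalization of the factorized vectors, then Logistic Regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),      # word uni- and bigrams (assumed)
    TruncatedSVD(n_components=2, random_state=0),  # toy dimensionality
    Normalizer(copy=False),
    LogisticRegression(),
)
model.fit(train_texts, train_labels)
predictions = model.predict(["silly woman", "nice weather"])
```

Dropping the TruncatedSVD and Normalizer steps from the pipeline gives the plain TFIDF setting compared in Table 1; swapping the final estimator gives the XGB variants.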
The exact hyperparameters of the models used in our system, and all the code for reproducing the experiments, can be found in our GitLab repository: https://gitlab.com/bakarov/ami-evalita.

4 Results and Discussion

The system was evaluated on two subtasks. The first subtask (Task A) proposed a binary classification to identify whether a text is misogynous or not. The second subtask (Task B) was to classify the misogynous tweets according to both the misogynistic behavior (multi-label classification) and the target of the message (binary classification). The results of the system for the English and Italian subtasks of the misogyny identification task are reported in Table 1. It is notable that our system outperformed the baseline set by the organizers in most of the cases, and that different combinations of vectorizers and models showed different performance on different tasks.

An error analysis of the system revealed that it fails on examples where misogyny is expressed without (or with very little) offensive lexis, or, vice versa, where such lexis is used in a non-misogynous context (for example, you pussy boy). This could be explained by the fact that the system is too focused on the lexicon and does not take into account syntactic patterns or thematic roles.

5 Conclusions

This work has described the system that showed the best results for the Italian track on all the subtasks (and also achieved fairly good results on English). Our system is based on a vector space model of word n-grams and supervised classifiers (Logistic Regression and Gradient Boosting).

The system described in this paper is one of the first attempts in the natural language processing community at the problem of detecting misogynistic language for the Italian language. We think that the description of the implementation of our system could help other researchers resolve this important and timely task. We consider this to be the main contribution of our research.

In future work we plan to give more attention to other linguistic features based on an analysis of the patterns that people tend to use in misogynous language. We would also like to try out more promising approaches to text classification based on deep learning (for example, convolutional neural networks).

References

Anzovino, M., Fersini, E., and Rosso, P. (2018). Automatic identification and classification of misogynistic language on Twitter. In International Conference on Applications of Natural Language to Information Systems, pages 57-64. Springer.

Davidson, T., Warmsley, D., Macy, M., and Weber, I. (2017). Automated hate speech detection and the problem of offensive language. arXiv preprint arXiv:1703.04009.

Fersini, E., Anzovino, M., and Rosso, P. (2018a). Overview of the task on automatic misogyny identification at IberEval. In Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), co-located with the 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018). CEUR Workshop Proceedings. CEUR-WS.org, Seville, Spain.

Fersini, E., Nozza, D., and Rosso, P. (2018b). Overview of the EVALITA 2018 task on Automatic Misogyny Identification (AMI). In Caselli, T., Novielli, N., Patti, V., and Rosso, P., editors, Proceedings of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.

Fulper, R., Ciampaglia, G. L., Ferrara, E., Ahn, Y., Flammini, A., Menczer, F., Lewis, B., and Rowe, K. (2014). Misogynistic language on Twitter and sexual violence. In Proceedings of the ACM Web Science Workshop on Computational Approaches to Social Modeling (ChASM).

Hewitt, S., Tiropanis, T., and Bokhove, C. (2016). The problem of identifying misogynist language on Twitter (and other online social spaces). In Proceedings of the 8th ACM Conference on Web Science, pages 333-335. ACM.

Landauer, T. K., Foltz, P. W., and Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2-3):259-284.

Poland, B. (2016). Haters: Harassment, Abuse, and Violence Online. University of Nebraska Press.

Schmidt, A. and Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1-10.