=Paper= {{Paper |id=Vol-2263/paper030 |storemode=property |title=Misogyny Detection and Classification in English Tweets: The Experience of the ITT Team |pdfUrl=https://ceur-ws.org/Vol-2263/paper030.pdf |volume=Vol-2263 |authors=Elena Shushkevich,John Cardiff |dblpUrl=https://dblp.org/rec/conf/evalita/ShushkevichC18 }} ==Misogyny Detection and Classification in English Tweets: The Experience of the ITT Team== https://ceur-ws.org/Vol-2263/paper030.pdf
    Misogyny Detection and Classification in English Tweets:
              The Experience of the ITT Team




              Elena Shushkevich                                 John Cardiff
        Social Media Research Group                    Social Media Research Group
       Institute of Technology Tallaght               Institute of Technology Tallaght
                 Dublin, Ireland                                Dublin, Ireland
      e.shushkevich@yandex.ru                      john.cardiff@it-tallaght.ie




Abstract

English. The problem of online misogyny and offences against women has become increasingly widespread, and the automatic detection of such messages is an urgent priority. In this paper, we present an approach based on an ensemble of Logistic Regression, Support Vector Machines, and Naïve Bayes models for the detection of misogyny in texts extracted from the Twitter platform. Our method was presented as part of our participation in the Automatic Misogyny Identification (AMI) Shared Task of the EVALITA 2018 evaluation campaign.

Italiano (English translation). The problem of online misogyny and of hatred directed at women is becoming increasingly widespread, and so the automatic recognition of such messages is an important priority. In this article, we present an approach based on Logistic Regression, SVM and Naive Bayes classifiers for the automatic recognition of misogyny in texts extracted from Twitter. Our method was presented through our participation in the AMI shared task at the EVALITA 2018 evaluation campaign.

1 Introduction

It is hard to miss the fact that the intensive growth of social networking has led not only to more opportunities for personal communication, but also to an increase in aggression on social media. Hate speech can be aimed at sexual orientation, race, religion, or gender as a whole. In particular, when the target of hate speech is women, it can be described as misogyny. Nowadays more and more attention is paid to this problem, and one of the directions of hate speech recognition is the detection of women-oriented aggression in social networks.

It is important to address hate speech and misogyny detection now, because the volume of social network data will keep growing and the problem will become more and more serious. We need systems that allow us to detect and control the number of hate speech messages, and we need to understand how to classify this type of content and how to reduce its volume. Finding effective ways to detect and process misogynistic content is therefore a major challenge.

This paper describes our participation in the Automatic Misogyny Identification (AMI) Shared Task at EVALITA 2018 (Fersini, Nozza and Rosso, 2018). The aim of the task is to identify misogynistic text in tweets. The task contained two subtasks:

Subtask A - Misogyny Identification: the main goal was to separate misogynous tweets from non-misogynous ones.

Subtask B - Misogynistic Behavior and Target Classification: the target classification distinguishes misogynous tweets that offend a specific person (Active) from tweets that insult a group of people (Passive). The misogynistic behavior classification divides misogynous tweets into the following categories:
- Stereotype & Objectification: a widely held but fixed and oversimplified image or idea of a woman; description of women's physical appeal and/or comparisons to narrow standards.
- Dominance: asserting the superiority of men over women or highlighting gender inequality.
- Derailing: justifying the abuse of women, rejecting male responsibility, or attempting to disrupt the conversation in order to redirect women's conversations to something more comfortable for men.
- Sexual Harassment & Threats of Violence: describing actions such as sexual advances, requests for sexual favours and harassment of a sexual nature; the intent to physically assert power over women through threats of violence.
- Discredit: slurring women with no other larger intention.

There were two datasets for the task, one containing tweets in English and the other containing tweets in Italian. Our team worked with the English dataset only, which was composed of 4,000 tweets for training and 1,000 tweets for testing. The results were evaluated using accuracy for Subtask A and the macro F-measure for Subtask B.

This paper presents our approach to these problems. Its main thrust is to build a model that can assign any tweet to its group.

The paper is organized as follows. Related work in the area is described in Section 2. Section 3 presents how we conducted data preprocessing and the approach we chose for building the model. In Section 4 the results are described and analyzed. In Section 5 we summarize our work.

2 Related work

There are a number of machine learning approaches to text processing that allow us to deal with misogyny and harassment in texts. Some of these were presented in the AMI@IberEval 2018 shared task (Fersini, Anzovino and Rosso, 2018). The aim of this challenge was to detect misogynistic tweets and to build models able to classify them into different groups depending on the type of misogyny. In particular, it was demonstrated that models based on Support Vector Machines (Pamungkas et al., 2018) and ensembles of models (Frenda et al., 2018) can successfully classify tweets according to the different types and functions of misogyny. In our work we apply the same techniques, Support Vector Machines and ensembles of models, to the task of detecting misogynistic tweets.

Several works published in recent years help to understand how hate speech messages can be classified. In (Schmidt and Wiegand, 2017) the authors surveyed methodologies for hate speech detection. In another work (Waseem and Hovy, 2016) useful approaches to detecting racist and sexist offences were presented. It should be noted that abusive language has also been classified into three groups (hate speech, derogatory language, profanity), with hate speech understood as one kind of abusive language.
In the research reported in (Nobata et al., 2016), it was shown (Bartlett et al., 2014) how NLP can be used to analyse English-language misogynistic tweets in order to find the frequencies of abusive words and the users who used such words most often. In other works (Alexandrov et al., 2013; Kaurova et al., 2010) the authors focused on creating models that evaluate the tone of a text on a scale from very negative to very positive. They constructed models for 3, 5 and 8 categories and achieved high accuracy using additional tools such as GMDH Shell and the Semantic Orientation Calculator (SO-CAL), which demonstrates the high potential of inductive modelling for text-mining tasks. We plan to use these techniques to improve the results of our model in the future.

3 System

In our approach we perform a number of sequential steps: preprocessing, model design, and finally combining the constructed models into one ensemble.

3.1 Preprocessing

In the first step, we prepared the data for classification. To clean the data we removed punctuation and converted words to lower case. For vectorization we used the tf-idf (term frequency–inverse document frequency) method, which reduces the weight of words that occur in many documents and increases the weight of words that occur frequently within a document. Only these steps were carried out for the first run. For the subsequent two runs, we added some extra preprocessing steps:
- replacing all links with the string "URL";
- replacing all references to Twitter users (i.e., terms starting with the "@" symbol) with the term "USER";
- replacing frequently used combinations of symbols, such as "!!!", "???" and other emotional expressions, with the term "emoji".

3.2 Models

The main idea of the modeling was to create an ensemble of different models that complement each other to achieve the best results. The final blended model assigns each tweet to a class by combining the outputs of the individual models, as described at the end of this section. We used the following simple models:
- Logistic regression model. Logistic regression builds a discriminative model, which calculates the probability of a class as a function of a weighted set of observation features and assigns a class to each observation. The classifier applies the logistic (sigmoid) function, 1 / (1 + exp(-z)), to a linear combination z of the features obtained from the input data (Wang and Manning, 2012; Wright, 1995).
- Support Vector Machines classifier. As shown in (Joachims, 2002), this method is very useful for working with texts. The idea of the method is to map the source vectors into a higher-dimensional space and search for the separating hyperplane for which the gap between the classes is maximal: two parallel hyperplanes are constructed on either side of the separating hyperplane, and the separating hyperplane is chosen to maximize the distance to these two parallel ones.
- Naive Bayes classifier. One advantage of this method is its high computational speed (Zhang and Li, 2007); another is the small amount of data needed to train it: a big training dataset is not necessary to obtain good estimates of the classification parameters.

In the next step we combined the Naive Bayes approach and the Logistic Regression approach into one model (NB+LR), as presented in the work
(Genkin et al., 2007), which produced quite good results.

In the final step we combined the models mentioned above, Logistic Regression (LR), Support Vector Machines (SVM), and Naive Bayes combined with Logistic Regression (NB+LR), into one ensemble. In this blended model the class probabilities produced by the simple models were summed and averaged, and the class with the highest average probability was chosen as the final prediction.

4 Results

We submitted three different runs for evaluation. The first run used the simplest type of preprocessing (we only deleted punctuation symbols and converted all letters to lower case), and a tweet was marked as misogynistic if at least two of the three classification types marked it as misogynous (Misogyny+Target, Misogyny+Misogynistic Behavior, or Target+Misogynistic Behavior).

The second run used the extended preprocessing described in Section 3.1, and a tweet was marked as misogynistic whenever at least one classifier marked it as such.

The third run used the same extended preprocessing as the second run and the same labelling rule as the first run.

Table 1 shows the validation results for all three classification types. As can be seen, the fourth model in each group, the blended model, was the most successful: it matched or outperformed every individual model. We conclude that the blended model, which combines several simple models (Logistic Regression, Naive Bayes + Logistic Regression, and Support Vector Machines), achieves the best results for all classification types: Misogyny Identification, Target Classification, and Misogynistic Behavior classification.

It should be noted that we used the F-measure to evaluate the results because it combines both recall and precision, and because of the class imbalance within both the Misogynistic Behavior classification and the Target Classification.

Task                       Classifier   F1-score
Misogyny Identification    LR           0.78
                           NB+LR        0.72
                           SVM          0.71
                           Blend        0.78
Target Classification      LR           0.60
                           NB+LR        0.66
                           SVM          0.76
                           Blend        0.76
Misogynistic Behavior      LR           0.50
                           NB+LR        0.52
                           SVM          0.57
                           Blend        0.64

Table 1. Performance (F1-score) on the validation set.

Also note that the scores of our models increase as the number of classes decreases: the blended model reaches 0.78 on the binary Misogyny Identification task, 0.76 on Target Classification, and 0.64 on Misogynistic Behavior classification, which has five categories.

The results of all three runs of the blended model on the test dataset are presented in Table 2.

Subtask A - English
Rank   Run              Accuracy
8      ITT.c.run2.tsv   0.638
9      ITT.c.run3.tsv   0.636
10     ITT.c.run1.tsv   0.636

Table 2. Results of the classification on the test set.

The results on the test data show that the best run was the one with the extended preprocessing and the labelling rule that marks a tweet as misogynistic whenever at least one of the classifiers flags it (0.638 accuracy, compared with 0.636 for the other two runs).
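The preprocessing of Section 3.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the submitted runs: the regular expressions for links, user mentions and emotional punctuation (URL_RE, USER_RE, EMO_RE) are hypothetical, since their exact form is not specified, and applying the replacements after lowercasing but before punctuation removal is likewise an assumption (otherwise "https://..." and "@name" would be broken apart before they could be matched). The tfidf helper uses the textbook, unsmoothed weighting; library implementations such as scikit-learn's TfidfVectorizer smooth the idf by default, so their weights differ.

```python
import math
import re
import string
from collections import Counter

# Hypothetical patterns: the exact rules used for the runs are not specified.
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links -> "URL"
USER_RE = re.compile(r"@\w+")                   # @mentions -> "USER"
EMO_RE = re.compile(r"[!?]{2,}")                # "!!!", "???", "?!" -> "emoji"
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str, extended: bool = False) -> str:
    """Lowercase and strip punctuation (run 1); extended=True adds the
    placeholder replacements used for runs 2 and 3."""
    text = text.lower()
    if extended:
        # Replace before punctuation is removed; otherwise "https://..." and
        # "@name" would already be broken apart and could not be matched.
        text = URL_RE.sub(" URL ", text)
        text = USER_RE.sub(" USER ", text)
        text = EMO_RE.sub(" emoji ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Unsmoothed tf-idf per document: (count / length) * log(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: each term counted once per doc
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


tweets = ["Check THIS!!! https://t.co/xyz @Someone", "check this out"]
docs = [preprocess(t, extended=True) for t in tweets]
print(docs)
print(tfidf(docs))
```

For example, preprocess("Check THIS!!! https://t.co/xyz @Someone") returns "check this httpstcoxyz someone", while the extended variant returns "check this emoji URL USER". Terms that occur in every document receive a weight of zero under this unsmoothed idf.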
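The blending step described at the end of Section 3.2 (summing and averaging the class probabilities of the simple models and choosing the class with the highest average) can be sketched in a few lines. The probability values below are invented purely for illustration. For comparison, the sketch also includes a plain majority vote over the models' individual decisions, because the two rules can disagree: in the example, two of the three models prefer class 0, yet the averaged probabilities favour class 1. Note also that an SVM does not output probabilities directly, so some calibration step (for example Platt scaling) would be needed to include it in such an average; which method was used is an open assumption here.

```python
from collections import Counter


def blend(model_probs: list[list[float]]) -> tuple[int, list[float]]:
    """Soft voting: average the per-class probabilities, pick the argmax."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[k] for p in model_probs) / n_models for k in range(n_classes)]
    best = max(range(n_classes), key=lambda k: avg[k])
    return best, avg


def majority_vote(model_probs: list[list[float]]) -> int:
    """Hard voting: each model votes for its own most probable class."""
    votes = [max(range(len(p)), key=lambda k: p[k]) for p in model_probs]
    return Counter(votes).most_common(1)[0][0]


# Illustrative (made-up) probabilities for one tweet over classes [0, 1]
# from three models, e.g. LR, SVM and NB+LR.
probs = [[0.60, 0.40], [0.30, 0.70], [0.55, 0.45]]
print(blend(probs))          # averaged: class 1 wins (about 0.517 vs 0.483)
print(majority_vote(probs))  # two of the three models prefer class 0
```

Averaging lets a single confident model outweigh two hesitant ones, which is the behaviour of the probability-averaging blend rather than of a simple vote count.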
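The labelling rules of the three runs and the evaluation measures used in Section 4 can be illustrated with a few small functions. The sketch assumes that each of the three blended models (Misogyny Identification, Target Classification, Misogynistic Behavior) has already been reduced to a yes/no "misogynous" flag for a tweet; how that reduction is made for the target and behavior models is a hypothetical simplification. label_run implements "at least k of 3" (k = 2 for runs 1 and 3, k = 1 for run 2), and macro_f1 averages per-class F1-scores over all labels that occur in the gold or predicted data, which is one common definition of the macro F-measure.

```python
# Minimum number of positive flags needed to label a tweet misogynous.
RUN_MIN_VOTES = {"run1": 2, "run2": 1, "run3": 2}


def label_run(flags: tuple[bool, bool, bool], min_votes: int) -> bool:
    """flags = (identification, target, behavior) decisions for one tweet."""
    return sum(flags) >= min_votes


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the gold labels (Subtask A)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 over labels seen in gold or predictions."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


flags = (True, False, False)  # only the Misogyny Identification model fires
for run, k in RUN_MIN_VOTES.items():
    print(run, label_run(flags, k))  # run1 False, run2 True, run3 False

gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
print(accuracy(gold, pred))  # 0.75
print(macro_f1(gold, pred))
```

Because macro averaging gives every class the same weight, a minority class that is poorly recognised lowers the score as much as a majority class would, which is why an F-measure of this kind suits the imbalanced classification types.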
5 Conclusion

A negative aspect of the increased usage of platforms like Twitter is that incidents of aggression and related activities such as harassment and misogyny have increased significantly. Dealing with this type of text and messages is now an urgent problem, and many challenges are connected with this task. In this article we have described our approach to misogyny detection and classification of tweets. The method was presented for evaluation in the framework of the Automatic Misogyny Identification (AMI) Shared Task at EVALITA 2018. We built an ensemble of models that includes Logistic Regression, Naive Bayes and Support Vector Machines approaches, which classifies the data taking into account the class probabilities calculated by the simpler models. We showed that quite good results can be achieved with the final blended model, and that our model achieved its best results on the binary classification of misogynistic versus non-misogynistic tweets.

We observed preprocessing to be a very important part of data handling, with a high impact on the results of all models. Our results suggest that the highest accuracy was obtained with the maximum additional work at the preprocessing stage. It was important to replace links and user references with special tokens, because the run with this type of alteration demonstrated the best results. Also, the best way of labelling misogynistic tweets was to mark a message as misogynous if any one of the classification types flagged it. At first we expected that it would be more reliable to mark a tweet only when 2 of the 3 classification types flag it, but the test results did not support that hypothesis. We are currently investigating the addition of more features and models to the blended model to improve our results in the future.

References

Alexandrov M., Danilova V., Koshulko A., Tejada J. 2013. Models for opinion classification of blogs taken from Peruvian Facebook. Proceedings of the 4th International Conference on Inductive Modeling (ICIM-2013), pp. 241–246.

Bartlett J., Norrie R., Patel S., Rumpel R., Wibberley S. 2014. Misogyny on Twitter. Demos, http://www.demos.co.uk/.

Fersini E., Anzovino M., Rosso P. 2018. Overview of the Task on Automatic Misogyny Identification at IberEval. Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), co-located with the 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018). CEUR Workshop Proceedings. CEUR-WS.org.

Fersini E., Nozza D., Rosso P. 2018. Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI). Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18). Caselli T., Novielli N., Patti V., Rosso P. (Eds.). CEUR-WS.org, Turin, Italy.

Frenda S., Ghanem B., Montes-y-Gómez M. 2018. Exploration of Misogyny in Spanish and English tweets. CEUR Workshop Proceedings. CEUR-WS.org.

Genkin A., Lewis D., Madigan D. 2007. Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3):291–304.

Joachims T. 2002. Learning to classify text using support vector machines: Methods, theory and algorithms. Kluwer Academic Publishers.

Kaurova O., Alexandrov M., Ponomareva N. 2010. The Study of Sentiment Word Granularity for Opinion Analysis (a Comparison with Maite Taboada Works). International Journal on Social Media. MMM: Monitoring, Measurement, and Mining, 1(1):45–57.

Nobata C., Tetreault J., Thomas A., Mehdad Y., Chang Y. 2016. Abusive language detection in online user content. Proceedings of the 25th International Conference on World Wide Web, pp. 145–153. International World Wide Web Conferences Steering Committee.

Pamungkas E.W., Cignarella A.T., Basile V., Patti V. 2018. 14-ExLab@UniTo for AMI at IberEval2018: Exploiting Lexical Knowledge for Detecting Misogyny in English and Spanish Tweets. CEUR Workshop Proceedings. CEUR-WS.org.

Schmidt A., Wiegand M. 2017. A survey on hate speech detection using natural language processing. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics, Valencia, Spain, pp. 1–10.

Shushkevich E., Cardiff J. 2018. Classifying Misogynistic Tweets Using a Blended Model: The AMI Shared Task in IBEREVAL 2018. CEUR Workshop Proceedings. CEUR-WS.org.

Wang S., Manning C.D. 2012. Baselines and bigrams: simple, good sentiment and topic classification. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, ACL 2012, vol. 2, pp. 90–94.

Waseem Z., Hovy D. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. SRW@HLT-NAACL, pp. 88–93.

Wright R. 1995. Logistic regression. In L.C. Grimm & P.R. Yarnold (Eds.), Reading and understanding multivariate statistics. Washington, DC: American Psychological Association, pp. 217–244.

Zhang H., Li D. 2007. Naïve Bayes text classifier. IEEE International Conference on Granular Computing (GRC 2007), p. 708. IEEE.