=Paper=
{{Paper
|id=Vol-2263/paper030
|storemode=property
|title=Misogyny Detection and Classification in English Tweets: The Experience of the ITT Team
|pdfUrl=https://ceur-ws.org/Vol-2263/paper030.pdf
|volume=Vol-2263
|authors=Elena Shushkevich,John Cardiff
|dblpUrl=https://dblp.org/rec/conf/evalita/ShushkevichC18
}}
==Misogyny Detection and Classification in English Tweets: The Experience of the ITT Team==
Elena Shushkevich, John Cardiff

Social Media Research Group, Institute of Technology Tallaght, Dublin, Ireland

e.shushkevich@yandex.ru, john.cardiff@it-tallaght.ie

===Abstract===

English. The problem of online misogyny and women-based offending has become increasingly widespread, and the automatic detection of such messages is an urgent priority. In this paper, we present an approach based on an ensemble of Logistic Regression, Support Vector Machines, and Naïve Bayes models for the detection of misogyny in texts extracted from the Twitter platform. Our method has been presented in the framework of the participation in the Automatic Misogyny Identification (AMI) Shared Task in the EVALITA 2018 evaluation campaign.

Italiano (translated). The problem of online misogyny and of hatred directed at women is becoming ever more widespread, and the automatic recognition of such messages is therefore an important priority. In this paper, we present an approach based on Logistic Regression, SVM and Naive Bayes classifiers for the automatic recognition of misogyny in texts extracted from Twitter. Our method was presented through our participation in the AMI shared task at the EVALITA 2018 evaluation campaign.

===1 Introduction===

It is hard to miss the fact that the intensive growth of social networking has led not only to a rise in opportunities for personal communication, but also to an increase in aggression on social media. Hate speech can be aimed at sexual orientation, race, religion, or gender as a whole. In particular, when the target of hate speech is women, we can speak of misogyny. More and more attention is now paid to this problem, and one of the directions in hate speech recognition is the detection of women-oriented aggression in social networks.

It is important to work on hate speech and misogyny detection now, because over the course of time the volume of data from social networks will grow and this problem will become more and more serious. It is necessary to create a range of systems which allow us to detect and control the number of hate speech messages, and we need to understand how to classify this type of information and how we could reduce its volume. It is therefore a big challenge to find a way of detecting and processing misogyny data.

This paper describes our participation in the Automatic Misogyny Identification (AMI) Shared Task in EVALITA 2018 (Fersini, Nozza and Rosso, 2018). The aim of the task is to identify misogynistic text in tweets. The task contained two different subtasks:

Subtask A - Misogyny Identification: the main goal of the task was to separate misogynous tweets from non-misogynous ones.

Subtask B - Misogynistic Behavior and Target Classification: the idea of the target classification was to distinguish misogynous tweets which offend a specific person (Active) from tweets which insult a group of people (Passive).

The misogynistic behavior task was intended to divide misogynous tweets into different groups:

- Stereotype & Objectification: a widely held but fixed and oversimplified image or idea of a woman, description of women's physical appeal and/or comparisons to narrow standards.

- Dominance: to assert the superiority of men over women or to highlight gender inequality.

- Derailing: to justify abuse of women, rejecting male responsibility, and an attempt to disrupt the conversation in order to redirect women's conversations to something more comfortable for men.

- Sexual Harassment & Threats of Violence: to describe actions such as sexual advances, requests for sexual favours, harassment of a sexual nature, and intent to physically assert power over women through threats of violence.

- Discredit: slurring of women with no other larger intention.

There were two datasets for the task, one containing tweets in English and another containing tweets in Italian. Our team worked with the English dataset only. The English dataset was composed of 4,000 tweets for training and 1,000 tweets for testing. The results were evaluated using accuracy for Task A and the macro F-measure for Task B.

This paper presents our approach to solving the above problems. The main thrust of our approach is to build a model that allows us to assess the classification of any tweet into its assigned group.

The paper is organized as follows. Some relevant related works in the area are described in Section 2. Section 3 presents the way we conducted data preprocessing and the approach we chose for building the desired model. In Section 4 the results are described and analyzed. In Section 5 we summarize our work.

===2 Related work===

There are a number of approaches in the area of text processing by machine learning methods which allow us to deal with misogyny and harassment in texts. Some of these were presented in the AMI@IBEREVAL-2018 shared task (Fersini, Anzovino and Rosso, 2018). The aim of this challenge was to detect misogynistic tweets and to create a model able to classify misogynistic tweets into different groups depending on the type of misogyny. In particular, it was demonstrated that models based on Support Vector Machines (Pamungkas et al., 2018) and ensembles of models (Frenda et al., 2018) can be quite successful when the aim is to classify tweets by type and function of misogyny. In our work we apply several of the same techniques, namely Support Vector Machines and ensembles of models, to the task of detecting misogynistic tweets.

Several works published in recent years help in understanding how to classify hate speech messages. In (Schmidt and Wiegand, 2017) the authors demonstrated methodologies for processing hate speech data. Another work (Waseem and Hovy, 2016) presented useful approaches to detecting racial and sexist offenses. It should be noted that there was a classification into 3 different groups (hate speech, derogatory, profanity) with the understanding that hate speech is a kind of abusive language.

In the research reported in (Nobata et al., 2016), it was shown (Bartlett et al., 2014) how to use NLP to analyse English-language misogynistic tweets to find the frequencies of abusive words and the users who used this type of word most often. In other works (Alexandrov et al., 2013; Kaurova et al., 2010) the authors focused on creating models which allow the tone of a text to be evaluated on a scale from very negative to very positive. They constructed models for groups of 3, 5 and 8 different categories and were able to achieve high accuracy using additional tools such as GMDH Shell and the Semantic Orientation Calculator (SO-CAL), which demonstrates the very high potential of inductive modelling for text-mining tasks. We are planning to use the techniques mentioned above to improve the results of our model in the future.

===3 System===

In our approach we perform a number of sequential actions including preprocessing, model design, and finally embedding the constructed models in one ensemble.

====3.1 Preprocessing====

In the first step, we prepared the data for classification. To clean the data we removed punctuation and converted words to lower case. For vectorization we used the tf-idf (term frequency–inverse document frequency) method, which reduces the weight of words that occur in many documents and increases the weight of words that occur frequently within a given document. These steps were carried out for the first run. For the subsequent two runs, we added some extra preprocessing steps:

- the replacement of all links with the string "URL";

- the replacement of all references to Twitter users (i.e., terms starting with the "@" symbol) with the term "USER";

- the marking of combinations of symbols which were often used in messages, such as "!!!", "???" and other emotional expressions, and their replacement with the term "emoji".

====3.2 Models====

The main idea of the modeling was to create an ensemble of different models which could complement each other to achieve the best results. The final blended model assigns the tweet to a specific class by majority voting. We used a number of simple models, which include:

- Logistic regression model. Logistic regression involves the construction of a discriminant model, which calculates a probability from a function of a weighted set of observation features and assigns a class to each observation. The classifier based on logistic regression applies an exponential function to a linear combination of features obtained from the input data (Wang and Manning, 2012; Wright, 1995).

- Support Vector Machines classifier. As shown in (Joachims, 2002), this method is very useful for work with texts. The idea of the method is to map the source vectors into a higher-dimensional space and to search for a separating hyperplane such that the gap in this space is maximal. Two parallel hyperplanes are constructed on either side of the separating hyperplane to bound the classes, and the separating hyperplane that maximizes the distance to these two parallel ones is sought.

- Naive Bayes classifier. One advantage of this method is the high speed of calculation (Zhang and Li, 2007). Another is the amount of data needed to train the model: it is not necessary to have a big training dataset to achieve a high level of classification parameter estimation.

In the next step we combined the Naive Bayes approach and the Logistic regression approach in one model, as presented in (Genkin et al., 2007), which produced quite good results.

In the final step we combined the models mentioned above, Logistic regression (LR), Support Vector Machines (SVM), and Naive Bayes with Logistic Regression (NB+LR), into one ensemble. In this blended model the probabilities of belonging to different classes from the simple models were summed and averaged. We took as the final choice the class which had the highest average probability.

===4 Results===

We chose three different runs for the evaluation. The first was implemented using the simplest type of preprocessing (we just deleted punctuation symbols and changed all letters to lower case), and in this variant we marked a tweet as misogynistic if two of the three types of classification marked it as misogynous (Misogyny+Target or Misogyny+Misogynistic Behavior or Target+Misogynistic Behavior).

In the next run, we carried out the more intricate preprocessing described in Section 3.1 and labeled a tweet as misogynistic whenever at least one classifier fired.

The last run was implemented using the most complicated preprocessing and the same type of tweet labeling as in the first run.

Table 1 shows the results for all three classification types. As can be seen, the fourth type of classifier (the blended model) was the most successful. It could be concluded that the blended model, which combines several simple models (Logistic Regression, Naive Bayes + Logistic Regression and Support Vector Machines), allows us to achieve the best results for all classification types: Misogyny Identification, Target Classification and Misogynistic Behavior classification.

It should be noted that we used the F-measure to evaluate the results because this measure brings together both recall and precision, and because of the imbalance within both the Misogynistic Category Classification and the Target Classification.

{| class="wikitable"
! Task !! Classifier !! F1-score
|-
| Misogyny Identification || LR || 0.78
|-
| Misogyny Identification || NB+LR || 0.72
|-
| Misogyny Identification || SVM || 0.71
|-
| Misogyny Identification || Blend || 0.78
|-
| Target Classification || LR || 0.60
|-
| Target Classification || NB+LR || 0.66
|-
| Target Classification || SVM || 0.76
|-
| Target Classification || Blend || 0.76
|-
| Misogynistic Behavior || LR || 0.50
|-
| Misogynistic Behavior || NB+LR || 0.52
|-
| Misogynistic Behavior || SVM || 0.57
|-
| Misogynistic Behavior || Blend || 0.64
|}

Table 1. Performance on the validation set.

Also note that the results of our model improve as the number of different classes decreases; thus the performance of the blended model drops from the Misogyny Identification results to the Misogynistic Behavior classification results.

The results of all 3 runs for the blended model on the testing dataset are presented in Table 2.

{| class="wikitable"
|+ Subtask A - English
! Rank !! Team !! Accuracy
|-
| 8 || ITT.c.run2.tsv || 0.638
|-
| 9 || ITT.c.run3.tsv || 0.636
|-
| 10 || ITT.c.run1.tsv || 0.636
|}

Table 2. Results of the classification.

From the results on the test data it can be concluded that the best run is the one with the extended preprocessing and the labelling rule under which we mark a tweet as misogynistic whenever at least one of the classifiers fired.

===5 Conclusion===

A negative aspect of the increased usage of platforms like Twitter is that incidents of aggression and related activities like harassment and misogyny have increased significantly. Dealing with this type of text and message is now an urgent problem, and there are many challenges connected with this task. In this article we have described our approach to misogyny detection and classification of tweets. The method was presented for evaluation in the framework of the Automatic Misogyny Identification (AMI) Shared Task at EVALITA 2018. We built an ensemble of models that includes Logistic regression, Naive Bayes and Support Vector Machines approaches, which classified the data taking into account the probabilities of belonging to classes calculated by the simpler models. It was shown that it is possible to achieve quite good results using the final blended model, and our model showed its best results for the binary classification of tweets into misogynistic and non-misogynistic ones.

We observed preprocessing to be a very important part of the data handling, with a high impact on the results of all models. From our results it could be concluded that the highest accuracy was produced with the maximum additional work at the preprocessing stage. It was important to pay attention to the replacement of links and references with special symbols, because the run with this type of alteration demonstrated the best results. Also, the best way of labelling misogynistic tweets was to mark a message as misogynous if any one of the types of classification fired. At first we supposed that it would be more reliable to mark a tweet only when 2 of the 3 classifications marked it, but the actual results disproved that hypothesis. We are currently investigating the addition of more features and models to the blended model to improve our results in the future.

===References===

Alexandrov M., Danilova V., Koshulko A., Tejada J. 2013. Models for opinion classification of blogs taken from Peruvian Facebook. Proceedings of 4th International Conference on Inductive Modeling (ICIM-2013), pp. 241–246.

Bartlett J., Norrie R., Patel S., Rumpel R., Wibberley S. 2014. Misogyny on twitter, http://www.demos.co.uk/, 05.

Fersini E., Anzovino M., Rosso P. 2018. Overview of the Task on Automatic Misogyny Identification at IberEval. Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018). CEUR Workshop Proceedings. CEUR-WS.org.

Fersini E., Nozza D., Rosso P. 2018. Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI). Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18). T. Caselli, N. Novielli, V. Patti, P. Rosso (Eds.). CEUR.org, Turin, Italy.

Frenda S., Ghanem B., Montes-y-Gómez M. 2018. Exploration of Misogyny in Spanish and English tweets. CEUR Workshop Proceedings. CEUR-WS.org.

Genkin A., Lewis D., Madigan D. 2007. Large-scale bayesian logistic regression for text categorization. Technometrics, 49(3):291–304.

Joachims T. 2002. Learning to classify text using support vector machines: Methods, theory and algorithms. Kluwer Academic Publishers.

Kaurova O., Alexandrov M., Ponomareva N. 2010. The Study of Sentiment Word Granularity for Opinion Analysis (a Comparison with Maite Taboada Works). International Journal on Social Media. MMM: Monitoring, Measurement, and Mining 1(1), 45–57.

Nobata C., Tetreault J., Thomas A., Mehdad Y., Chang Y. 2016. Abusive language detection in online user content. Proceedings of the 25th International Conference on World Wide Web, pp. 145–153. International World Wide Web Conferences Steering Committee.

Pamungkas E.W., Cignarella A.T., Basile V., Patti V. 2018. 14-ExLab@UniTo for AMI at IberEval2018: Exploiting Lexical Knowledge for Detecting Misogyny in English and Spanish Tweets. CEUR Workshop Proceedings. CEUR-WS.org.

Schmidt A., Wiegand M. 2017. A survey on hate speech detection using natural language processing. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics, Valencia, Spain, pp. 1–10.

Shushkevich E., Cardiff J. 2018. Classifying Misogynistic Tweets Using a Blended Model: The AMI Shared Task in IBEREVAL 2018. CEUR Workshop Proceedings. CEUR-WS.org.

Wang S., Manning C.D. 2012. Baselines and bigrams: simple, good sentiment and topic classification. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, ACL 2012, vol. 2, pp. 90–94.

Waseem Z., Hovy D. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. SRW@HLT-NAACL, pp. 88–93.

Wright R. 1995. Logistic regression. In L.C. Grimm & P.R. Yarnold (Eds.), Reading and understanding multivariate statistics. Washington, DC: American Psychological Association, pp. 217–244.

Zhang H., Li D. 2007. Naïve Bayes text classifier. Granular Computing, GRC 2007. IEEE International Conference on, pp. 708–708. IEEE.
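
The cleaning steps listed in Section 3.1 can be illustrated with a minimal sketch. The paper does not give its exact rules, so the regular expressions, the function name `preprocess`, the `extended` flag and the order of the steps are all assumptions made for illustration. In particular, placeholders are inserted after lower-casing so that "URL" and "USER" keep their case.

```python
import re
import string

# Hypothetical patterns: the paper names the replacements but not the rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
USER_RE = re.compile(r"@\w+")                   # references to Twitter users
EMPHASIS_RE = re.compile(r"[!?]{3,}")           # runs such as "!!!" or "???"
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(tweet: str, extended: bool = True) -> str:
    """Clean one tweet.

    extended=False: only lower-casing and punctuation removal (first run).
    extended=True: additionally insert the URL / USER / emoji placeholders
    (the two later runs).
    """
    text = tweet.lower()
    if extended:
        # Substitute before stripping punctuation, otherwise "@" and "://"
        # would already be gone and the patterns could not match.
        text = URL_RE.sub(" URL ", text)
        text = USER_RE.sub(" USER ", text)
        text = EMPHASIS_RE.sub(" emoji ", text)
    text = text.translate(PUNCT_TABLE)
    # Collapse the extra spaces introduced by the substitutions.
    return " ".join(text.split())
```

Calling `preprocess(tweet, extended=False)` corresponds to the simplest variant, which lets the two preprocessing settings be compared on the same data.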
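
The effect of the tf-idf weighting mentioned in Section 3.1 can be seen in a small worked example. This is a plain textbook variant (relative term frequency multiplied by the logarithm of the inverse document frequency); the weighting and smoothing actually used by the authors are not stated in the paper, and the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per whitespace-tokenized document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # tf = share of the document taken by the term,
            # idf = log(N / df), which is 0 for terms found in every document.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

For the two toy documents "you are great" and "you are awful awful", the terms "you" and "are" occur in both documents and receive weight 0, while "awful", repeated within one document only, receives the largest weight.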
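
The blending step at the end of Section 3.2 (sum and average the class probabilities of the simple models, then pick the class with the highest average) can be sketched as follows. The function name `blend` and the probability values in the usage note are invented for illustration and are not results from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def blend(model_probs: Dict[str, Sequence[float]],
          classes: Sequence[str]) -> Tuple[str, List[float]]:
    """Average per-class probabilities over models and return the top class.

    model_probs maps a model name to its probability for each class,
    given in the same order as `classes`.
    """
    n_models = len(model_probs)
    averaged = [
        sum(probs[i] for probs in model_probs.values()) / n_models
        for i in range(len(classes))
    ]
    # The final label is the class with the highest average probability.
    best_index = max(range(len(classes)), key=lambda i: averaged[i])
    return classes[best_index], averaged
```

With made-up probabilities such as `{"LR": [0.6, 0.4], "NB+LR": [0.3, 0.7], "SVM": [0.4, 0.6]}` for the classes `["non-misogynous", "misogynous"]`, the averaged vector favours the second class even though LR alone would have chosen the first, which is the kind of complementarity the ensemble is meant to exploit.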
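
The two labelling rules compared in Section 4 differ only in how many of the three classification types must flag a tweet. A minimal sketch, assuming that "two of three" means at least two and that each classification type yields a boolean flag (both are assumptions, as is the function name):

```python
from typing import Iterable


def is_misogynous(flags: Iterable[bool], min_votes: int) -> bool:
    """Label a tweet from the flags of the three classification types.

    flags: whether Misogyny Identification, Target Classification and
           Misogynistic Behavior classification each flagged the tweet.
    min_votes: 2 for the rule of the first and last runs,
               1 for the "at least one" rule of the second run.
    """
    return sum(1 for flag in flags if flag) >= min_votes
```

A tweet flagged by a single classification type is therefore labelled misogynistic under `min_votes=1` but not under `min_votes=2`, which is the difference the paper reports as decisive on the test set.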