Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI)

Elisabetta Fersini1, Debora Nozza1, Paolo Rosso2
1 DISCo, Università degli Studi di Milano-Bicocca
2 PRHLT Research Center, Universitat Politècnica de València
{fersini, debora.nozza}@disco.unimib.it, prosso@dsic.upv.es

Abstract

Automatic Misogyny Identification (AMI) is a new shared task proposed for the first time at the Evalita 2018 evaluation campaign. The AMI challenge, based on both Italian and English tweets, is organized into two subtasks: Subtask A on misogyny identification and Subtask B on misogynistic behaviour categorization and target classification. For the Italian language we received a total of 13 runs for Subtask A and 11 runs for Subtask B; for the English language we received 26 runs for Subtask A and 23 runs for Subtask B. The participating systems are distinguished according to the language, counting 6 teams for Italian and 10 teams for English. We present here an overview of the AMI shared task, the datasets, the evaluation methodology, the results obtained by the participants, and a discussion of the methodologies adopted by the teams. Finally, we draw some conclusions and outline future work.

1 Introduction

In recent years, the phenomenon of hate against women has grown rapidly, especially in online environments such as microblogs (Hewitt et al., 2016; Poland, 2016). According to the Pew Research Center Online Harassment report (Duggan, 2017), 41% of people have been personally targeted by online harassment and 18% have been subjected to serious kinds of harassment; 8% were targeted because of their gender, and women are more likely than men to be targeted for it (11% vs 5%). Misogyny, defined as hate or prejudice against women, can be linguistically manifested in numerous ways, ranging from less aggressive behaviours such as social exclusion and discrimination to more dangerous expressions related to threats of violence and sexual objectification (Anzovino et al., 2018). Given the relevance of this social problem, the Automatic Misogyny Identification (AMI) task was proposed first at IberEval 2018 (Spanish and English) (Fersini et al., 2018) and later at Evalita 2018 (Italian and English) (Caselli et al., 2018). The main goal of AMI is to distinguish misogynous contents from non-misogynous ones, to categorize misogynistic behaviours and, finally, to classify the target of a tweet.
2 Task Description

The AMI shared task is organized into two main subtasks:

• Subtask A - Misogyny Identification: a system must discriminate misogynistic contents from non-misogynistic ones. Examples of misogynous and non-misogynous tweets are reported in Table 1.

• Subtask B - Misogynistic Behaviour and Target Classification: a system must recognize the targets, which can be either specific users or groups of women, together with identifying the type of misogyny expressed against women.

Table 1: Examples of misogynous and non-misogynous tweets
Label | Text
Misogynous | I've yet to come across a nice girl. They all end up being bit**es in the end.
Non-misogynous | @chiellini you are a bi*ch!

Regarding the misogynistic behaviour, a tweet must be classified as belonging to one of the following categories:

• Stereotype & Objectification: a widely held but fixed and oversimplified image or idea of a woman; description of women's physical appeal and/or comparisons to narrow standards.

• Dominance: assertion of the superiority of men over women to highlight gender inequality.

• Derailing: justification of woman abuse, rejecting male responsibility; an attempt to disrupt the conversation in order to redirect women's conversations to something more comfortable for men.

• Sexual Harassment & Threats of Violence: description of actions such as sexual advances, requests for sexual favours and harassment of a sexual nature; intent to physically assert power over women through threats of violence.

• Discredit: slurring of women with no other larger intention.

Examples of misogynistic behaviours are reported in Table 2.

Table 2: Examples of tweets for each misogyny category
Misogyny Category | Text
Discredit | @AsiaArgento brainless fuc**ing bi**ch
Stereotype | Women play your role..#gobacktothekitchen
Objectification | Big te**ts and no brain..this is what I want as wife!
Sexual Harassment | I wanna touch your te**ts..you can't immagine what I can do on your body.
Threats of Violence | Stupid bi**ch I'll put you down on the floor and I'll rape you! You should be scared!
Dominance | Women are inferior to men..so shut up please!
Derailing | @yesallwomen wearing a tiny skirt is "asking for it". Your teasing a (hard working, taxes paying) dog with a bone. That's cruel. #YesAllMen

Concerning the target classification, the main goal is to classify each misogynous tweet as belonging to one of the following two target categories:

• Active (individual): the text includes offensive messages purposely sent to a specific target;

• Passive (generic): it refers to messages posted to many potential receivers (e.g. groups of women).

Examples of targets of misogynous tweets are reported in Table 3.

Table 3: Examples of targets
Target | Text
Active | @JulieB stupid crazy psychopathic woman..you should die...
Passive | Women: just an inferior breed!!!

3 Training and Testing Data

In order to provide training and testing data for both Italian and English, three approaches were employed to collect misogynous text on Twitter:

• Streaming download using a set of manually defined representative keywords, e.g. bi**h, w**re, c*nt for English and pu****a, tr**a, f**a di legno for Italian;

• Monitoring of potential victims' accounts, e.g. gamergate victims and public feminist women;

• Downloading the history of identified misogynists, i.e. users who explicitly declared hate against women on their Twitter profiles.

Among all the collected tweets we selected a subset of texts by querying the database for the co-presence of keywords, originating two corpora initially composed of 10,000 tweets for each language. In order to label both the Italian and English datasets, we involved a group of 6 experts exploiting the CrowdFlower1 platform for internal use. At the end of the labelling phase, we provided one corpus for Italian and one corpus for English to all the participants.

1 Now Figure Eight: https://figure-eight.com/
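To make the keyword-based selection step concrete, the following minimal Python sketch (our own illustration rather than the organizers' actual collection code; the keyword set is a placeholder, and we read "co-presence" as at least two distinct keywords per tweet) filters collected tweets accordingly:

    import re

    # Hypothetical keyword set; the organizers used manually defined
    # misogyny-related terms for each language (censored in the paper).
    KEYWORDS_EN = {"keyword1", "keyword2", "keyword3"}

    def word_set(text):
        # Lowercased word tokens of a tweet.
        return set(re.findall(r"[a-z*']+", text.lower()))

    def co_present(text, keywords, min_hits=2):
        # Keep a tweet only if at least min_hits distinct keywords occur,
        # one reading of the "co-presence of keywords" selection query.
        return len(word_set(text) & keywords) >= min_hits

    collected = ["keyword1 together with keyword2", "only keyword1 here"]
    subset = [t for t in collected if co_present(t, KEYWORDS_EN)]
    print(subset)   # -> ['keyword1 together with keyword2']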
The inter-rater annotator agreement on the English dataset for the fields "misogynous", "misogyny category" and "target" is 0.81, 0.45 and 0.49 respectively, while for the Italian dataset it is 0.96, 0.68 and 0.76. Each corpus is divided into a Training and a Test dataset. Regarding the training data, both the Italian and English corpora are composed of 4,000 tweets; concerning the test data, we provided 1,000 tweets for each language. The training data has been provided in tab-separated format, according to the following fields (a minimal parsing sketch is given at the end of this section):

• id: a unique identifier of the tweet.

• text: the tweet text.

• misogynous: defines whether the tweet is misogynous or not; it takes value 1 if the tweet is misogynous, 0 otherwise.

• misogyny category: denotes the type of misogynistic behaviour; it takes one of the following values:
  – stereotype: the category "Stereotype & Objectification";
  – dominance: the category "Dominance";
  – derailing: the category "Derailing";
  – sexual harassment: the category "Sexual Harassment & Threats of Violence";
  – discredit: the category "Discredit";
  – 0 if the tweet is not misogynous.

• target: denotes the subject of the misogynous tweet; it takes one of the following values:
  – active: a specific target (individual);
  – passive: potential receivers (generic);
  – 0 if the tweet is not misogynous.

Concerning the test data, only "id" and "text" have been provided to the participants. In addition to the field "id", all the allowed combinations of the labels to be predicted, i.e. "misogynous", "misogyny category" and "target", are reported below:

0 0 0
1 stereotype active
1 stereotype passive
1 dominance active
1 dominance passive
1 derailing active
1 derailing passive
1 sexual harassment active
1 sexual harassment passive
1 discredit active
1 discredit passive

The label distribution of the Training and Test datasets is reported in Table 4. While the distribution of labels for the field "misogynous" is almost balanced (for both languages), the classes of the other fields are quite unbalanced. Regarding the "misogyny category", the two languages differ: for Italian the most frequent label is Stereotype & Objectification, while for English the most predominant one is Discredit. Concerning the "target", the most predominant victims are specific users (active), with a strongly imbalanced distribution on the Italian corpus; the English training dataset is almost balanced, while the corresponding English test dataset is strongly imbalanced towards active targets.

Table 4: Distribution of labels for "misogynous", "misogyny category" and "target" on the Training and Test datasets. Percentages for "misogyny category" and "target" are computed with respect to the number of misogynous tweets.
Label | Training Italian | Training English | Testing Italian | Testing English
Misogynous | 1828 (46%) | 1785 (45%) | 512 (51%) | 460 (46%)
Non-misogynous | 2172 (54%) | 2215 (55%) | 488 (49%) | 540 (54%)
Discredit | 634 (35%) | 1014 (57%) | 104 (20%) | 141 (31%)
Sexual Harassment & Threats of Violence | 431 (24%) | 352 (20%) | 170 (33%) | 44 (10%)
Derailing | 24 (1%) | 92 (5%) | 2 (1%) | 11 (2%)
Stereotype & Objectification | 668 (37%) | 179 (10%) | 175 (34%) | 140 (30%)
Dominance | 71 (3%) | 148 (8%) | 61 (12%) | 124 (27%)
Active | 1721 (94%) | 1058 (59%) | 446 (87%) | 401 (87%)
Passive | 107 (6%) | 727 (41%) | 66 (13%) | 59 (13%)
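The sketch below (ours, not the official tooling) loads the tab-separated training file and checks that every row carries one of the allowed label combinations; the exact column header spellings and the file name are assumptions:

    import csv

    CATEGORIES = {"stereotype", "dominance", "derailing",
                  "sexual harassment", "discredit"}
    TARGETS = {"active", "passive"}

    def load_ami(path):
        # Reads the tab-separated training file and validates that every
        # row carries one of the allowed label combinations listed above.
        rows = []
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                if row["misogynous"] == "0":
                    # Non-misogynous tweets must be labelled 0 0 0.
                    assert row["misogyny_category"] == "0" and row["target"] == "0"
                else:
                    assert row["misogyny_category"] in CATEGORIES
                    assert row["target"] in TARGETS
                rows.append(row)
        return rows

    train = load_ami("en_training.tsv")   # hypothetical file name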
4 Evaluation Measures and Baseline

Considering the label distribution of the dataset, we chose different evaluation metrics for the two subtasks. In particular, we distinguished as follows:

Subtask A. Systems have been evaluated on the field "misogynous" using the standard accuracy measure, and ranked accordingly.

Subtask B. Each field to be predicted has been evaluated independently of the other using a Macro F1-score. In particular, the Macro F1-score for the "misogyny category" field, F1(misogyny category), has been computed as the average of the F1-scores obtained for each category (stereotype, dominance, derailing, sexual harassment, discredit). Analogously, the Macro F1-score for the "target" field, F1(target), has been computed as the average of the F1-scores obtained for each category (active, passive). The final ranking of the systems participating in Subtask B was based on the Average Macro F1-score (F1), computed as follows:

F1 = (F1(misogyny category) + F1(target)) / 2    (1)

In order to compare the submitted runs with a baseline model, we provided a benchmark (AMI-BASELINE) based on a Support Vector Machine trained on a unigram representation of tweets. In particular, we created one training set for each field to be predicted, i.e. "misogynous", "misogyny category" and "target", where each tweet has been represented as a bag of words (composed of 1,000 terms) coupled with the corresponding label. Once these representations were obtained, Support Vector Machines with a linear kernel were trained and provided as AMI-BASELINE.
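A minimal scikit-learn sketch of an AMI-BASELINE-style model and of the Subtask B score of Equation (1) is given below. It is our approximation of the description above: variable names are placeholders, and whether the "0" (non-misogynous) class enters the macro average is an evaluation detail not spelled out here.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import f1_score
    from sklearn.svm import SVC

    def baseline_predict(train_texts, train_labels, test_texts):
        # 1,000-term unigram bag of words plus a linear-kernel SVM,
        # trained separately for each field to be predicted.
        vectorizer = CountVectorizer(max_features=1000)
        X_train = vectorizer.fit_transform(train_texts)
        clf = SVC(kernel="linear").fit(X_train, train_labels)
        return clf.predict(vectorizer.transform(test_texts))

    def subtask_b_score(gold_cat, pred_cat, gold_tgt, pred_tgt):
        # Equation (1): average of the two Macro F1-scores.
        f1_cat = f1_score(gold_cat, pred_cat, average="macro")
        f1_tgt = f1_score(gold_tgt, pred_tgt, average="macro")
        return (f1_cat + f1_tgt) / 2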
5 Participants and Results

A total of 6 teams for Italian and 10 teams for English, from 10 different countries, participated in at least one of the two subtasks of AMI. Each team could submit up to three runs for English and three runs for Italian. Runs could be constrained, where only the provided training data and lexicons were admitted, or unconstrained, where additional training data were allowed. Table 5 shows an overview of the teams2, reporting their affiliation, their country, the number of submissions for each language and the subtasks they addressed.

2 The teams himani and resham described their systems in the same report (Ahluwalia et al., 2018).

Table 5: Team overview
Team Name | Affiliation | Country | Runs | Subtask
14-exlab (Pamungkas et al., 2018) | University of Turin; Universitat Politècnica de València | IT; ES | 3 (EN), 3 (IT) | A, B
bakarov (Bakarov, 2018) | Huawei Technologies; Symanto Research | RUS; DE | 3 (EN), 3 (IT) | A, B
CrotoneMilano (Basile and Rubagotti, 2018) | Independent Researcher | IT | 1 (EN), 1 (IT) | A, B
hateminers (Saha et al., 2018) | Indian Institute of Technology | IND | 3 (EN), 0 (IT) | A, B
himani (Ahluwalia et al., 2018) | University of Washington Tacoma | USA | 3 (EN), 0 (IT) | A, B
ITT (Shushkevich and Cardiff, 2018) | Institute of Technology Tallaght; Yandex | IRL; RUS | 3 (EN), 0 (IT) | A, B
RCLN (Buscaldi, 2018) | Université Paris 13 | FR | 1 (EN), 1 (IT) | A, B
resham (Ahluwalia et al., 2018) | University of Washington | USA | 3 (EN), 0 (IT) | A, B
SB (Frenda et al., 2018b) | University of Turin; Universitat Politècnica de València; INAOE | IT; ES; MEX | 3 (EN), 3 (IT) | A, B
StopPropagHate (Fortuna et al., 2018) | INESC TEC; Porto University; Eurecat | PT; ES | 3 (EN), 2 (IT) | A

5.1 Subtask A: Misogyny Identification

Table 6 reports the results for the Misogyny Identification task, which received 13 runs for Italian and 26 runs for English, submitted respectively by 6 and 10 teams. The highest accuracy has been achieved by bakarov at 0.844 for Italian and by hateminers at 0.704 for English, both in a constrained setting. Most of the systems improved over the AMI-BASELINE. The winning bakarov run was based on TF-IDF coupled with Singular Value Decomposition and a Boosting classifier, while hateminers achieved its highest performance with a run based on a vector representation that concatenates sentence embeddings, TF-IDF and averaged word embeddings, coupled with a Logistic Regression model.

Table 6: Results of Subtask A. Constrained runs are marked with .c, unconstrained ones with .u. After the deadline one team reported a format error; the resubmitted amended runs are marked with **.

ITALIAN
Rank | Team | Accuracy
1 | bakarov.c.run2 | 0.844
** | CrotoneMilano.c.run1 | 0.843
2 | bakarov.c.run1 | 0.842
3 | 14-exlab.c.run3 | 0.839
4 | bakarov.c.run3 | 0.836
5 | 14-exlab.c.run2 | 0.835
6 | StopPropagHate.c.run1 | 0.835
7 | AMI-BASELINE | 0.830
8 | StopPropagHate.u.run2 | 0.829
9 | SB.c.run1 | 0.824
10 | RCLN.c.run1 | 0.824
11 | SB.c.run3 | 0.823
12 | SB.c.run2 | 0.822
13 | 14-exlab.c.run1 | 0.765

ENGLISH
Rank | Team | Accuracy
1 | hateminers.c.run1 | 0.704
2 | hateminers.c.run3 | 0.681
3 | hateminers.c.run2 | 0.673
4 | resham.c.run3 | 0.651
5 | bakarov.c.run3 | 0.649
6 | resham.c.run1 | 0.648
7 | resham.c.run2 | 0.647
8 | ITT.c.run2 | 0.638
9 | ITT.c.run1 | 0.636
10 | ITT.c.run3 | 0.636
11 | himani.c.run2 | 0.628
12 | bakarov.c.run2 | 0.628
13 | 14-exlab.c.run3 | 0.621
14 | himani.c.run1 | 0.619
** | CrotoneMilano.c.run1 | 0.617
15 | himani.c.run3 | 0.614
16 | 14-exlab.c.run1 | 0.614
17 | SB.c.run2 | 0.613
18 | AMI-BASELINE | 0.605
19 | bakarov.c.run1 | 0.605
20 | StopPropagHate.c.run1 | 0.593
21 | SB.c.run1 | 0.592
22 | StopPropagHate.u.run3 | 0.591
23 | StopPropagHate.u.run2 | 0.590
24 | RCLN.c.run1 | 0.586
25 | SB.c.run3 | 0.584
26 | 14-exlab.c.run2 | 0.500

5.2 Subtask B: Misogynistic Behaviour and Target Classification

Table 7 reports the results for the Misogynistic Behaviour and Target Classification task, which received 11 runs by 5 teams for Italian and 23 runs by 9 teams for English. The highest Average Macro F1-score among the officially ranked runs has been achieved by bakarov at 0.493 for Italian (even if the amended run of CrotoneMilano achieved the highest effective performance at 0.501) and by himani at 0.406 for English, both in a constrained setting.

Contrary to the previous task, most of the systems performed below the AMI-BASELINE. Looking at the Average Macro F1-scores of all the approaches, it can easily be noted that recognizing the misogyny category and the target is more difficult than the misogyny identification task. This is due to the fact that there can be a high overlap between textual expressions of different misogyny categories, so it is highly subjective for an annotator (and consequently for a system) to select one category rather than another. Regarding the target classification, systems can easily be misled by the presence of mentions that are not the target of the misogynous content. While for the bakarov team the system for Subtask B is the same as that for Subtask A, himani achieved the highest performance on the English language with a run based on a bag-of-n-grams representation coupled with an ensemble of 5 models for classifying the misogynistic behaviour and 2 models for target classification (a simplified sketch of this two-classifier scheme follows Table 7).

Table 7: Results of Subtask B. Constrained runs are marked with .c, unconstrained ones with .u. After the deadline one team reported a format error; the resubmitted amended runs are marked with **.

ITALIAN
Rank | Team | Average Macro F1-score
** | CrotoneMilano.c.run1 | 0.501
1 | bakarov.c.run1 | 0.493
2 | AMI-BASELINE | 0.487
3 | 14-exlab.c.run3 | 0.485
4 | 14-exlab.c.run2 | 0.482
5 | bakarov.c.run3 | 0.478
6 | bakarov.c.run2 | 0.463
7 | SB.c.run3 | 0.449
8 | SB.c.run1 | 0.448
9 | RCLN.c.run1 | 0.448
10 | SB.c.run2 | 0.446
11 | 14-exlab.c.run1 | 0.292

ENGLISH
Rank | Team | Average Macro F1-score
1 | himani.c.run3 | 0.406
2 | himani.c.run2 | 0.377
3 | AMI-BASELINE | 0.370
** | CrotoneMilano.c.run1 | 0.369
4 | hateminers.c.run3 | 0.369
5 | hateminers.c.run1 | 0.348
6 | SB.c.run2 | 0.344
7 | himani.c.run1 | 0.342
8 | SB.c.run1 | 0.335
9 | hateminers.c.run2 | 0.329
10 | SB.c.run3 | 0.328
11 | resham.c.run2 | 0.322
12 | resham.c.run1 | 0.316
13 | bakarov.c.run1 | 0.309
14 | resham.c.run3 | 0.283
15 | RCLN.c.run1 | 0.280
16 | ITT.c.run2 | 0.276
17 | bakarov.c.run2 | 0.275
18 | 14-exlab.c.run1 | 0.260
19 | bakarov.c.run3 | 0.254
20 | 14-exlab.c.run3 | 0.239
21 | ITT.c.run1 | 0.238
22 | ITT.c.run3 | 0.237
23 | 14-exlab.c.run2 | 0.232
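The following sketch makes the two-classifier scheme concrete. It is our own simplification: himani actually used ensembles of 5 and 2 models respectively, whereas here a single Logistic Regression per field is trained over a shared bag-of-n-grams representation on placeholder data.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Placeholder data; real systems were trained on the 4,000-tweet corpora.
    train_texts = ["placeholder tweet one", "placeholder tweet two"]
    train_categories = ["discredit", "stereotype"]
    train_targets = ["active", "passive"]
    test_texts = ["placeholder test tweet"]

    # Shared bag-of-n-grams representation for both fields.
    vectorizer = CountVectorizer(ngram_range=(1, 2))
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)

    # One classifier per field to be predicted.
    category_clf = LogisticRegression(max_iter=1000).fit(X_train, train_categories)
    target_clf = LogisticRegression(max_iter=1000).fit(X_train, train_targets)

    print(category_clf.predict(X_test), target_clf.predict(X_test))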
6 Discussion

The submitted systems can be compared by considering the kind of input features used to represent tweets and the machine learning model used for classification.

Textual Feature Representation. The systems submitted by the challenge participants consider various techniques for representing tweet contents. Some teams concentrated their effort on a single type of representation: the team ITT adopted the traditional TF-IDF representation, while bakarov and RCLN proposed systems considering only weighted character-level n-grams, to better deal with misspellings and to capture a few stylistic aspects.

In addition to the traditional textual feature representation techniques (i.e. bags of words/characters and n-grams of words/characters, possibly weighted with TF-IDF), several teams proposed specific lexical features to enrich the input space and consequently improve classification performance. The team CrotoneMilano experimented with feature abstraction following the bleaching approach proposed by Goot et al. (2018) for modelling gender through language. Specific lexicons for dealing with hate speech language have been included as features in the systems of SB, resham and 14-exlab. In particular, resham and 14-exlab also made use of environment-specific features, such as links, hashtags and emojis, and task-specific features, such as swear words, sexist slurs and women-related words.

Differently from these approaches, the StopPropagHate and hateminers teams proposed systems based on the popular embedding techniques, both at word and at sentence level.

Machine Learning Models. Concerning the machine learning models, we can distinguish between approaches based on traditional Support Vector Machines and Logistic Regression, Ensemble Models and, finally, Deep Learning methods. In the following, we report the models adopted by the systems that participated in the AMI shared task, grouped by the type of machine learning model (a minimal voting-ensemble sketch follows the list):

• Support Vector Machines have been exploited by 14-exlab, using both linear and RBF kernels, by SB, investigating only a radial basis function kernel, and by CrotoneMilano, adopting again a simple linear kernel;

• Logistic Regression has been used by bakarov and hateminers;

• Ensemble Models have been adopted by four teams according to different settings: ITT and himani used simple voting over different classifiers, resham used simple voting over different input features, and RCLN used an ensemble based on Random Forest;

• A Deep Learning classifier has been adopted by only one team, StopPropagHate, which trained a simple dense neural network.
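As an illustration of the simple-voting setup mentioned above, here is a minimal scikit-learn configuration of our own (not any team's exact ensemble): a hard majority vote over three heterogeneous classifiers on TF-IDF features, fitted on placeholder data.

    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Hard majority vote over three heterogeneous classifiers.
    ensemble = make_pipeline(
        TfidfVectorizer(),
        VotingClassifier(
            estimators=[
                ("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(kernel="linear")),
                ("rf", RandomForestClassifier(n_estimators=100)),
            ],
            voting="hard",
        ),
    )

    texts = ["placeholder misogynous tweet", "placeholder harmless tweet"]
    labels = [1, 0]
    ensemble.fit(texts, labels)
    print(ensemble.predict(["another placeholder tweet"]))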
External Resources. Several participants exploited external resources to provide task-specific lexical features.

The lexicons for addressing AMI for Italian have mostly been obtained from lists available online. The team SB used a specific Italian lexicon called "Le parole per ferire", built by Tullio De Mauro3. Starting from this lexicon, the HurtLex multilingual lexicon has been created (Bassignana et al., 2018). Beyond HurtLex, the team 14-exlab gathered a swear word list from several sources4, including a translated version of the noswearing dictionary5 and a list of swear words from (Capuano, 2007).

Regarding the English language, both resham and 14-exlab used the list of swear words from the noswearing dictionary and the sexist slur list provided by (Fasoli et al., 2015). The team resham further investigated the sentiment polarity retrieved from SentiWordNet (Baccianella et al., 2010). Differently, the team SB exploited a manually modeled lexicon for the misogyny detection task proposed in (Frenda et al., 2018a). The HurtLex lexicon has also been used by the team 14-exlab for the English task.

Finally, pre-trained word embeddings have been considered by the SB and hateminers teams, specifically GloVe (Pennington et al., 2014) for the English task and word embeddings built on the TWITA corpus for the Italian one (Basile and Novielli, 2014).

3 https://www.internazionale.it/opinione/tullio-de-mauro/2016/09/27/razzismo-parole-ferire
4 https://www.parolacce.org/2016/12/20/dati-frequenza-turpiloquio/ and https://it.wikipedia.org/wiki/Turpiloquio_nella_lingua_italiana
5 https://www.noswearing.com/dictionary
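To illustrate how such resources typically enter a system, the following sketch (ours; the lexicon entries and the vector file path are hypothetical examples) turns lexicon hits and averaged pre-trained word embeddings into a single feature vector:

    import numpy as np

    SLUR_LEXICON = {"slur1", "slur2"}   # hypothetical lexicon entries

    def load_vectors(path):
        # Reads a GloVe-style text file: a word followed by its vector.
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype=float)
        return vectors

    def tweet_features(text, vectors, dim=100):
        words = text.lower().split()
        # Lexicon feature: number of lexicon hits in the tweet.
        hits = sum(w in SLUR_LEXICON for w in words)
        # Embedding feature: average of the available word vectors.
        known = [vectors[w] for w in words if w in vectors]
        avg = np.mean(known, axis=0) if known else np.zeros(dim)
        return np.concatenate(([hits], avg))

    # vectors = load_vectors("glove.twitter.27B.100d.txt")  # example path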
7 Conclusions and Future Work

We presented a new shared task on Automatic Misogyny Identification on Twitter for Italian and English. By analysing the runs submitted by the participants, we can conclude that the problem of misogyny identification has been satisfactorily addressed by all the teams, while misogynistic behaviour and target classification remains a challenging problem. Concerning future work, several issues should be considered to improve the quality of the collected data, especially for capturing the less frequent misogynistic behaviours such as Dominance and Derailing. The problem of hate speech against women will be further addressed in the HatEval shared task at SemEval, on English and Spanish tweets6.

6 SemEval 2019 Task 5: HatEval: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. https://competitions.codalab.org/competitions/19935

Acknowledgements

The work of the third author was partially funded by the SomEMBED TIN2015-71147-C2-1-P research project (MINECO/FEDER). We thank Maria Anzovino for her initial help in collecting the tweets subsequently used for the labelling phase and for the final creation of the Italian and English corpora used for the AMI shared task.

References

Resham Ahluwalia, Himani Soni, Edward Callow, Anderson Nascimento, and Martine De Cock. 2018. Detecting Hate Speech Against Women in English Tweets. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Maria Anzovino, Elisabetta Fersini, and Paolo Rosso. 2018. Automatic Identification and Classification of Misogynistic Language on Twitter. In International Conference on Applications of Natural Language to Information Systems, pages 57–64. Springer.

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In Proceedings of the International Conference on Language Resources and Evaluation.

Amir Bakarov. 2018. Vector Space Models for Automatic Misogyny Identification. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Pierpaolo Basile and Nicole Novielli. 2014. UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting Tweet Sentiment Polarity Combining Micro-blogging, Lexicon and Semantic Features. In Proceedings of the Fourth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2014). CEUR.org.

Angelo Basile and Chiara Rubagotti. 2018. Automatic Identification of Misogyny in English and Italian Tweets at EVALITA 2018 with a Multilingual Hate Lexicon. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Davide Buscaldi. 2018. Tweetaneuse AMI EVALITA2018: Character-based Models for the Automatic Misogyny Identification Task. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

R.G. Capuano. 2007. Turpia: sociologia del turpiloquio e della bestemmia. Riscontri (Milan, Italy). Costa & Nolan.

Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso. 2018. EVALITA 2018: Overview of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Maeve Duggan. 2017. Online Harassment. http://www.pewinternet.org/2017/07/11/online-harassment-2017/. Last accessed 2018-10-28.

Fabio Fasoli, Andrea Carnaghi, and Maria Paola Paladino. 2015. Social Acceptability of Sexist Derogatory and Sexist Objectifying Slurs across Contexts. Language Sciences, 52:98–107.

Elisabetta Fersini, Maria Anzovino, and Paolo Rosso. 2018. Overview of the Task on Automatic Misogyny Identification at IberEval. In Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018). CEUR Workshop Proceedings. CEUR-WS.org, Seville, Spain.

Paula Fortuna, Ilaria Bonavita, and Sérgio Nunes. 2018. INESC TEC, Eurecat and Porto University.

Simona Frenda, Bilal Ghanem, et al. 2018a. Exploration of Misogyny in Spanish and English Tweets. In Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), volume 2150, pages 260–267. CEUR Workshop Proceedings.

Simona Frenda, Bilal Ghanem, Estefanía Guzmán-Falcón, Manuel Montes-y-Gómez, and Luis Villaseñor-Pineda. 2018b. Automatic Lexicons Expansion for Multilingual Misogyny Detection. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.
Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, and Barbara Plank. 2018. Bleaching Text: Abstract Features for Cross-lingual Gender Prediction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 383–389.

Sarah Hewitt, Thanassis Tiropanis, and Christian Bokhove. 2016. The Problem of Identifying Misogynist Language on Twitter (and other online social spaces). In Proceedings of the 8th ACM Conference on Web Science, pages 333–335. ACM.

Endang Wahyu Pamungkas, Alessandra Teresa Cignarella, Valerio Basile, and Viviana Patti. 2018. Automatic Identification of Misogyny in English and Italian Tweets at EVALITA 2018 with a Multilingual Hate Lexicon. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543.

Bailey Poland. 2016. Haters: Harassment, Abuse, and Violence Online. Potomac Books, Incorporated.

Punyajoy Saha, Binny Mathew, Pawan Goyal, and Animesh Mukherjee. 2018. Indian Institute of Engineering Science and Technology (Shibpur), Indian Institute of Technology (Kharagpur).

Elena Shushkevich and John Cardiff. 2018. Misogyny Detection and Classification in English Tweets. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.