<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Automatic Misogyny Identification Using Neural Networks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">I</forename><surname>Goenaga</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><surname>Atutxa</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">K</forename><surname>Gojenola</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><surname>Casillas</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><surname>Díaz De Ilarraza</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">N</forename><surname>Ezeiza</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">M</forename><surname>Oronoz</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><surname>Pérez</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">O</forename><surname>Perez De Viñaspre</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Ixa Taldea EHU</orgName>
								<orgName type="institution" key="instit1">IXA Group</orgName>
								<orgName type="institution" key="instit2">University of the Basque Country (EHU/UPV)</orgName>
								<orgName type="institution" key="instit3">UPV Informatika Fakultatea</orgName>
								<address>
									<addrLine>M. Lardizabal 1</addrLine>
									<postCode>20008</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Automatic Misogyny Identification Using Neural Networks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D0319EBE83C56DC615959B462AFA0478</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Shared task</term>
					<term>Misogyny</term>
					<term>Neural Networks</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we present our approach to automatically identify misogyny in Twitter tweets. This task is one of the two sub-tasks organized by the AMI-IberEval 2018 organization. In order to carry out the task, we present a neural network approach. Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural network (CNN) and recurrent neural network (RNN) are two mainstream architectures for such modeling tasks, which adopt totally different ways of understanding natural languages. In this work we focus on an RNN approach using a Bidirectional Long Short Term Memory (Bi-LSTM) with Conditional Random Fields (CRF) and we evaluate the proposed architecture on the misogyny identification task (text classification). The experimental results show that the system can achieve good performance on this task, obtaining 78.9 accuracy on English tweets and 76.8 accuracy on Spanish tweets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In the last couple of years we have started to see deep learning making significant inroads into areas where computers have previously seen limited success. Rather than requiring a set of fixed rules that are defined by the programmer, deep learning uses neural networks that learn rich non-linear relationships directly from data. Deep learning has also seen some success in NLP, for example in text classification. Text classification is an essential component in many applications, such as web searching, information filtering, and sentiment analysis <ref type="bibr" target="#b2">[3]</ref>.</p><p>A key problem in text classification is feature representation, which is commonly based on the bag-of-words (BoW) model, where unigrams, bigrams, ngrams or some specific patterns are extracted as features. Moreover, several feature selection methods, such as pLSA <ref type="bibr" target="#b3">[4]</ref> or LDA <ref type="bibr" target="#b4">[5]</ref> are applied to select more discriminative features. Nevertheless, traditional feature representation methods often have problems when they try to capture the semantics of the words because they ignore contextual information. This is a problem in text classification because contextual information is the key in order to correctly classify a text. Although high-order n-grams and more complex features are designed to capture more contextual information and word orders, the data sparsity problem remains, which heavily affects the classification accuracy.</p><p>In a recurrent neural network approach, the models analyze a text word by word and store the semantics of all the previous text in a fixed-sized hidden layer <ref type="bibr" target="#b5">[6]</ref>. They receive as input a sequence of vectors and return another sequence that represents some information about the sequence at every step in the input. 
Although RNNs can learn long dependencies, they often fail to do so and tend to be biased towards their most recent inputs in the sequence <ref type="bibr" target="#b7">[8]</ref>. Likewise, Long Short-term Memory Networks (LSTMs) incorporate a memory-cell and have been shown to be effective at capturing long-range dependencies.</p><p>Classic LSTMs create the representation of each word of the sentence using only the left context. It is, however, also interesting to use the right context if we want to create a more complete representation of the words. This can be done with a second LSTM that reads the same sequence in reverse. This type of LSTM is named bidirectional LSTM (BI-LSTM) <ref type="bibr" target="#b8">[9]</ref> and they create the representation of the words concatenating the left representation and the right representation. These representations effectively include a representation of a word in context, which is useful for numerous tasks.</p><p>On the other hand, Conditional Random Fields (CRF) are a probabilistic framework for labeling and segmenting structured data, such as sequences and trees. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs is the relaxation of the independence assumption. The independence assumption states that the variables do not depend on each other and they do not affect each other in any way; this is not always the case and, consequently, it can lead to serious inaccuracies. 
Likewise, CRFs have been shown really effective in different tasks such as POS tagging <ref type="bibr" target="#b6">[7]</ref>, text processing <ref type="bibr" target="#b9">[10]</ref> or computer vision <ref type="bibr" target="#b10">[11]</ref>.</p><p>Taking that into account, in this work we have employed a BI-LSTM with Conditional Random Fields (CRF) <ref type="bibr" target="#b6">[7]</ref> in order to prove its effectiveness in misogynous tweet identification. In this area, one of the last works is <ref type="bibr" target="#b0">[1]</ref> where the authors address the problem of automatic detection and categorization of misogynous language in online social media, and they set the bases to organize AMI-IberEval 2018 shared task <ref type="bibr" target="#b1">[2]</ref>.</p><p>In the rest of this paper we will first present the experimental setup we have used to carry out our experiments in section 2, followed by the results obtained in the shared task test set in section 3 and the main conclusions of the work in section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">System Description</head><p>We have divided this section into three subsections. In subsection 2.1 we explain the preprocessors we have used to tokenize and normalize the tweets, in subsection 2.2 the data resources we have employed in addition to the data shared by the organization, while in subsection 2.3 we focus on the used system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Preprocessors</head><p>We have made use of one of python's packages designed for preprocessing tweets <ref type="bibr" target="#b11">[12]</ref>. The tool performs tokenization, word normalization, word segmentation (for splitting hash-tags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, twitter -330mil English tweets). In addition, for Spanish we have used a set of simple rules proposed in <ref type="bibr" target="#b12">[13]</ref> for spell correction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">External Data</head><p>Our AMI System needs word-embeddings in order to create a better word representation for each word we find in the corpus. Thus, we have used word-embeddings extracted from the Spanish Billion Word Corpus <ref type="bibr" target="#b13">[14]</ref> and from Wikipedia 2014 and Gigaword 5 <ref type="bibr" target="#b14">[15]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">AMI System</head><p>In order to classify the tweets we have employed a neural network based architecture, more precisely a specific Bi-LSTM (an RNN subclass) with a CRF on top of it as proposed in <ref type="bibr" target="#b6">[7]</ref>. This kind of neural network is widely used to pursue sequence to sequence tagging. One of the advantages of using Bi-LSTM in contrast to other machine learning techniques such as SVM, Perceptron or CRFs is that the size of the context is automatically learned by the LSTM and there is no need to perform any complicated text preprocessing to obtain features to feed the tool. As we mentioned previously, our system is a tagger and marks the beginning and the next words of the sequences (IOB) we want to label. In this case we want to predict whether a tweet contains misogynous content or not. Thus, we introduce the tweets and the word-embeddings at the beginning of the process as in <ref type="bibr" target="#b6">[7]</ref>. When a word is missing in the word-embeddings, the system replaces the word with the unknown (UNK) label.</p><p>In all cases the system returns every word of each tweet tagged with Yes label when the tweet contains misogyny and with No label otherwise. If the opposite happened, we would consider a tweet as misogynous if it has at least one Yes label. The examples below are the output of the system for two tweets written in English and represent the aforementioned: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results and Discussion</head><p>In AMI-IberEval 2018 shared task the participants can try their misogynous content identification systems in two languages: English and Spanish. We have participated in both languages and we have included the results for English track in table <ref type="table" target="#tab_0">1</ref> and the results for Spanish track in table 2.</p><p>English R Team Accu R Team Accu R Team Accu 1 14-exlab-r1 91. If we analyze the results for English track, we observe that our position within all participants of the shared task is tenth with 78.9 of accuracy. Although we are far from winning the shared task (-12.4), we are in the first third of the classification and the two previous systems in the classification are not far (+ 0.4) from us which demonstrates a good performance of our system identifying misogynous tweets in English. Spanish R Team Accu R Team Accu R Team Accu 1 14-exlab-r3 81. On the other hand, our system's accuracy identifying Spanish written tweets is 76.8. This time our position within all the participants is seventeenth, just above the shared task's baseline. However, almost all the participants have obtained accuracies between 81.4 and 76.6 which indicates that the vast majority of the systems are close to each other. Likewise, identifying misogynous content in Spanish written tweets is more difficult mostly because of the lack of top quality resources (corpus, word-embeddings, preprocessors ...) that we can find relatively easily for English.</p><p>Once we analyzed our system's results for both languages, taking into account our system was designed for sequential tagging or sequence labeling we consider the experimental setup has performed well in a task it was not thought for. We realize the best option to do text classification would have been a convolutional neural network (CNN) especially because the best systems of the state of the art employ this type of neural networks. 
Nevertheless, our main purpose has been to test a BI-LSTM with CRF on a text classification task and, bearing in mind its constraints, the system has achieved reasonable results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>This paper presents our approach to automatically identify misogynous content in Twitter tweets. In order to carry out the task, we have chosen a neural network approach due to its ability to achieve remarkable performance in sentence and document modeling. In this work we focus on an RNN approach using a Bidirectional Long Short Term Memory (Bi-LSTM) with Conditional Random Fields (CRF) and experimental results show that the system can achieve good performance identifying misogynous tweets, obtaining 78.9 accuracy on English tweets and 76.8 accuracy on Spanish tweets.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>[</head><label></label><figDesc>B-Yes]&lt; user &gt; [I-Yes]bitch [I-Yes]is [I-Yes]a [I-Yes]psyco [I-Yes], [I-Yes]* [I-Yes]dry [I-Yes]pussy [I-Yes]detected [I-Yes]* [B-No]you [I-No]give [I-No]me [I-No]life [I-No]! [I-No]&lt; repeated &gt; [I-No]&lt; url &gt;</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Results obtained by participants for English track using only provided training data (constrained).</figDesc><table><row><cell></cell><cell>3 11 resham-r1</cell><cell cols="2">78.5 21 JoseSebastian-r1 74.9</cell></row><row><cell>2 14-exlab-r2</cell><cell cols="3">90.2 12 AMI-Baseline 78.3 22 Amrita CEN-r3 73.8</cell></row><row><cell>3 14-exlab-r4</cell><cell>89.8 13 vic -r2</cell><cell>78.0 23 vic -r1</cell><cell>70.9</cell></row><row><cell>4 14-exlab-r3</cell><cell>87.8 14 vic -r3</cell><cell>78.0 24 ITT-r1</cell><cell>70.6</cell></row><row><cell>5 SB-r4</cell><cell>87.0 15 vic -r4</cell><cell>78.0 25 vic -r5</cell><cell>64.6</cell></row><row><cell>6 SB-r5</cell><cell cols="3">85.1 16 maybelraul-r3 77.9 26 Amrita CEN-r2 56.3</cell></row><row><cell>7 14-exlab-r5</cell><cell cols="3">82.3 17 maybelraul-r1 77.1 27 GrCML2016-r3 52.7</cell></row><row><cell cols="4">8 AnotherOne-r1 79.3 18 maybelraul-r4 76.9 28 GrCML2016-r2 52.4</cell></row><row><cell cols="4">9 maybelraul-r2 79.3 19 maybelraul-r5 76.0 29 Amrita CEN-r1 51.9</cell></row><row><cell>10 ixaTeam-r1</cell><cell>78.9 20 ITT-r2</cell><cell>75.8</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Results obtained by participants for Spanish track using only provided training data (constrained).Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages(IberEval 2018)   </figDesc><table><row><cell></cell><cell>4 10 SB-r3</cell><cell>80.5 19 maybelraul-r1</cell><cell>76.7</cell></row><row><cell cols="2">2 JoseSebastian-r1 81.4 11 SB-r1</cell><cell>80.3 20 vic -r2</cell><cell>76.6</cell></row><row><cell>3 SB-r4</cell><cell>81.3 12 AnotherOne-r1</cell><cell cols="2">80.2 21 Amrita CEN-r3 74.4</cell></row><row><cell>4 14-exlab-r1</cell><cell>81.2 13 maybelraul-r5</cell><cell>79.6 22 vic -r3</cell><cell>65.9</cell></row><row><cell>5 14-exlab-r2</cell><cell>81.2 14 maybelraul-r2</cell><cell cols="2">78.8 23 Amrita CEN-r1 54.2</cell></row><row><cell>6 14-exlab-r4</cell><cell>80.9 15 maybelraul-r3</cell><cell>78.7 24 14-exlab-r5</cell><cell>53.6</cell></row><row><cell>7 SB-r2</cell><cell>80.8 16 maybelraul-r4</cell><cell cols="2">78.2 25 Amrita CEN-r2 52.9</cell></row><row><cell>8 SB-r5</cell><cell>80.6 17 ixaTeam-r1</cell><cell>76.8</cell><cell></cell></row><row><cell>9 vic -r1</cell><cell cols="2">80.5 18 AMI-BASELINE 76.7</cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages(IberEval 2018)   </note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially funded by: -The Spanish ministry (projects TADEEP: TIN2015-70214-P, PROSA-MED: TIN2016-77820-C3-1-R). -The Basque Government (projects DETEAMI: 2014111003, ELKAROLA:KK-2015/00098).</p><p>We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Automatic Identification and Classification of Misogynistic Language on Twitter</title>
		<author>
			<persName><forename type="first">M</forename><surname>Anzovino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Silberztein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Atigui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Kornyshova</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Métais</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Meziane</surname></persName>
		</editor>
		<meeting>the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages<address><addrLine>IberEval</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018. 2018</date>
			<biblScope unit="volume">10859</biblScope>
		</imprint>
	</monogr>
	<note>Natural Language Processing and Information Systems. NLDB 2018</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the Task on Automatic Misogyny Identification at IberEval</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Anzovino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), co-located with 34th Conference of the Spanish Society for Natural Language Processing (SE-PLN 2018</title>
				<meeting>the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), co-located with 34th Conference of the Spanish Society for Natural Language Processing (SE-PLN 2018<address><addrLine>Seville, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-09-18">September 18, 2018</date>
		</imprint>
	</monogr>
	<note>CEUR Workshop Proceedings. CEUR-WS.org</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">SemEval-2016 Task 7: Determining sentiment intensity of English and Arabic phrases</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kiritchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mohammad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Salameh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Workshop on Semantic Evaluation</title>
				<meeting>the 10th International Workshop on Semantic Evaluation<address><addrLine>San Diego, California, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="42" to="51" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Text categorization by boosting automatically extracted concepts</title>
		<author>
			<persName><forename type="first">L</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hofmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="182" to="189" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Document classification by topic labeling</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hingmire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chougule</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Palshikar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakraborti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="877" to="880" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Finding structure in time</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Elman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cognitive science</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="179" to="211" />
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Neural architectures for named entity recognition</title>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ballesteros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kawakami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dyer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.01360</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Learning long-term dependencies with gradient descent is difficult</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Simard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Frasconi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="157" to="166" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
	<note>Neural Networks</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Framewise phoneme classification with bidirectional LSTM networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IJCNN</title>
				<meeting>IJCNN</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Abner: an open source tool for automatically tagging genes, proteins, and other entity names in text</title>
		<author>
			<persName><forename type="first">B</forename><surname>Settles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">14</biblScope>
			<biblScope unit="page" from="3191" to="3192" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Multiscale conditional random fields for image labelling</title>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Zemel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Carreira-Perpinian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Computer Society Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis</title>
		<author>
			<persName><forename type="first">C</forename><surname>Baziotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Pelekis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Doulkeridis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International Workshop on Semantic Evaluation</title>
				<meeting>the 11th International Workshop on Semantic Evaluation<address><addrLine>SemEval-</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017. 2017</date>
			<biblScope unit="page" from="747" to="754" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Alegria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Etxeberria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Labaka</surname></persName>
		</author>
		<title level="m">Una Cascada de Transductores Simples para Normalizar Tweets</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note>Tweet-Norm@ SEPLN</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Cardellino</surname></persName>
		</author>
		<ptr target="http://crscardellino.me/SBWCE/" />
		<title level="m">Spanish Billion Words Corpus and Embeddings</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">GloVe: Global Vectors for Word Representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
