=Paper= {{Paper |id=Vol-2150/AMI_paper4 |storemode=property |title=Automatic Misogyny Identification Using Neural Networks |pdfUrl=https://ceur-ws.org/Vol-2150/AMI_paper4.pdf |volume=Vol-2150 |authors=Iakes Goenaga,Aitziber Atutxa,Koldo Gojenola,Arantza Casillas,Arantza Díaz de Ilarraza,Nerea Ezeiza,Maite Oronoz,Alicia Pérez,Olatz Perez de Viñaspre |dblpUrl=https://dblp.org/rec/conf/sepln/GoenagaAGCIEOPP18a }} ==Automatic Misogyny Identification Using Neural Networks== https://ceur-ws.org/Vol-2150/AMI_paper4.pdf
Automatic Misogyny Identification Using Neural Networks

    I. Goenaga, A. Atutxa, K. Gojenola, A. Casillas, A. Díaz de Ilarraza,
            N. Ezeiza, M. Oronoz, A. Pérez and O. Perez de Viñaspre

             IXA Group - University of the Basque Country (EHU/UPV)
    Ixa Taldea EHU/UPV Informatika Fakultatea M. Lardizabal 1 20008 Donostia
                               http://ixa.si.ehu.es/
                                iakesg@gmail.com


       Abstract. In this paper we present our approach to automatically
       identify misogyny in tweets. This task is one of the two subtasks of the
       AMI-IberEval 2018 shared task. To carry out the task, we present a
       neural network approach. Neural network models have been shown to
       achieve remarkable performance in sentence and document modeling.
       Convolutional neural networks (CNNs) and recurrent neural networks
       (RNNs) are the two mainstream architectures for such modeling tasks, and
       they model natural language in fundamentally different ways. In this
       work we focus on an RNN approach, using a Bidirectional Long Short-Term
       Memory (Bi-LSTM) network with Conditional Random Fields (CRF), and we
       evaluate the proposed architecture on the misogyny identification task,
       a text classification problem. The experimental results show that the
       system achieves good performance on this task, obtaining an accuracy of
       78.9% on English tweets and 76.8% on Spanish tweets.

       Keywords: Shared task · Misogyny · Neural Networks.


1    Introduction
In the last couple of years we have started to see deep learning making significant
inroads into areas where computers have previously seen limited success. Rather
than requiring a set of fixed rules that are defined by the programmer, deep
learning uses neural networks that learn rich non-linear relationships directly
from data. Deep learning has also seen some success in NLP, for example in text
classification. Text classification is an essential component in many applications,
such as web searching, information filtering, and sentiment analysis [3].
Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018)

    A key problem in text classification is feature representation, which is
commonly based on the bag-of-words (BoW) model, where unigrams, bigrams,
n-grams or some specific patterns are extracted as features. Moreover, topic
models such as pLSA [4] or LDA [5] are applied to derive more discriminative
features. Nevertheless, traditional feature representation methods often fail
to capture the semantics of words because they ignore contextual information.
This is a problem in text classification, because context is key to classifying
a text correctly. Although high-order n-grams and more complex features are
designed to capture more contextual information and word order, the data
sparsity problem remains and heavily affects classification accuracy.
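A minimal Python sketch of the sparsity argument, assuming a hypothetical three-sentence corpus and illustrative helper names (`ngrams`, `bag_of_ngrams`, `sparsity`) that are not part of the authors' system: as the maximum n-gram order grows, the feature vocabulary grows faster than the number of features any single text activates, so each feature vector becomes sparser.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(tokens, max_n):
    """Count every n-gram from unigrams up to max_n-grams (a bag of n-grams)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def sparsity(doc_counts, vocab):
    """Fraction of vocabulary features that are zero for one document."""
    nonzero = sum(1 for feature in vocab if feature in doc_counts)
    return 1 - nonzero / len(vocab)


# Hypothetical toy corpus of three already-tokenized texts.
corpus = [
    "you give me life".split(),
    "she is a nightmare".split(),
    "give me a break".split(),
]

for max_n in (1, 3):
    bags = [bag_of_ngrams(doc, max_n) for doc in corpus]
    vocab = set().union(*bags)  # every feature seen anywhere in the corpus
    print(max_n, len(vocab), [round(sparsity(b, vocab), 2) for b in bags])
```

With unigrams only, each text activates 4 of 9 features; with up to trigrams, it activates 9 of 23, so the share of zero entries rises from about 0.56 to about 0.61 even on this tiny corpus.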

            In a recurrent neural network (RNN) approach, the model analyzes a text
        word by word and stores the semantics of all the preceding text in a fixed-size
        hidden layer [6]. An RNN receives a sequence of vectors as input and returns
        another sequence that represents some information about the input at every
        step. Although RNNs can in principle learn long dependencies, in practice they
        often fail to do so and tend to be biased towards the most recent inputs in the
        sequence [8]. Long Short-Term Memory networks (LSTMs) address this problem by
        incorporating a memory cell, and they have been shown to be effective at
        capturing long-range dependencies.
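To make the recurrence concrete, here is a minimal, pure-Python sketch of a standard LSTM cell (input, forget and output gates plus a candidate update). The function names and the random initialization are illustrative assumptions; the authors' actual network follows [7] and is not reproduced here. The point it shows is that, however long the input, the state (h, c) keeps a fixed size.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def init_lstm(input_dim, hidden_dim, seed=0):
    """Small random weights for the input (i), forget (f), output (o) gates and candidate (g)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in "ifog":
        params["W" + gate] = mat(hidden_dim, input_dim)   # input-to-hidden weights
        params["U" + gate] = mat(hidden_dim, hidden_dim)  # hidden-to-hidden weights
        params["b" + gate] = [0.0] * hidden_dim
    return params


def lstm_step(x, h, c, p):
    """One LSTM time step: returns the new hidden state h and memory cell c."""
    def preact(gate):
        return [a + b + bias for a, b, bias in
                zip(matvec(p["W" + gate], x), matvec(p["U" + gate], h), p["b" + gate])]

    i = [sigmoid(z) for z in preact("i")]    # how much new content to write
    f = [sigmoid(z) for z in preact("f")]    # how much of the old cell to keep
    o = [sigmoid(z) for z in preact("o")]    # how much of the cell to expose
    g = [math.tanh(z) for z in preact("g")]  # candidate content
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def lstm_forward(xs, p, hidden_dim):
    """Run the cell over a sequence of word vectors; one hidden state per word."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


# Usage: five hypothetical 3-dimensional word vectors, hidden size 4.
rng = random.Random(1)
words = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
states = lstm_forward(words, init_lstm(3, 4), hidden_dim=4)
print(len(states), len(states[-1]))  # 5 4
```

The forget gate `f` is what lets the cell carry information across many steps, which is the property the paragraph above refers to.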

           Classic LSTMs create the representation of each word in a sentence using
        only its left context. However, to build a more complete representation of each
        word, it is also useful to take the right context into account. This can be done
        with a second LSTM that reads the same sequence in reverse. This architecture is
        called a bidirectional LSTM (Bi-LSTM) [9], and it represents each word by
        concatenating its left-context and right-context representations. The result is
        effectively a representation of the word in context, which is useful for
        numerous tasks.
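A minimal sketch of the bidirectional idea in Python, assuming a plain tanh (Elman-style) recurrence in place of full LSTM cells to keep it short: one network reads the sentence left to right, a second one reads it right to left, and the two states for each word are concatenated. All function names are hypothetical.

```python
import math
import random


def init_rnn(input_dim, hidden_dim, seed):
    """Random weights W (input-to-hidden) and U (hidden-to-hidden) for one direction."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    return W, U


def run_rnn(xs, W, U):
    """Elman recurrence h_t = tanh(W x_t + U h_{t-1}); returns one state per input."""
    h = [0.0] * len(U)
    states = []
    for x in xs:
        h = [math.tanh(sum(w * xi for w, xi in zip(w_row, x)) +
                       sum(u * hj for u, hj in zip(u_row, h)))
             for w_row, u_row in zip(W, U)]
        states.append(h)
    return states


def bidirectional(xs, forward_params, backward_params):
    """Concatenate, for every word, a left-to-right state and a right-to-left state."""
    left = run_rnn(xs, *forward_params)                # state at t has read words 0..t
    right = run_rnn(xs[::-1], *backward_params)[::-1]  # state at t has read words n-1..t
    return [left_h + right_h for left_h, right_h in zip(left, right)]  # list concatenation


# Usage: four hypothetical 3-dimensional word vectors, 2 hidden units per direction.
sentence = [[0.1, 0.2, 0.3], [0.0, -0.4, 0.5], [0.7, 0.1, -0.2], [-0.3, 0.6, 0.2]]
fwd, bwd = init_rnn(3, 2, seed=0), init_rnn(3, 2, seed=1)
reps = bidirectional(sentence, fwd, bwd)
print(len(reps), len(reps[0]))  # 4 4
```

Reversing the backward outputs before concatenation keeps both halves aligned to the same word position; a Bi-LSTM does the same with LSTM cells in each direction.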

            Conditional Random Fields (CRFs), on the other hand, are a probabilistic
        framework for labeling and segmenting structured data, such as sequences and
        trees. The underlying idea is to define a conditional probability distribution
        over label sequences given a particular observation sequence, rather than a
        joint distribution over both label and observation sequences. The primary
        advantage of CRFs over generative models such as hidden Markov models is that
        they relax the strong independence assumptions those models make. Such
        assumptions treat variables as if they did not depend on or affect each other in
        any way; this is not always the case and, consequently, it can lead to serious
        inaccuracies. CRFs have proven effective in tasks as diverse as named entity
        recognition [7], biomedical entity tagging [10] and image labeling [11].
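The conditional distribution described above can be sketched for a linear-chain CRF, the variant placed on top of the Bi-LSTM in [7]. A tag sequence y for an input x gets score(x, y), the sum of per-token emission scores and tag-to-tag transition scores, and p(y | x) = exp(score(x, y)) / Z(x). The Python sketch below uses hypothetical toy scores; it computes log Z(x) with the forward algorithm and the best sequence with Viterbi decoding. In a Bi-LSTM-CRF, the emission scores would come from the Bi-LSTM outputs.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emissions, transitions, tags):
    """score(x, y): emission score of each tag plus transition score between consecutive tags."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """log Z(x) via the forward algorithm, without enumerating all tag sequences."""
    n_tags = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)]) + emissions[t][j]
                 for j in range(n_tags)]
    return logsumexp(alpha)


def viterbi(emissions, transitions):
    """Most probable tag sequence: the same recursion with max instead of logsumexp."""
    n_tags = len(emissions[0])
    delta = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        best_prev = [max(range(n_tags), key=lambda i: delta[i] + transitions[i][j])
                     for j in range(n_tags)]
        delta = [delta[best_prev[j]] + transitions[best_prev[j]][j] + emissions[t][j]
                 for j in range(n_tags)]
        backpointers.append(best_prev)
    path = [max(range(n_tags), key=lambda j: delta[j])]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return path[::-1]


# Usage with hypothetical scores: 3 tokens, 2 tags (0 = "No", 1 = "Yes").
emissions = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.4]]
transitions = [[0.5, -0.3], [-0.2, 0.6]]
best = viterbi(emissions, transitions)
prob = math.exp(sequence_score(emissions, transitions, best) - log_partition(emissions, transitions))
print(best)  # [0, 0, 0]
print(round(prob, 3))

# Sanity check by brute force: enumerate all 2**3 tag sequences.
all_paths = list(itertools.product(range(2), repeat=3))
brute_log_z = logsumexp([sequence_score(emissions, transitions, p) for p in all_paths])
print(abs(brute_log_z - log_partition(emissions, transitions)) < 1e-9)  # True
```

Because the model normalizes over whole tag sequences, the transition scores let neighbouring labels influence each other, which is exactly the dependence that independence assumptions rule out.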

           Taking this into account, in this work we employed a Bi-LSTM with
        Conditional Random Fields (CRF) [7] in order to assess its effectiveness in
        identifying misogynous tweets. One of the most recent works in this area is [1],
        in which the authors address the problem of automatically detecting and
        categorizing misogynous language in online social media, and which laid the
        groundwork for the AMI-IberEval 2018 shared task [2].

            In the rest of this paper, we first present the experimental setup used in
        our experiments in Section 2, followed by the results obtained on the shared task
        test set in Section 3 and the main conclusions of the work in Section 4.





        2     System Description
        We have divided this section into three subsections. In Subsection 2.1 we
        describe the preprocessors used to tokenize and normalize the tweets, in
        Subsection 2.2 the data resources employed in addition to the data provided by
        the organizers, and in Subsection 2.3 the system itself.

        2.1    Preprocessors
        We used a Python package designed for preprocessing tweets [12]. The tool
        performs tokenization, word normalization, word segmentation (for splitting
        hashtags) and spelling correction, using word statistics from two large corpora
        (English Wikipedia and 330 million English tweets). In addition, for Spanish we
        used a set of simple rules proposed in [13] for spelling correction.
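The package accompanying [12] is, to our knowledge, ekphrasis; its API is not reproduced here. As a rough illustration only, the hypothetical Python function below (`normalize_tweet`) performs a small part of that kind of normalization: it lowercases the text and produces the placeholder tokens `<user>`, `<url>` and `<repeated>` that appear in the system output shown in Section 2.3. It does not segment hashtags or correct spelling.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"([!?.])\1+")          # runs of repeated punctuation, e.g. "!!!"
TOKEN_RE = re.compile(r"<\w+>|\w+|[^\w\s]")    # placeholders, words, single symbols


def normalize_tweet(text):
    """Replace URLs and mentions with placeholders, mark repeats, then tokenize."""
    text = URL_RE.sub(" <url> ", text)            # URLs first, before any other rule
    text = USER_RE.sub(" <user> ", text)
    text = REPEAT_RE.sub(r"\1 <repeated> ", text)  # keep one symbol, then the marker
    return TOKEN_RE.findall(text.lower())


print(normalize_tweet("@maria You give me life!!! https://t.co/abc"))
# ['<user>', 'you', 'give', 'me', 'life', '!', '<repeated>', '<url>']
```

Placeholders collapse the unbounded variety of user names and links into a handful of tokens, which keeps the vocabulary, and hence the number of unknown words, small.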

        2.2    External Data
        Our AMI system uses pretrained word embeddings to obtain a better
        representation of each word in the corpus. We used embeddings trained on the
        Spanish Billion Words Corpus [14] for Spanish, and GloVe embeddings trained on
        Wikipedia 2014 and Gigaword 5 [15] for English.
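A minimal Python sketch of how such pretrained vectors can be loaded and looked up, assuming GloVe's plain-text format (one word per line followed by its vector components). The function names are hypothetical, and the all-zeros vector for unknown words is an assumption for illustration only; a trained system would typically learn this vector.

```python
def load_glove(lines):
    """Parse GloVe's plain-text format: a word followed by its vector components per line."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) > 1:
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def embed(tokens, vectors, dim):
    """Look up each token; out-of-vocabulary tokens share one UNK vector (zeros here)."""
    unk = [0.0] * dim
    return [vectors.get(token, unk) for token in tokens]


# Usage with a hypothetical two-word, 2-dimensional "file"; in practice one would pass
# an open file such as open("glove.6B.100d.txt", encoding="utf-8").
vectors = load_glove(["the 0.1 0.2\n", "life 0.3 -0.5\n"])
print(embed(["the", "life", "gr8"], vectors, dim=2))
# [[0.1, 0.2], [0.3, -0.5], [0.0, 0.0]]
```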

        2.3    AMI System
        To classify the tweets we employed a neural network architecture, more
        precisely the Bi-LSTM (a type of RNN) with a CRF layer on top proposed in [7].
        This kind of network is widely used for sequence labeling. One advantage of a
        Bi-LSTM over other machine learning techniques such as SVMs, perceptrons or
        CRFs is that the LSTM automatically learns how much context to use, so there is
        no need for complex text preprocessing to engineer the features fed to the
        model. Our system is therefore a tagger: it marks the beginning and the inside
        words of the sequences to be labeled, following the IOB scheme. In this case,
        however, we want to predict whether or not a tweet contains misogynous content.
        As in [7], the inputs to the system are the tweets and the pretrained word
        embeddings. When a word is missing from the word embeddings, the system
        replaces it with an unknown (UNK) token.
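Training a tagger on tweet-level labels requires spreading each label over the tweet's tokens. The Python sketch below shows one way to do this with IOB tags (B- on the first token, I- on the rest), matching the tags in the system output shown below. The helper names and the "token TAG" line layout are assumptions, not the authors' documented file format.

```python
def tweet_to_iob(tokens, is_misogynous):
    """Spread one tweet-level label over all tokens: B- on the first, I- on the rest."""
    label = "Yes" if is_misogynous else "No"
    return [("B-" if i == 0 else "I-") + label for i in range(len(tokens))]


def to_tagger_format(tweets):
    """One 'token TAG' pair per line, with a blank line between tweets."""
    lines = []
    for tokens, is_misogynous in tweets:
        for token, tag in zip(tokens, tweet_to_iob(tokens, is_misogynous)):
            lines.append(f"{token} {tag}")
        lines.append("")  # tweet separator
    return "\n".join(lines)


print(to_tagger_format([(["you", "give", "me", "life", "!"], False)]))
```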
            In all cases the system tagged every word of a tweet with the Yes label
        when the tweet contained misogyny and with the No label otherwise. Had the
        system mixed labels within a tweet, we would have considered the tweet
        misogynous if at least one word carried a Yes label. The examples below show
        the system output for two English tweets:
              [B-Yes]<user> [I-Yes]bitch [I-Yes]is [I-Yes]a [I-Yes]psyco [I-Yes],
                   [I-Yes]* [I-Yes]dry [I-Yes]pussy [I-Yes]detected [I-Yes]*
              [B-No]you [I-No]give [I-No]me [I-No]life [I-No]! [I-No]<repeated>
                   [I-No]<url>
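The tweet-level decision rule stated above (a tweet is misogynous if at least one token carries a Yes tag) can be sketched in Python as follows, assuming the bracketed output format of the examples. The regex and function name are hypothetical.

```python
import re

# Matches bracketed tags such as "[B-No]" or "[I-Yes]" in the tagger output.
TAG_RE = re.compile(r"\[([BI]-(?:Yes|No))\]")


def is_misogynous(tagged_tweet):
    """Tweet-level decision: misogynous if at least one token is tagged B-Yes or I-Yes."""
    tags = TAG_RE.findall(tagged_tweet)
    return any(tag.endswith("-Yes") for tag in tags)


print(is_misogynous("[B-No]you [I-No]give [I-No]me [I-No]life [I-No]!"))  # False
print(is_misogynous("[B-No]mixed [I-Yes]labels [I-No]here"))              # True
```

The "any token" rule favours recall for mixed outputs; a majority vote over tags would be an alternative design.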





        3     Results and Discussion

        In the AMI-IberEval 2018 shared task, participants could evaluate their
        misogynous content identification systems in two languages: English and
        Spanish. We participated in both, and the results for the English and Spanish
        tracks are shown in Table 1 and Table 2, respectively.


                                         English
            Rank  Team              Accuracy
               1  14-exlab-r1       91.3
               2  14-exlab-r2       90.2
               3  14-exlab-r4       89.8
               4  14-exlab-r3       87.8
               5  SB-r4             87.0
               6  SB-r5             85.1
               7  14-exlab-r5       82.3
               8  AnotherOne-r1     79.3
               9  maybelraul-r2     79.3
              10  ixaTeam-r1        78.9
              11  resham-r1         78.5
              12  AMI-Baseline      78.3
              13  vic -r2           78.0
              14  vic -r3           78.0
              15  vic -r4           78.0
              16  maybelraul-r3     77.9
              17  maybelraul-r1     77.1
              18  maybelraul-r4     76.9
              19  maybelraul-r5     76.0
              20  ITT-r2            75.8
              21  JoseSebastian-r1  74.9
              22  Amrita CEN-r3     73.8
              23  vic -r1           70.9
              24  ITT-r1            70.6
              25  vic -r5           64.6
              26  Amrita CEN-r2     56.3
              27  GrCML2016-r3      52.7
              28  GrCML2016-r2      52.4
              29  Amrita CEN-r1     51.9
        Table 1. Accuracy (%) obtained by the participants in the English track using
        only the provided training data (constrained).




            Analyzing the results for the English track, our system ranks tenth among
        the 29 entries of the shared task (including the baseline), with an accuracy of
        78.9%. Although we are far from the winning system (12.4 points below), we are
        in the top ten, only 0.4 points behind the two runs immediately above us and 0.6
        points above the baseline, which suggests that our system performs
        competitively in identifying misogynous tweets in English.
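The rank and gaps quoted above can be checked with a few lines of Python over the accuracies copied from Table 1. The helper name is hypothetical, and ranks follow the usual convention of one plus the number of strictly better entries.

```python
# Accuracy (%) of all 29 English-track entries, in the order of Table 1
# (the AMI baseline, 78.3, counts as one entry).
ENGLISH = [91.3, 90.2, 89.8, 87.8, 87.0, 85.1, 82.3, 79.3, 79.3, 78.9,
           78.5, 78.3, 78.0, 78.0, 78.0, 77.9, 77.1, 76.9, 76.0, 75.8,
           74.9, 73.8, 70.9, 70.6, 64.6, 56.3, 52.7, 52.4, 51.9]


def position_summary(scores, ours):
    """Rank (1 + number of strictly better entries) and accuracy gaps in points."""
    better = sorted(s for s in scores if s > ours)
    return {
        "rank": len(better) + 1,
        "entries": len(scores),
        "gap_to_first": round(max(scores) - ours, 1),
        "gap_to_next_above": round(better[0] - ours, 1) if better else 0.0,
    }


print(position_summary(ENGLISH, 78.9))
# {'rank': 10, 'entries': 29, 'gap_to_first': 12.4, 'gap_to_next_above': 0.4}
```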


                                            Spanish
            Rank  Team              Accuracy
               1  14-exlab-r3       81.4
               2  JoseSebastian-r1  81.4
               3  SB-r4             81.3
               4  14-exlab-r1       81.2
               5  14-exlab-r2       81.2
               6  14-exlab-r4       80.9
               7  SB-r2             80.8
               8  SB-r5             80.6
               9  vic -r1           80.5
              10  SB-r3             80.5
              11  SB-r1             80.3
              12  AnotherOne-r1     80.2
              13  maybelraul-r5     79.6
              14  maybelraul-r2     78.8
              15  maybelraul-r3     78.7
              16  maybelraul-r4     78.2
              17  ixaTeam-r1        76.8
              18  AMI-BASELINE      76.7
              19  maybelraul-r1     76.7
              20  vic -r2           76.6
              21  Amrita CEN-r3     74.4
              22  vic -r3           65.9
              23  Amrita CEN-r1     54.2
              24  14-exlab-r5       53.6
              25  Amrita CEN-r2     52.9
        Table 2. Accuracy (%) obtained by the participants in the Spanish track using
        only the provided training data (constrained).





            For Spanish tweets, our system obtained an accuracy of 76.8%, ranking
        seventeenth of 25 entries, just above the shared task baseline (76.7%). However,
        20 of the 25 entries obtained accuracies between 76.6% and 81.4%, which
        indicates that the vast majority of the systems perform very similarly.
        Moreover, identifying misogynous content in Spanish tweets is more difficult,
        mostly because of the lack of high-quality resources (corpora, word embeddings,
        preprocessors, etc.) that are relatively easy to find for English.
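The Spanish-track figures quoted above can likewise be recomputed in Python from the accuracies copied from Table 2; the variable names are illustrative.

```python
# Accuracy (%) of all 25 Spanish-track entries, in the order of Table 2
# (the AMI baseline, 76.7, counts as one entry).
SPANISH = [81.4, 81.4, 81.3, 81.2, 81.2, 80.9, 80.8, 80.6, 80.5, 80.5,
           80.3, 80.2, 79.6, 78.8, 78.7, 78.2, 76.8, 76.7, 76.7, 76.6,
           74.4, 65.9, 54.2, 53.6, 52.9]

ours = 76.8
rank = 1 + sum(score > ours for score in SPANISH)               # strictly better entries + 1
in_band = [score for score in SPANISH if 76.6 <= score <= 81.4]
spread = round(max(in_band) - min(in_band), 1)                  # width of the band in points
print(rank, len(in_band), len(SPANISH), spread)  # 17 20 25 4.8
```

Within a 4.8-point band covering 20 entries, small differences in rank correspond to very small differences in accuracy.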
            Having analyzed the results for both languages, and taking into account
        that our system was designed for sequence labeling, we consider that the
        experimental setup performed well on a task it was not designed for. We
        acknowledge that a convolutional neural network (CNN) would probably have been
        a better option for text classification, especially since the best
        state-of-the-art systems employ this type of network. Nevertheless, our main
        purpose was to test a Bi-LSTM with a CRF on a text classification task, and,
        bearing in mind its constraints, the system achieved reasonable results.


        4     Conclusions
        This paper presents our approach to automatically identify misogynous content
        in tweets. To carry out the task, we chose a neural network approach because of
        the ability of neural networks to achieve remarkable performance in sentence
        and document modeling. In this work we focused on an RNN approach using a
        Bidirectional Long Short-Term Memory (Bi-LSTM) network with Conditional Random
        Fields (CRF), and the experimental results show that the system can achieve
        good performance in identifying misogynous tweets, obtaining an accuracy of
        78.9% on English tweets and 76.8% on Spanish tweets.


        Acknowledgments
        This work has been partially funded by:
          – The Spanish Ministry (projects TADEEP: TIN2015-70214-P, PROSA-MED:
            TIN2016-77820-C3-1-R).
          – The Basque Government (projects DETEAMI: 2014111003, ELKAROLA:
            KK-2015/00098).
        We gratefully acknowledge the support of NVIDIA Corporation with the dona-
        tion of the Titan X Pascal GPU used for this research.


        References
        1. M. Anzovino, E. Fersini, P. Rosso. 2018. Automatic Identification and Classification
           of Misogynistic Language on Twitter. In: M. Silberztein, F. Atigui, E. Kornyshova,
           E. Métais, F. Meziane. (eds) Natural Language Processing and Information Systems.
           NLDB 2018. Lecture Notes in Computer Science, vol 10859.





        2. E. Fersini, M. Anzovino, P. Rosso. 2018. Overview of the Task on Automatic
           Misogyny Identification at IberEval. In: Proceedings of the Third Workshop on
           Evaluation of Human Language Technologies for Iberian Languages (IberEval
           2018), co-located with 34th Conference of the Spanish Society for Natural
           Language Processing (SEPLN 2018). CEUR Workshop Proceedings. CEUR-WS.org,
           Seville, Spain, September 18, 2018.
        3. S. Kiritchenko, S. Mohammad, and M. Salameh. 2016. SemEval-2016 Task 7: De-
           termining sentiment intensity of English and Arabic phrases. In Proceedings of the
           10th International Workshop on Semantic Evaluation. San Diego, California, USA,
           SemEval ’16, pages 42–51.
        4. L. Cai, and T. Hofmann. 2003. Text categorization by boosting automatically ex-
           tracted concepts. In SIGIR, 182–189.
        5. S. Hingmire, S. Chougule, G. K. Palshikar and S. Chakraborti. 2013. Document
           classification by topic labeling. In SIGIR, 877–880.
        6. J. L. Elman. 1990. Finding structure in time. Cognitive Science 14(2):179–211.
        7. G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami and C. Dyer. 2016. Neural
           architectures for named entity recognition. arXiv preprint arXiv:1603.01360.
        8. Y. Bengio, P. Simard, and P. Frasconi. 1994. Learning long-term dependencies with
           gradient descent is difficult. Neural Networks, IEEE Transactions on, 5(2):157–166.
        9. A. Graves and J. Schmidhuber. 2005. Framewise phoneme classification with
           bidirectional LSTM networks. In Proc. IJCNN.
        10. B. Settles. 2005. Abner: an open source tool for automatically tagging genes, pro-
           teins, and other entity names in text. Bioinformatics, 21(14):3191–3192.
        11. X. He, R. S. Zemel and M. A. Carreira-Perpiñán. 2004. Multiscale conditional
           random fields for image labelling. In IEEE Computer Society Conference on
           Computer Vision and Pattern Recognition.
        12. C. Baziotis, N. Pelekis and C. Doulkeridis. 2017. DataStories at SemEval-2017
           Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment
           Analysis. In Proceedings of the 11th International Workshop on Semantic
           Evaluation (SemEval-2017), pages 747–754.
        13. I. Alegria, I. Etxeberria and G. Labaka. 2013. Una Cascada de Transductores
           Simples para Normalizar Tweets. Tweet-Norm@SEPLN.
        14. C. Cardellino. 2016. Spanish Billion Words Corpus and Embeddings.
           http://crscardellino.me/SBWCE/
        15. J. Pennington, R. Socher and C. D. Manning. 2014. GloVe: Global Vectors for
           Word Representation. In Proceedings of EMNLP, pages 1532–1543.



