<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>A Multi-Task and Multilingual Model for Sexism Identification in Social Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Francisco Rodríguez-Sanchez</string-name>
          <email>frodriguez.sanchez@invi.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Carrillo-de-Albornoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Plaza</string-name>
          <email>lplazag@lsi.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UNED NLP &amp; IR Group, Calle Juan del Rosal</institution>
          ,
          <addr-line>16. 28040 Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Exposure to sexist content has serious consequences for women's lives and limits their freedom of speech. In this paper, we present a multilingual system based on pre-trained transformers and compare single-task to multi-task learning for identifying sexism in social networks. Our methods were evaluated in the framework of our participation in the EXIST shared task at IberLEF 2021 [1], obtaining promising results despite sharing parameters across both languages and tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Sexism detection</kwd>
        <kwd>NLP</kwd>
        <kwd>Transformers</kwd>
        <kwd>Multi-task learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
The development of web technologies has enabled interaction between people
from many different countries and backgrounds. With more than 4 billion people
around the world now using social media each month [
        <xref ref-type="bibr" rid="ref2">2</xref>
], social networks are
undoubtedly one of the most important means of communication. Although we
cannot deny the positive effects of this global communication, anonymity and
accessibility have made the expression of discriminatory and sexist discourse easy
and unpunished. In this context, the inequality and discrimination against women
that remain embedded in society are increasingly being replicated and spread
online.
      </p>
      <p>
The Oxford English Dictionary defines sexism as "prejudice, stereotyping or
discrimination, typically against women, on the basis of sex". Sexism is therefore
expressed in very different forms that do not always convey hostility or hate.
Subtle forms of sexism can be as pernicious as overt ones and affect
women in many facets of their lives. According to [
        <xref ref-type="bibr" rid="ref3">3</xref>
], non-hateful sexism can
affect women's psychological well-being by decreasing their comfort, increasing
their feelings of anger and depression, and decreasing their stated self-esteem.
Similarly, [
        <xref ref-type="bibr" rid="ref4">4</xref>
] found a relationship between the experience of non-violent sexism
and post-traumatic stress disorder.
      </p>
      <p>
Detecting sexist content is still a difficult task for social media platforms. For
instance, Amnesty International published a report [
        <xref ref-type="bibr" rid="ref5">5</xref>
] describing
Twitter as a "toxic place" for women. According to this report, Twitter is promoting
violence and hate against people based on their gender. The report also suggests
that Twitter is failing to protect women against harassment, which could harm
their freedom of speech. Recently, members of the U.S. Congress asked Facebook
to do more to protect women on its platform. According to some lawmakers,
social media has become "the number one place" in which psychological violence
is perpetrated against female parliamentarians [
        <xref ref-type="bibr" rid="ref6">6</xref>
]. The seriousness of the
problem, combined with the quick spread of online information, especially in social
networks, has made these harassment behaviours extremely dangerous. Solutions
are therefore required that perform faster and better moderation of user-generated
content, or that serve as tools helping human moderators reduce the
volume of sexist content still present in online platforms.
      </p>
<p>In this paper, we describe our participation in the EXIST task at IberLEF
2021, a sexist language detection challenge in two different languages. The challenge
was articulated in two tasks: task 1 is a binary classification to
determine whether a text is sexist or not, while task 2 is a finer-grained classification
devoted to distinguishing different subtypes of sexism. We propose a multilingual
system based on pre-trained transformers and experiment with single-task and
multi-task approaches to jointly address the task of sexist language detection.
We take advantage of the fact that both tasks are semantically connected to test
whether they can be learned simultaneously and whether one task can benefit the
other within a multi-task framework. To the best of our knowledge, no previous
work has employed this technique to identify sexism in social networks. Our
single-model approach achieved competitive results, with performance close
to that of the top systems despite sharing parameters across both languages and
tasks.</p>
<p>The rest of this paper is organized as follows: in section 2, we discuss related
work. In section 3, we describe the classification system. Results and analysis
are presented in section 4. Finally, conclusions and future work are given in
section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
The detection of hate speech and of misogyny are tasks closely connected to,
and often confused with, sexism detection [
        <xref ref-type="bibr" rid="ref7">7</xref>
]. Substantial work has been devoted
to hate speech detection in recent years, but few works have addressed sexism
detection. Most of them have treated sexism as the detection of hate speech
against women, i.e. misogyny [
        <xref ref-type="bibr" rid="ref8">8</xref>
]. Consequently, they have dealt with hostile and
explicit sexism, overlooking subtle or implicit expressions of it. An exception
is the approach proposed by [
        <xref ref-type="bibr" rid="ref7">7</xref>
], whose authors released the first Spanish corpus of
sexist expressions on Twitter, the MeTwo dataset. They also compared Machine
Learning (ML) methods to detect sexism and discussed the generalization of
their approach with respect to misogyny detection systems.
      </p>
      <p>
Recently, the IberEval competition focused on the automatic identification
of misogyny on Twitter [
        <xref ref-type="bibr" rid="ref8">8</xref>
]. Teams were asked to identify misogynistic tweets
in both Spanish and English. The approaches submitted to the competition were
mainly based on supervised machine learning over different textual features (such
as unigrams and bigrams, sentiment-based information, or syntactic categories)
or user-based features (such as the number of retweets, followers, etc.) [9-11].
The use of lexical resources for extracting signals (such as swear word counts
or the presence of sexist slurs) showed excellent performance in the task [
        <xref ref-type="bibr" rid="ref12">12</xref>
]. Deep
learning methods were explored by only one team, along with word embedding
features [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
The appearance of multilingual transformers has shifted the trend in natural
language processing, with many positive experimental results for hate speech
detection. For instance, [
        <xref ref-type="bibr" rid="ref14">14</xref>
] explored the feasibility of detecting misogyny in
three different languages using the multilingual Bidirectional Encoder
Representations from Transformers model (multilingual BERT or mBERT) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Another example
is found in [
        <xref ref-type="bibr" rid="ref16">16</xref>
], where the authors presented an ensemble of individual
transformers as the winning solution in the shared task "Offensive Language
Identification in Dravidian Languages" at EACL 2021.
      </p>
<p>Multi-task learning (MTL) has proven successful in many Natural Language
Processing (NLP) problems, as illustrated in the overview of [17]. In this paradigm,
multiple tasks are simultaneously learned by a shared model, offering advantages
such as improved data efficiency, reduced overfitting through shared representations,
and faster learning by leveraging auxiliary information. Only a few studies have
used MTL to detect hateful language: [18] employed emotion detection as an
auxiliary task to address the detection of abusive language, and [19] applied
an MTL approach to detect hostile content.</p>
<p>Although multilingual and multi-task models have been tested as end-to-end
solutions for several tasks related to hate speech, to the best of our knowledge,
no previous work has explicitly used these techniques for sexism
detection.</p>
    </sec>
    <sec id="sec-3">
<title>EXIST 2021: sEXism Identification in Social neTworks</title>
<p>The shared task EXIST 2021 at IberLEF 2021 [20] asked participants to classify
"tweets" and "gabs" in two different languages, English and Spanish. The
objective of the shared task was to develop methodologies and classification systems
to detect sexist messages according to the following two tasks:
- Task 1: a binary classification task, where every system should determine
whether a text or message is sexist or non-sexist.
- Task 2: once a message has been classified as sexist, the second task aims
to categorize it according to 5 types of sexism: Ideological and
inequality, Misogyny and non-sexual-violence, Objectification, Sexual violence,
and Stereotyping and dominance.</p>
<p>Task 1 is evaluated in terms of accuracy, while for Task 2 the evaluation
consists of the macro-average of the F1-scores over 6 classes: Non-sexist,
Ideological and inequality, Misogyny and non-sexual-violence, Objectification, Sexual
violence, and Stereotyping and dominance. Each participating team could submit a
maximum of 6 runs, 3 for each task.</p>
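<p>As a minimal sketch of how these two measures can be computed with scikit-learn (the label strings below are illustrative; the official evaluation script may differ):</p>
<p>
```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative label set for task 2: non-sexist plus the five sexism categories.
TASK2_LABELS = [
    "non-sexist", "ideological-inequality", "misogyny-non-sexual-violence",
    "objectification", "sexual-violence", "stereotyping-dominance",
]

def evaluate_task1(y_true, y_pred):
    """Task 1 is scored with plain accuracy."""
    return accuracy_score(y_true, y_pred)

def evaluate_task2(y_true, y_pred):
    """Task 2 is scored with the macro-average F1 over all 6 classes."""
    return f1_score(y_true, y_pred, labels=TASK2_LABELS, average="macro")
```
</p>
<p>Passing the full label list to f1_score ensures that classes absent from the predictions still count toward the macro-average.</p>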
<p>Two different datasets were shared during the challenge. In total, the
organizers provided 6,977 tweets for training and 4,368 texts for testing, composed
of 3,386 tweets and 982 gabs. The organizers ensured class balance for
task 1, while the distribution of data for task 2 was relatively unbalanced,
reflecting a more natural distribution of sexist content.</p>
    </sec>
    <sec id="sec-4">
      <title>System description</title>
      <p>
        In recent years, transformer-based language models like BERT [
        <xref ref-type="bibr" rid="ref15">15</xref>
] and its
variant RoBERTa [21] have become the state of the art for most NLP tasks. In
particular, multilingual versions of these systems have shown surprising
cross-lingual capabilities, even among languages that do not share scripts [22].
      </p>
<p>For our work, we fine-tuned three different state-of-the-art multilingual
transformer models: mBERT, XLM-RoBERTa [23], and XLM-Twitter [24]. mBERT
follows the same training procedure as single-language BERT but uses a concatenated
dataset of 104 languages; XLM-RoBERTa (XLM-R) was trained on data from
100 languages; and XLM-Twitter (XLM-T) starts from XLM-R and continues
pretraining on a large corpus of tweets in 30 languages.</p>
<p>While, in most cases, multilingual models are trained and tested
independently for each language and do not combine different languages in a single
evaluation, our approach allows us to tackle the task for both languages at the
same time with a single shared model.</p>
      <sec id="sec-4-1">
        <title>Single-task model</title>
<p>As the tasks are evaluated independently, we explored transformer models
for each task independently; we refer to these as single-task models.
Figure 1 shows the model architecture for this approach. On top of the
transformer model, we added a linear layer and minimized the cross-entropy loss for
each task: a binary problem with two labels for task 1, and a multi-class
classification with six labels (5 types of sexism plus non-sexist) for task 2.</p>
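<p>A minimal sketch of this setup in PyTorch; the toy mean-pooling encoder below is only a self-contained stand-in for the actual pre-trained multilingual transformer, which would be loaded from HuggingFace:</p>
<p>
```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for mBERT/XLM-R/XLM-T: embeds token ids and mean-pools them."""
    def __init__(self, vocab_size=100, hidden_size=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids):
        return self.emb(input_ids).mean(dim=1)  # (batch, hidden)

class SingleTaskClassifier(nn.Module):
    """Encoder plus one linear head: 2 labels for task 1, 6 for task 2."""
    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder          # in practice, the pre-trained transformer
        self.head = nn.Linear(hidden_size, num_labels)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, input_ids, labels=None):
        logits = self.head(self.encoder(input_ids))
        loss = self.loss_fn(logits, labels) if labels is not None else None
        return loss, logits
```
</p>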
      </sec>
      <sec id="sec-4-2">
        <title>Multi-task learning with learnable parameter</title>
<p>To exploit the fact that both tasks share the same data distribution and are
semantically connected, we propose to learn a model jointly on both of them.
Figure 1 shows the model architecture for this approach. Specifically, we use
hard parameter sharing [17] between the tasks: a shared base model, followed by
one linear layer per classification task. As base model, we employed all the
transformer models previously described in this section (section 4).</p>
<p>We experimented with the inclusion of a learnable parameter λ to control
the importance placed on each task in the multi-task learning framework. In
particular, we compute the loss with the following expression:</p>
<p>L = λ L<sub>TASK1</sub> + (1 − λ) L<sub>TASK2</sub></p>
<p>where L<sub>TASK1</sub> and L<sub>TASK2</sub> are the cross-entropy losses for each task. Since in
our problem both tasks are equally important, we set an initial value of λ = 0.5.</p>
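<p>The shared-encoder architecture and the weighted loss can be sketched as follows; the toy encoder again stands in for the pre-trained transformer, and the attribute alpha plays the role of λ above:</p>
<p>
```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for the shared pre-trained transformer encoder."""
    def __init__(self, vocab_size=100, hidden_size=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids):
        return self.emb(input_ids).mean(dim=1)  # (batch, hidden)

class MultiTaskClassifier(nn.Module):
    """Hard parameter sharing: one shared encoder, one linear head per task,
    and a learnable weight mixing the two cross-entropy losses as
    loss = alpha * loss_task1 + (1 - alpha) * loss_task2, initialised at 0.5."""
    def __init__(self, encoder, hidden_size):
        super().__init__()
        self.encoder = encoder
        self.head_task1 = nn.Linear(hidden_size, 2)   # sexist / non-sexist
        self.head_task2 = nn.Linear(hidden_size, 6)   # 5 sexism types + non-sexist
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learnable task weight

    def forward(self, input_ids, labels1=None, labels2=None):
        pooled = self.encoder(input_ids)
        logits1 = self.head_task1(pooled)
        logits2 = self.head_task2(pooled)
        loss = None
        if labels1 is not None and labels2 is not None:
            ce = nn.functional.cross_entropy
            loss = (self.alpha * ce(logits1, labels1)
                    + (1 - self.alpha) * ce(logits2, labels2))
        return loss, logits1, logits2
```
</p>
<p>Because alpha is an nn.Parameter, it receives gradients and is updated by the optimizer together with the rest of the model.</p>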
      </sec>
      <sec id="sec-4-3">
        <title>Data augmentation with MeTwo dataset</title>
<p>Data augmentation is a popular way to improve a system's generalization
by generating slight variants of the given dataset, and it is extremely useful for small
datasets [25].</p>
        <p>
For our approach, we ran experiments concatenating the MeTwo dataset
[
          <xref ref-type="bibr" rid="ref7">7</xref>
] with the EXIST dataset. In particular, we removed all tweets of the
"DOUBTFUL" class in MeTwo and used the "SEXIST" and "NON-SEXIST" labels to
perform this experiment for task 1. In the multi-task experiments, tweets from
MeTwo did not contribute to the task 2 loss. Finally, since MeTwo is considerably
unbalanced towards the "NON-SEXIST" label, we balanced both classes.
        </p>
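<p>A minimal sketch of this augmentation step, with illustrative field and function names (the actual preprocessing pipeline may organise the data differently):</p>
<p>
```python
import random

def augment_with_metwo(exist_rows, metwo_rows, seed=0):
    """Drop MeTwo 'DOUBTFUL' tweets, downsample the majority 'NON-SEXIST'
    class so both labels are balanced, and append the result to the EXIST
    training rows. MeTwo rows carry no task-2 label (None), so they would
    contribute nothing to the task-2 loss."""
    rng = random.Random(seed)
    sexist = [r for r in metwo_rows if r["label"] == "SEXIST"]
    non_sexist = [r for r in metwo_rows if r["label"] == "NON-SEXIST"]
    # balance the two classes by sampling the larger one down to the smaller
    non_sexist = rng.sample(non_sexist, k=min(len(sexist), len(non_sexist)))
    extra = [{"text": r["text"], "task1": r["label"].lower(), "task2": None}
             for r in sexist + non_sexist]
    return exist_rows + extra
```
</p>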
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results and analysis</title>
      <sec id="sec-5-1">
        <title>Experimental setup and preprocessing</title>
<p>All the experiments were performed using PyTorch [26] and the HuggingFace
Transformers library [27]. As implementation environment, we used an NVIDIA
Tesla T4 GPU. Optimization was done with the Adam optimizer [28], with an initial
learning rate of 2e-5 and a linear weight decay of 0.01, for training both single-task and
multi-task models. We trained all models with a batch size of 16 for 20 epochs
with an early-stopping patience of 8 epochs. We make our code publicly available on
GitHub [29].</p>
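<p>The optimizer configuration and early-stopping rule just described can be sketched as follows (a sketch under the stated hyperparameters; the AdamW variant and the helper names are our illustrative choices, and a linear learning-rate schedule would be layered on top in practice):</p>
<p>
```python
import torch

def make_optimizer(model):
    """Adam-style optimizer with the hyperparameters from the text:
    initial learning rate 2e-5 and weight decay 0.01."""
    return torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

class EarlyStopping:
    """Stop after `patience` epochs (8 in our runs) without validation improvement."""
    def __init__(self, patience=8):
        self.patience = patience
        self.best = None
        self.bad_epochs = 0

    def step(self, val_metric):
        if self.best is None or val_metric > self.best:
            self.best, self.bad_epochs = val_metric, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience   # True means: stop training
```
</p>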
<p>The only preprocessing steps before feeding the input to the transformer
tokenizers were converting to lowercase, replacing mentions, hashtags, and URLs
with a keyword, and removing punctuation signs.</p>
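<p>A minimal sketch of this preprocessing, where the placeholder keywords themselves are illustrative choices:</p>
<p>
```python
import re
import string

def preprocess(text):
    """Lowercase, replace URLs/mentions/hashtags with placeholder keywords,
    then strip remaining punctuation and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " url ", text)   # URLs first
    text = re.sub(r"@\w+", " user ", text)                   # mentions
    text = re.sub(r"#\w+", " hashtag ", text)                # hashtags
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())
```
</p>
<p>Replacing URLs before stripping punctuation matters: otherwise the URL's slashes and dots would be removed and the pattern would no longer match.</p>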
<p>To evaluate our systems, we trained all models on 70% of the training data
and held out the remaining 30% for validation.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Results</title>
<p>Here, we report the performance of the approaches described in the previous
section. Table 1 summarizes the results obtained for the different experiments on the
validation set, reported in terms of accuracy and macro-F1. We observe
that, among the individual transformer models, the best performance is obtained
with XLM-T. This may be due to the fact that it is pre-trained on data from
Twitter, the same data source as our task.</p>
<p>In the case of the multi-task approaches, the classifiers perform well, showing small
differences with respect to the single-task systems for task 1 and outperforming
them for task 2. Regarding data augmentation with the MeTwo corpus, we
observe that it generally improves results for task 2, which could suggest that
adding instances from task 1 improves results on task 2. It should also be noted
that the inclusion of a learnable parameter to control the importance placed
on each task slightly improves results for task 2.</p>
<p>We submitted three classifiers to the challenge using different approaches
so that we could compare their performance on the test set. In particular, we sent
one single-task classifier and two multi-task systems, the latter using data augmentation and
a parameter to control the importance of the tasks.</p>
<p>Table 2 shows the results obtained in the competition,
reported in terms of accuracy for task 1 and macro-F1 for task 2.
Regarding task 1, our single-task multilingual classifier performs quite well, achieving
performance comparable to the winning teams. Similarly, our multi-task model
performs fairly well, with a difference of around 2% with respect to the best
result.</p>
<p>For task 2, most participants achieved relatively low results, showing the
difficulty of this task. The multi-task approach yielded our best results and
stays in the top cluster of the competition (11th out of 63 runs). Unlike in our
experiments, data augmentation did not perform well on the test set for
task 2. This could be due to the inclusion of Gab in the test set, which is biased
towards aggressive sexism.</p>
<p>Once the evaluation phase was over, the organizers shared the labels for the test
set in case participants wanted to perform further tests. We added two extra
experiments to Table 2 using two models we did not submit to the competition.
As can be seen, we would have obtained slightly better results for task 2. As
observed in our experiments, multi-task approaches yield better results for task
2 than single-task models.</p>
<table-wrap id="tab2">
  <label>Table 2</label>
  <caption>
    <p>Competition results: accuracy and run rank for task 1, macro-F1 and run rank for task 2.</p>
  </caption>
  <table>
    <thead>
      <tr>
        <th>System</th>
        <th>Task 1 Accuracy</th>
        <th>Task 1 Run Rank</th>
        <th>Task 2 Macro-F1</th>
        <th>Task 2 Run Rank</th>
      </tr>
    </thead>
    <tbody>
      <tr><td>Rank-1</td><td>0.784</td><td>1</td><td>0.5787</td><td>1</td></tr>
      <tr><td>Majority Class (baseline)</td><td>0.6845</td><td>66</td><td>0.4778</td><td>62</td></tr>
      <tr><td>SVM TFIDF (baseline)</td><td>0.522</td><td>52</td><td>0.522</td><td>51</td></tr>
      <tr><td>XLM-T-single-task (run 1)</td><td>0.772</td><td>7</td><td>0.544</td><td>15</td></tr>
      <tr><td>XLM-T-multi-task-and-metwo-balanced (run 2)</td><td>0.7324</td><td>29</td><td>0.5246</td><td>22</td></tr>
      <tr><td>XLM-R-multi-task-learnable-parameter (run 3)</td><td>0.7571</td><td>17</td><td>0.5509</td><td>11</td></tr>
      <tr><td>XLM-T-multi-task</td><td>0.764</td><td>-</td><td>0.554</td><td>-</td></tr>
      <tr><td>XLM-T-multi-task-learnable-parameter-concat-metwo</td><td>0.747</td><td>-</td><td>0.553</td><td>-</td></tr>
    </tbody>
  </table>
</table-wrap>
      </sec>
      <sec id="sec-5-3">
        <title>Error analysis</title>
<p>Although we achieved interesting results, all models still make some
mistakes. To better understand the source of the failures, we performed a deeper
analysis of the model errors. In particular, we further investigated the results of the
single-task XLM-T model for each task.</p>
<p>Figure 2 displays the confusion matrices for tasks 1 and 2. Regarding task 1,
the non-sexist class performs worse than the sexist one. For task 2, most errors
come from the misogyny-non-sexual-violence and stereotyping-dominance classes. We
attribute this to the heterogeneity of these classes, since many types of sexist
attitudes can fall under them. For instance, the sentences "Some woman are so
toxic they don't even know they are draining everyone around them in poison.
If you lack self awareness you won't even notice how toxic you really are" and
"They refuse to arrest the separatist who has broken the nose of a woman for
removing ties" are both instances of the misogyny-non-sexual-violence class, but
for different reasons. On the contrary, ideological-inequality is a more
homogeneous group and its performance is better.</p>
<p>To analyze the reasons behind the errors of our model, we used the library
transformers-interpret [30] to obtain more information about the importance of
each token towards the predicted class. Figure 3 shows the word importance
for some error examples. In this figure, red means that the token pushes
towards the "incorrect" (and predicted) class, whereas green pulls towards the
correct class. As we can see, it turns out that numerous spurious correlations
were learned by our classifier: words such as "puta" or "all" trigger the sentence
as sexist. Similarly, the presence of words such as "ill" or "vegan" pushes the
prediction of the 4th sentence towards non-sexist. The last two examples are
related to errors for task 2. Both are cases where the classifier fails to detect the
type of sexism because of the appearance of irrelevant terms like "straight" and
"want".</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
<p>In this paper, we have described a classification model for sexist language
detection in a multilingual scenario. We also compared single-task to multi-task
approaches and experimented with data augmentation techniques using a
corpus from the same domain. The results obtained in the framework of the EXIST
2021 competition are promising, since our single-model approach achieved
performance close to the top-performing systems despite sharing parameters across both
languages and tasks. Furthermore, the results show that our model fitted spurious
correlations for certain terms, which must be carefully analyzed with more experiments.</p>
<p>As future work, we plan to experiment with the inclusion of affective lexicons
to improve the automatic detection of sexism. It is also important to note that
the strategy used to construct the dataset is keyword-based, which can introduce
natural biases towards certain sexist terms. Thus, bias mitigation techniques
could be useful to improve performance.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by the Spanish Ministry of Science and Innovation
under Project Misinformation and Miscommunication in Social Media
(PGC2018096212-B-C32).
Speech and Language Technologies for Dravidian Languages (EACL 2021), pp.
270{276. (2021)
17. Crawshaw, M.: Multi-Task Learning with Deep Neural Networks: A Survey, arXiv
(2020)
18. Rajamanickam, S., Mishra, P., Yannakoudakis, H., Shutova, E.: Joint Modelling
of Emotion and Abusive Language Detection. In: Proceedings of the 58th Annual
Meeting of the Association for Computational Linguistics (ACL 2020), pp. 4270|
4279. (2020)
19. Kamal O., Kumar, A., Vaidhya, T.: Hostility Detection in Hindi Leveraging
Pretrained Language Models. In: First International Workshop, Combating Online
Hostile Posts in Regional Languages during Emergency Situation 2021, Collocated with
AAAI 2021, (CONSTRAINT 2021), pp. 213{223. (2021)
20. Rodr guez-Sanchez, F., Carrillo-de-Albornoz, J., Plaza, L., Gonzalo, J., Rosso, P.,
Comet, M., Donoso, T.: Overview of EXIST 2021: sEXism Identi cation in Social
neTworks. Procesamiento del Lenguaje Natural 67 (2021).
21. Liu, Y. et al.: RoBERTa: A Robustly Optimized BERT Pretraining Approach,
arXiv (2019)
22. Pires, T., Schlinger, E., Garrette, D.: How Multilingual is Multilingual BERT?.</p>
<p>In: Proceedings of the 57th Annual Meeting of the Association for Computational
Linguistics (ACL 2019), pp. 4996-5001. (2019)
23. Conneau, A. et al.: Unsupervised Cross-lingual Representation Learning at Scale.</p>
<p>In: Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics (ACL 2020), pp. 8440-8451. (2020)
24. Barbieri, F., Espinosa Anke, L., Camacho-Collados, J.: XLM-T: A Multilingual Language</p>
<p>Model Toolkit for Twitter, arXiv (2021)
25. Feng, S.Y. et al.: A Survey of Data Augmentation Approaches for NLP, arXiv (2021)
26. Paszke, A. et al.: PyTorch: An imperative style, high-performance deep learning library. In: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019) (2019)
27. Wolf, T. et al.: Transformers: State-of-the-Art Natural Language Processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP 2020), pp. 38-45. (2020)
28. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations (ICLR 2015) (2015)
29. Code, https://github.com/franciscorodriguez92/exist2021. Last accessed 25 May 2021
30. Transformers Interpret library, https://github.com/cdpierse/transformers-interpret. Last accessed 25 May 2021</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>1. Montes, M., Rosso, P., Gonzalo, J., Aragon, E., Agerri, R., Alvarez-Carmona, M.A., Alvarez Mellado, E., Carrillo-de Albornoz, J., Chiruzzo, L., Freitas, L., Gomez Adorno, H., Gutierrez, Y., Jimenez Zafra, S.M., Lima, S., Plaza-de-Arco, F.M., Taule, M.: <article-title>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021)</article-title>. In: CEUR workshop (<year>2021</year>)</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>2. DataReportal report, https://datareportal.com/reports/digital-2020-october-globalstatshot. Last accessed 25 May 2021</mixed-citation>
      </ref>
      <ref id="ref3">
<mixed-citation>3. Swim, J., Hyers, L., Cohen, L., Ferguson, J.: <article-title>Everyday Sexism: Evidence for Its Incidence, Nature, and Psychological Impact From Three Daily Diary Studies</article-title>. <source>Journal of Social Issues</source> <volume>57</volume>(<issue>1</issue>), <fpage>31</fpage>-<lpage>53</lpage> (<year>2001</year>)</mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>4. Berg, H.: <article-title>Everyday Sexism and Posttraumatic Stress Disorder in Women</article-title>. <source>Violence Against Women</source> <volume>12</volume>(<issue>10</issue>), <fpage>970</fpage>-<lpage>988</lpage> (<year>2006</year>)</mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>5. Amnesty International report, https://www.amnesty.org/en/latest/research/2018/03/onlineviolence-against-women-chapter-1/. Last accessed 25 May 2021</mixed-citation>
      </ref>
      <ref id="ref6">
<mixed-citation>6. Reuters article, https://www.reuters.com/article/us-facebook-women-politicsidUSKCN2522KK. Last accessed 25 May 2021</mixed-citation>
      </ref>
      <ref id="ref7">
<mixed-citation>7. Rodríguez-Sanchez, F., Carrillo-de-Albornoz, J., Plaza, L.: <article-title>Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data</article-title>. <source>IEEE Access</source> <volume>8</volume>, <fpage>219563</fpage>-<lpage>219576</lpage> (<year>2020</year>)</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fersini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anzovino</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the Task on Automatic Misogyny Identification at IberEval 2018</article-title>
          .
          <source>In: Proceedings of the 3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), co-located with SEPLN 2018</source>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Canos</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          :
          <article-title>Misogyny identification through SVM at IberEval 2018</article-title>
          .
          <source>In: Proceedings of Human Language Technologies for Iberian Languages (IberEval 2018)</source>
          , pp.
          <fpage>229</fpage>
          -
          <lpage>233</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Nina-Alcocer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>AMI at IberEval2018 automatic misogyny identification in Spanish and English tweets</article-title>
          .
          <source>In: Proceedings of Human Language Technologies for Iberian Languages (IberEval 2018)</source>
          , pp.
          <fpage>274</fpage>
          -
          <lpage>279</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Frenda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Exploration of misogyny in Spanish and English tweets</article-title>
          .
          <source>In: Proceedings of Human Language Technologies for Iberian Languages (IberEval 2018)</source>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>267</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pamungkas</surname>
            ,
            <given-names>E. W.</given-names>
          </string-name>
          :
          <article-title>Exploiting lexical knowledge for detecting misogyny in English and Spanish tweets</article-title>
          .
          <source>In: Proceedings of Human Language Technologies for Iberian Languages (IberEval 2018)</source>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Goenaga</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atutxa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gojenola</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casillas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ilarraza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ezeiza</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oronoz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez-de-Viñaspre</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Automatic misogyny identification using neural networks</article-title>
          .
          <source>In: Proceedings of Human Language Technologies for Iberian Languages (IberEval 2018)</source>
          , pp.
          <fpage>249</fpage>
          -
          <lpage>254</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pamungkas</surname>
            ,
            <given-names>E. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Misogyny Detection in Twitter: a Multilingual and Cross-Domain Study</article-title>
          .
          <volume>57</volume>
          (
          <issue>6</issue>
          ) (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2019)</source>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Saha</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paharia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakraborty</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>HateAlert@DravidianLangTech-EACL2021: Ensembling strategies for Transformer-based Offensive language Detection</article-title>
          .
          <source>In: Proceedings of the First Workshop on</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>