<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QMUL-SDS at EXIST: Leveraging Pre-trained Semantics and Lexical Features for Multilingual Sexism Detection in Social Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aiqi Jiang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arkaitz Zubiaga</string-name>
          <email>a.zubiaga@qmul.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Queen Mary University of London</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Online sexism is an increasing concern for those who experience gender-based abuse on social media platforms, as it has affected the healthy development of the Internet with negative impacts on society. The EXIST shared task proposes the first task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 [30]. It provides a benchmark sexism dataset with Twitter and Gab posts in both English and Spanish, along with a task articulated in two subtasks consisting of sexism detection at different levels of granularity: Subtask 1, Sexism Identification, is a classical binary classification task to determine whether a given text is sexist or not, while Subtask 2, Sexism Categorisation, is a finer-grained classification task focused on distinguishing different types of sexism. In this paper, we describe the participation of the QMUL-SDS team in EXIST. We propose an architecture made of the last 4 hidden states of XLM-RoBERTa and a TextCNN with 3 kernels. Our model also exploits lexical features relying on the use of new and existing lexicons of abusive words, with a special focus on sexist slurs and abusive words targeting women. Our team ranked 11th in Subtask 1 and 4th in Subtask 2 among all the teams on the leaderboard, clearly outperforming the baselines offered by EXIST.</p>
      </abstract>
      <kwd-group>
        <kwd>Sexism Identification</kwd>
        <kwd>Hate Speech Detection</kwd>
        <kwd>Abusive Language Detection</kwd>
        <kwd>Multilingual Text Classification</kwd>
        <kwd>Social Network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Along with an unprecedented ability for communication and information sharing, social media platforms provide an anonymous environment which allows users to take aggressive attitudes towards specific groups or individuals by posting abusive language. This leads to increased occurrences of incidents, hostile behaviours and remarks of harassment [
        <xref ref-type="bibr" rid="ref10 ref11 ref32 ref4">32,10,11,4</xref>
        ]. Abusive language is one of the most important conceptual categories in anti-oppression politics today [
        <xref ref-type="bibr" rid="ref14 ref32">14,32</xref>
        ].
      </p>
      <p>
        Gender-based speech is a common type of abusive language online which disparages an individual or group on the basis of their gender, and is currently considered a deteriorating factor in social networks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        In recent years, due to the increasing amount of user-generated content and the diversity of user behaviour towards women in social media, manual inspection and moderation of gender-related content has become unmanageable. The academic community has seen a rapid increase in research tackling the automatic detection of hateful behaviour towards women in both monolingual and multilingual scenarios, spreading across various social media platforms (such as Facebook and Twitter) [
        <xref ref-type="bibr" rid="ref22">22,37</xref>
        ]. The first attempt was made by Hewitt et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], who explored the manual classification of English misogynous tweets, and the first survey of automatic misogyny identification in social media was conducted by Anzovino et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Chowdhury et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] aggregate experiences of sexual abuse to facilitate a better understanding of social media construction, and Nozza et al. [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] attempt to measure and mitigate unintended bias in machine learning models for misogyny detection. An extensive study of misogyny detection has since been conducted, especially in multilingual and cross-domain scenarios [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. Since 2018, from the perspective of machine learning and computational linguistics, many international evaluation campaigns have been organised to identify online cases of multilingual abusive language against women, such as AMI@Evalita 2018 in English and Italian [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], AMI@IberEval 2018 in English and Spanish [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], HatEval@SemEval 2019 in English and Spanish [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], AMI@Evalita 2020 in Italian [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and ArMI@HASOC 2021 [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] in Arabic.
      </p>
      <p>
        However, most previous studies have concentrated on detecting misogynous behaviour online [
        <xref ref-type="bibr" rid="ref1 ref13 ref34">41,1,13,34</xref>
        ], while misogynous behaviour is not always equivalent to sexism. Misogyny frequently implies a hostile attitude with obvious hatred against women [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. As for sexism, Glick and Fiske [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] define two forms of sexism: hostile sexism and benevolent sexism. Hostile sexism is characterised by an explicitly negative attitude towards women, while benevolent sexism is more subtle, with seemingly positive characteristics. Sexism includes a wide range of behaviours (such as stereotyping, ideological issues, sexual violence, etc.) [
        <xref ref-type="bibr" rid="ref1 ref29">29,1</xref>
        ], and may be expressed in different ways: direct, indirect, descriptive or reported [
        <xref ref-type="bibr" rid="ref19 ref5">19,5</xref>
        ]. Thus, misogyny is only one case of sexism [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. Hence, given subtle or implicit expressions of sexism, dealing with the detection of sexism across a wide spectrum of sexist attitudes and behaviours is necessary, as these are, in fact, the most frequent and dangerous for society [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]. The purpose of the EXIST@IberLEF 2021 shared task [
        <xref ref-type="bibr" rid="ref30">38,30</xref>
        ] is to consider sexist behaviour in a broad sense, from explicit misogyny to other subtle behaviours involving implicit sexism. The EXIST dataset contains various types of sexist expressions and related phenomena, including descriptive or reported assertions, where a sexist post is a report or description of sexist behaviour.
      </p>
      <p>
        More recently, general pre-trained language models (PLMs) have shown their capacity to improve the performance of NLP systems for most tasks on canonical data. Among recent work on multilingual PLMs, multilingual BERT (mBERT) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the cross-lingual language model (XLM) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] have stood out, thanks to the effectiveness of pre-training large transformers on multiple languages at once in the field of cross-lingual understanding [39]. However, due to the limited availability of training corpora, the XLM-RoBERTa model (XLM-R) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] has become the new state-of-the-art (SOTA) multilingual PLM by extending the amount of training data and enlarging the length of sentences. These SOTA PLMs are usually fine-tuned for downstream classification tasks, such as multilingual sexism detection [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], whereas few of them consider injecting external knowledge into the model in a multilingual scenario, such as linguistic information from a domain-specific lexicon.
      </p>
      <p>
        Inspired by the work in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], in this paper we propose a novel approach (XRCNN-Ex) combining XLM-R [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] with a TextCNN [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and infusing external lexical knowledge from HurtLex [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to handle the two subtasks of EXIST. Given the scarcity of semantic information in the commonly-used pooler output of XLM-R, XRCNN-Ex aggregates the last 4 hidden states of XLM-R to obtain representations with ampler semantic features. We then construct a TextCNN with 3 different kernels to capture various local features from XLM-R, which decreases the memory cost with a smaller number of parameters and achieves a faster training speed with lower computation compared to RNN-based models. Additionally, external knowledge from the domain-specific lexicon HurtLex is fed into the structure of XRCNN in order to investigate the effectiveness of lexical information on the performance. In our experimental and official results, the basic architecture XRCNN in our proposed model presents a notable achievement, while the performance of XRCNN-Ex is comparatively unstable and inferior in the final submission. We discuss this case in Section 5. When it comes to the team ranking, we ranked 11th in Subtask 1 (sexism identification) and 4th in Subtask 2 (sexism categorisation). In the submission ranking, we ranked 14th (accuracy score of 0.761) and 5th (macro-F1 score of 0.559) respectively.
      </p>
    </sec>
    <sec id="sec-2">
      <title>EXIST: Task and Data Description</title>
      <sec id="sec-2-1">
        <title>Task Description</title>
        <p>The organisers of EXIST proposed a shared task on automatic detection of multilingual sexist content on Twitter and Gab, including content in English (EN) and Spanish (ES). Two different subtasks were proposed:
- Subtask 1 - Sexism Identification: A binary classification task, where every system has to determine whether a given text (tweet or gab) is sexist or not sexist, where sexist content is defined as that which "is sexist itself, describes a sexist situation or criticises a sexist behaviour".
- Subtask 2 - Sexism Categorisation: Aiming to classify the sexist texts according to five categories of sexist behaviour: "ideological and inequality", "stereotype and dominance", "objectification", "sexual violence" and "misogyny and non-sexual violence".</p>
        <p>Predictions should be made on a mixed test set including content in both languages. Subtask 1 is evaluated in terms of accuracy, while Subtask 2 is evaluated using the macro-F1 score. Each participating team could submit a maximum of 3 runs.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Data Description</title>
        <p>The EXIST dataset, provided by the organisers, consists of 6,977 tweets for training and 3,386 tweets for testing, both of which include content in English and Spanish and are manually labelled by crowdsourced annotators. In addition, the test set also includes 982 "gabs" from the uncensored social network Gab.com, in order to measure the difference between social networks with and without "content control", Twitter and Gab.com respectively. Table 1 shows more details of the datasets provided.
In this section, we introduce our proposed model XRCNN-Ex and the experimental settings. Figure 1 shows the overall framework of the system we submitted to handle the two EXIST subtasks, which uses the pre-trained multilingual model XLM-R with a text-based Convolutional Neural Network (TextCNN) and lexical features. We first obtain multilingual semantic information from the hidden states (the last 4 hidden layers) of XLM-R, and then concatenate them together as the input to the TextCNN for further feature extraction. External domain knowledge from the lexicon is incorporated into the basic structure of XRCNN and merged with the output of the TextCNN. Finally, we pass the merged output features through a dense layer and utilise a softmax function for the final classification.
[Figure 1: Overall architecture of XRCNN-Ex. The input is processed by the XLM-R tokeniser and XLM-RoBERTa-base; the last 4 hidden layers (128 tokens, 768x4 features) feed a TextCNN with Conv1D kernels of sizes 3, 4 and 5 (128 filters each) followed by max-pooling; the merged output (1,128x3) is concatenated with 17-dimensional HurtLex lexical features (1,17) and passed through a dense layer and softmax.]</p>
        <p>
          Previous work with multilingual masked language models (MLM) has proved the effectiveness of pre-training large transformer models on multi-language corpora at once in the domain of cross-lingual understanding [39], such as multilingual BERT (mBERT) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and the cross-lingual language model (XLM) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. These models have substantiated their superiority over supervised learning models in many NLP tasks, especially in cases with limited training data. However, both mBERT and XLM are pre-trained on Wikipedia, leading to a relatively limited scale, specifically for languages with poor resources. The XLM-RoBERTa model (XLM-R) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] has extended the way of pre-training MLMs by scaling the amount of data by two orders of magnitude (from Wikipedia to Common Crawl) and training on longer sequences (similar to RoBERTa [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]). It has been trained on more than 100 languages, leading to significant improvements in the performance of cross-lingual transfer tasks. In this work, we utilise XLM-R to address the multilingual EXIST dataset and extract semantic features of the whole text, deepening the understanding of the sentence and reducing the impact of noise.
        </p>
        <p>
          The first token of the sequence in the last hidden layer of XLM-R is commonly used as the output for classification tasks, although this output is usually not able to summarise abundant semantic information of the input sentence. Recent work by [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] indicates that richer semantic features can be learned by several hidden layers on top of BERT. In our system, we assume that some top hidden layers of XLM-R are also able to capture semantic information, due to the similar architectures of XLM-R and BERT. Thus, we propose the model XRCNN-Ex as shown in Figure 1 for this task. Firstly, the input is processed by the XLM-R tokeniser and fed into the XLM-R model to get a list of hidden states. We then gain deeper semantic features by integrating the last 4 hidden layers of XLM-R and feed them into the TextCNN. The shape of the output is n × (d × 4), where n is the length of the input sentence, and d is the dimension of each token in one hidden layer.
        </p>
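        <p>The aggregation of the last 4 hidden layers can be sketched as follows. To keep the example self-contained, the 13 hidden states that XLM-RoBERTa-base returns (the embedding layer plus 12 transformer layers, when the output-hidden-states option is enabled) are mocked with random arrays instead of running the actual model:</p>

```python
import numpy as np

# Mock XLM-RoBERTa-base hidden states: 13 arrays (embeddings + 12 layers),
# each of shape (batch, sequence length n, token dimension d).
batch, n, d = 1, 128, 768
rng = np.random.default_rng(0)
hidden_states = tuple(rng.standard_normal((batch, n, d)) for _ in range(13))

# Concatenate the last 4 layers along the feature axis: every token is now
# represented by a (d * 4)-dimensional vector, i.e. the n x (d x 4) input
# expected by the TextCNN.
merged = np.concatenate(hidden_states[-4:], axis=-1)
print(merged.shape)  # (1, 128, 3072)
```

        <p>With the real model, the same concatenation would be applied to the tuple of hidden states the forward pass returns; only the source of the arrays changes.</p>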
      </sec>
      <sec id="sec-2-3">
        <title>TextCNN</title>
        <p>
          A text-based Convolutional Neural Network (TextCNN) is a popular architecture for dealing with NLP tasks, with a good feature extraction capability [
          <xref ref-type="bibr" rid="ref24">24,43</xref>
          ]. The network structure of TextCNN is a variant of the simple CNN model. It is comparatively simpler than other neural networks and is able to reduce the number of dimensions of the input features, resulting in a smaller number of parameters, lower computational needs, and a faster training speed [43]. TextCNN utilises several sliding convolution filters to capture local textual features [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>In our system, we use multiple 1D convolution kernels at a time for the convolution operation over the output of the last 4 hidden states of XLM-R. The output feature set is X = [x_1, x_2, x_3, ..., x_n] ∈ ℝ^(n×(d×4)). Let the window x_{i:i+j−1} = [x_i, x_{i+1}, ..., x_{i+j−1}] refer to the concatenation of j words. A filter w ∈ ℝ^(j×(d×4)) is involved in the convolution process, applied to the window x_{i:i+j−1} of j words to generate a new feature c_i:

c_i = f(w · x_{i:i+j−1} + b)   (1)

where f is a non-linear function such as ReLU and b ∈ ℝ is the bias. After the filter w slides across [x_{1:j}, x_{2:j+1}, ..., x_{n−j+1:n}], a feature map is generated:</p>
        <p>C = [c_1, c_2, ..., c_{n−j+1}] ∈ ℝ^(n−j+1)   (2)</p>
        <p>
          We then apply the global max-pooling operation over the feature map C and take the maximum value ĉ = max{C} to capture the most important feature of each feature map [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Features extracted by multiple filters are merged and fed into a dense layer.
        </p>
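        <p>Equations (1) and (2) and the max-pooling step can be illustrated with a single filter in plain NumPy. This is a didactic sketch of the operations, not our Keras implementation; with 128 filters per kernel size and three kernel sizes, the merged vector fed to the dense layer has length 128 × 3.</p>

```python
import numpy as np

def relu(z):
    # f in equation (1): element-wise non-linearity
    return np.maximum(z, 0.0)

def textcnn_filter(X, w, b):
    """One sliding 1D convolution filter plus global max-pooling.

    X: (n, d4) token features from the last 4 XLM-R layers,
    w: (j, d4) filter of window size j, b: scalar bias.
    Returns the feature map C (eq. 2) and the pooled maximum c_hat.
    """
    n, j = X.shape[0], w.shape[0]
    # eq. (1): c_i = f(w . x_{i:i+j-1} + b) for every window of j tokens
    C = np.array([relu(np.sum(w * X[i:i + j]) + b) for i in range(n - j + 1)])
    # global max-pooling keeps the strongest response of this filter
    return C, C.max()

rng = np.random.default_rng(0)
n, d4 = 128, 768 * 4                      # sequence length, feature width
X = rng.standard_normal((n, d4))
w = rng.standard_normal((3, d4)) * 0.01   # one kernel of window size 3
C, c_hat = textcnn_filter(X, w, 0.0)
print(C.shape, c_hat)                     # feature map of length n - j + 1
```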
      </sec>
      <sec id="sec-2-4">
        <title>Lexical Feature Induction</title>
        <p>
          Currently, language models based on the transformer architecture have become popular for many NLP tasks in both monolingual and multilingual scenarios. But one of their drawbacks is that these models do not take any additional domain knowledge into consideration, like linguistic information from a domain-specific lexicon [42]. Bassignana et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] introduce HurtLex, a multilingual lexicon containing offensive, aggressive, and hateful words and phrases in over 50 languages and spanning 17 categories [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The work by Koufakou et al. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] incorporated lexical features based on the word categories derived from HurtLex to boost the performance of monolingual BERT in such hate-related tasks, whereas there is no relevant study for the multilingual sexism scenario.
        </p>
        <p>
          Given the scarcity of sexism-specific lexicons, as well as the strong relation between the phenomena of offensive language and sexist language [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], we employ HurtLex for the induction of external lexical information, to explore how external lexical features affect sexism detection performance. We extract 8,228 words for English and 5,006 for Spanish from HurtLex version 1.2, and construct multilingual lexical representations based on the HurtLex categories in both languages. There are 17 diverse categories, described with the number of terms in each language in Table 2. More specifically, we first generate a 17-dimensional lexical vector to count the frequency of each category. For instance, if a text includes 2 words in the category of derogatory words (CDS), the corresponding element of CDS in the lexical vector is supposed to be 2. We then convert the lexical vector from the count frequency to term frequency-inverse document frequency (TF-IDF) [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], indicating how significant a category is to a text in the corpus. Finally, we concatenate the TF-IDF lexical vector with the merged output of the TextCNN, and put it into the dense layer. In order to prevent the model from over-fitting, we add dropout after the dense layer, then use a softmax function to obtain the label probability as the final output of the model.
        </p>
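        <p>The count-then-TF-IDF construction can be sketched as follows. A three-category toy lexicon stands in for the 17 HurtLex categories, and the listed words are placeholders for illustration, not actual HurtLex entries:</p>

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Toy stand-in for HurtLex: category code -> set of lexicon words.
# (Hypothetical entries; the real lexicon has 17 categories per language.)
lexicon = {
    "ASF": {"slur_a"},          # placeholder tokens for slur categories
    "CDS": {"idiot", "fool"},   # derogatory words
    "PR":  {"slur_b"},
}
categories = sorted(lexicon)

def category_counts(text):
    """Count lexicon hits per category (17-dim in the paper, 3-dim here)."""
    tokens = text.lower().split()
    return [sum(t in lexicon[c] for t in tokens) for c in categories]

corpus = ["you idiot fool", "slur_a something", "nothing offensive here"]
counts = np.array([category_counts(t) for t in corpus])
# Convert raw category counts to TF-IDF, so categories that fire across
# the whole corpus are down-weighted relative to distinctive ones.
tfidf = TfidfTransformer().fit_transform(counts).toarray()
print(counts[0], tfidf.shape)  # two CDS hits in the first text
```

        <p>Each resulting row is the lexical vector that gets concatenated with the merged TextCNN output before the dense layer.</p>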
      </sec>
      <sec id="sec-2-5">
        <title>Experimental Setting</title>
        <p>Training Set Split: We use stratified sampling (StratifiedShuffleSplit) from the scikit-learn Python package for the cross-validation step, instead of ordinary k-fold cross-validation, to evaluate the model. StratifiedShuffleSplit creates splits by preserving the same percentage for each target class as in the original training set. We set the number of splits to 5 and the ratio of training set to validation set to 9 to 1. For the EXIST training set, this led to a randomly sampled training set (6,279) and validation set (698). We present all performance scores in Section 4 based on the first split of training and validation sets.</p>
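        <p>A minimal sketch of this splitting procedure with scikit-learn, using 30 dummy items with a 2:1 class imbalance in place of the 6,977 EXIST training tweets:</p>

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Dummy imbalanced labels standing in for the EXIST training labels.
y = np.array([0] * 20 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)

# 5 splits with a 90/10 train/validation ratio, as in our setup.
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.1, random_state=42)
train_idx, val_idx = next(sss.split(X, y))   # first split used for reporting
print(len(train_idx), len(val_idx))          # 27 and 3
# Stratification preserves the 2:1 class ratio in both partitions.
print(np.bincount(y[train_idx]), np.bincount(y[val_idx]))
```

        <p>Unlike plain k-fold, every split here is an independent shuffled sample, so the validation sets of different splits may overlap; stratification guarantees each one mirrors the class distribution of the full training set.</p>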
        <p>Text Preprocessing: Since texts are obtained from Twitter and Gab, a preprocessing step is needed to maximise the features that can be extracted and to gain a unique and meaningful sequence of words, including removing non-alphabetic words and consecutive white spaces, and lowercasing all texts. As for special tokens in Twitter and Gab, we tokenise hashtags into separate words using the wordsegment Python package, for example: #HashtagContent becomes Hashtag Content. URLs are replaced with the meta-token &lt;URL&gt; and user names are replaced with &lt;USERNAME&gt;. The text is subsequently tokenised using the corresponding XLM-R pre-trained tokeniser for both languages.
Model Parameter Setting: The parameters of each part of XRCNN-Ex are shown below:
- XLM-R: we use the XLM-RoBERTa-base pre-trained model, consisting of 12 hidden layers. We set the output hidden states option in the XLM-R config file to True in order to obtain the different hidden states.
- TextCNN: we set the number of filters to 128 and three kernel sizes of 3, 4, and 5. ReLU is the non-linear function used for the convolution operation.
- Dense layer: we set the number of units to 768.</p>
        <p>Training Process: During our training process, we use sparse categorical cross-entropy as the loss function to save memory and computation. We use the Adam optimiser with a learning rate of 1e-5. We set the max sequence length to 128 and the dropout rate to 0.4. The model is trained for 7 epochs with a batch size of 32. All implementations are under the environment of Keras 2.5.0 and TensorFlow 2.5.0 with Python 3.7. The evaluation metrics are the accuracy score and the macro-averaged F1 score for the two subtasks.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>In this section, we report our results on the two subtasks of the EXIST competition. We first conduct comparative experiments to delve into the optimal way of consolidating features from the hidden states of XLM-R, and then perform an ablation study of the whole architecture of XRCNN-Ex to probe the contribution of its different components. All results are evaluated on the training and validation sets from the first split of the original training data released by EXIST. The official results in the EXIST shared task are presented and discussed finally.</p>
      <sec id="sec-3-1">
        <title>Comparative Experiments for XLM-R Outputs</title>
        <p>
          The pooler output is commonly utilised as the output of pre-trained language models to address classification tasks, but it generally lacks sufficient and effective semantic information in the sentence representation [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. More semantic features can be explored from the different hidden states of the models.
        </p>
        <p>
          In our experiments, we consider both the pooler output and the hidden states as the outputs of XLM-R, and investigate the consequences of diverse aggregations of several hidden layers. These experiments are implemented on the basic model structure XRCNN, and results are displayed in Table 3. It can be observed that integrating the last 4 hidden states of XLM-R yields better performance than other outputs on both subtasks, showing a notable increase in comparison with the pooler output. To be more precise, the model with only the pooler output performs better than the one combining the last 2 hidden layers in Subtask 1 and the one with the last hidden layer in Subtask 2. Nevertheless, it does not outperform the models incorporating more than 2 hidden layers, which indicates the limitation of the pooler output as the output features and the benefit of the abundant semantic information in the hidden layers of XLM-R infused in our model.
Our proposed model XRCNN-Ex combines the last 4 hidden states of XLM-R and the TextCNN with 3 kernels, then inducts extra lexical information. Several ablative experiments are implemented by removing certain components of XRCNN-Ex to understand the contribution of each component. The following models are applied in this step:
- XLM-R Last 4 Hidden Layers: we aggregate the last 4 hidden states of XLM-R as the sentence representations of the input and put them into a simple linear classifier.
- FastText + TextCNN: we use the FastText embeddings trained on Common Crawl and Wikipedia in 157 languages [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] to convert the input data into word embeddings, and then feed them into a TextCNN.
- XRCNN: the basic architecture of our proposed model.
- XRCNN-Ex: our proposed model incorporating lexical embeddings.
        </p>
        <p>Results of the ablation study are reported in Table 4. We can see that XRCNN and XRCNN-Ex both achieve competitive performance, with noticeable improvements over the other two ablative models, XLM-R Last 4 Hidden Layers and FastText+TextCNN. Moreover, XRCNN-Ex achieves a slight improvement in Subtask 1 but does not outperform XRCNN in Subtask 2, which casts some doubt on the impact of the extra lexical embeddings. We further discuss this in Section 5.</p>
        <p>Official Results in the EXIST Shared Task: Our results show that the inclusion of the hidden states of XLM-R and the TextCNN effectively improves the model's ability to identify sexist content, which is the most significant contribution of this work. However, results on the test set for the XRCNN model with lexical features demonstrate that the choice of lexicon words needs to be made more carefully, as they can harm performance, as is the case for XRCNN-Ex in the final scores. We foresee the need to further investigate the following variations to assess their impact on the performance:</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we describe the participation of the QMUL-SDS team in the EXIST shared task on multilingual sexism identification in English and Spanish social media. As part of our submission, we propose a novel system called XRCNN-Ex. Our submission for the binary sexism identification subtask achieves an accuracy score of 0.761 on the test set, ranking 14th among submissions and 11th among teams. For the finer-grained sexism categorisation (Subtask 2), we achieve a macro-averaged F1 score of 0.559, ranking 5th and 4th respectively among submissions and teams.</p>
      <p>Our basic architecture XRCNN (and XRCNN-Ex), instead of only using the pooler output as XLM-R's output to deal with the classification task, incorporates the last 4 hidden layers of XLM-R to gain deeper and richer semantic representations, which are fed into a faster TextCNN classifier. Results on both validation and test sets indicate the effectiveness of using multiple hidden states with enriched semantic information and the capability of the TextCNN classifier on top of XLM-R. In addition, we delve into the impact of integrating hate-related lexical embeddings into the system XRCNN-Ex. The results on the validation set show that XRCNN-Ex has a positive influence on Subtask 1, while the final results on the test set present an inferior performance on both subtasks. We aim to investigate further how to best leverage lexical information.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>Aiqi Jiang is funded by the China Scholarship Council (CSC). This research utilised Queen Mary's Apocrita HPC facility, supported by the QMUL IT service.
37. Rodríguez-Sanchez, F., Carrillo-de Albornoz, J., Plaza, L.: Automatic classification of sexism in social networks: An empirical study on Twitter data. IEEE Access 8, 219563-219576 (2020)
38. Rodriguez-Sanchez, F., de Albornoz, J.C., Plaza, L., Gonzalo, J., Rosso, P., Comet, M., Donoso, T.: Overview of EXIST 2021: sexism identification in social networks.</p>
      <p>Procesamiento del Lenguaje Natural 67(0) (2021)
39. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
40. Vidgen, B., Harris, A., Nguyen, D., Tromble, R., Hale, S., Margetts, H.: Challenges and frontiers in abusive content detection. In: Proceedings of the Third Workshop on Abusive Language Online, pp. 80-93. Association for Computational Linguistics, Florence, Italy (2019)
41. Waseem, Z., Hovy, D.: Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In: Proceedings of the NAACL Student Research Workshop, pp. 88-93 (2016)
42. Wiegand, M., Ruppenhofer, J., Schmidt, A., Greenberg, C.: Inducing a lexicon of abusive words - a feature-based approach (2018)
43. Zhang, T., You, F.: Research on short text classification based on TextCNN. In: Journal of Physics: Conference Series, vol. 1757, p. 012092. IOP Publishing (2021)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Anzovino</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fersini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Automatic identification and classification of misogynistic language on twitter</article-title>
          .
          <source>In: International Conference on Applications of Natural Language to Information Systems</source>
          . pp.
          <volume>57</volume>
          {
          <fpage>64</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
2. Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Pardo, F.M.R., Rosso, P., Sanguinetti, M.: SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In: Proceedings of the 13th International Workshop on Semantic Evaluation. pp. 54–63 (2019)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
3. Bassignana, E., Basile, V., Patti, V.: HurtLex: A multilingual lexicon of words to hurt. In: 5th Italian Conference on Computational Linguistics, CLiC-it 2018. vol. 2253, pp. 1–6. CEUR-WS (2018)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
4. Chiril, P., Moriceau, V., Benamara, F., Mari, A., Origgi, G., Coulomb-Gully, M.: An annotated corpus for sexism detection in French tweets. In: Proceedings of the 12th Language Resources and Evaluation Conference. pp. 1397–1403. European Language Resources Association, Marseille, France (2020)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
5. Chiril, P., Moriceau, V., Benamara, F., Mari, A., Origgi, G., Coulomb-Gully, M.: He said "who's gonna take care of your children when you are at ACL?": Reported sexist acts are not sexist. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 4055–4066. Online (2020)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
6. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. Journal of Machine Learning Research 12(76), 2493–2537 (2011)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
7. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzman, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 8440–8451. Association for Computational Linguistics, Online (2020)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
8. Conneau, A., Lample, G.: Cross-lingual language model pretraining. In: Advances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc. (2019)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
9. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
10. Fersini, E., Nozza, D., Rosso, P.: Overview of the EVALITA 2018 task on automatic misogyny identification (AMI). In: EVALITA@CLiC-it (2018)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
11. Fersini, E., Nozza, D., Rosso, P.: AMI@EVALITA2020: Automatic misogyny identification. In: Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online. CEUR.org (2020)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
12. Fersini, E., Rosso, P., Anzovino, M.: Overview of the task on automatic misogyny identification at IberEval 2018. In: IberEval@SEPLN. pp. 214–228 (2018)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
13. Frenda, S., Ghanem, B., Montes-y Gomez, M., Rosso, P.: Online hate speech against women: Automatic identification of misogyny and sexism on Twitter. Journal of Intelligent &amp; Fuzzy Systems 36(5), 4743–4752 (2019)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
14. Gagliardone, I., Gal, D., Alves, T., Martinez, G.: Countering online hate speech. UNESCO Publishing (2015)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
15. Ghosh Chowdhury, A., Sawhney, R., Shah, R.R., Mahata, D.: #YouToo? Detection of personal recollections of sexual harassment on social media. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 2527–2537. Association for Computational Linguistics, Florence, Italy (2019). https://doi.org/10.18653/v1/P19-1241, https://www.aclweb.org/anthology/P19-1241
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
16. Glick, P., Fiske, S.T.: Ambivalent sexism. In: Advances in Experimental Social Psychology, vol. 33, pp. 115–188. Elsevier (2001)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
17. Goldberg, Y., Levy, O.: word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722 (2014)
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
18. Grave, E., Bojanowski, P., Gupta, P., Joulin, A., Mikolov, T.: Learning word vectors for 157 languages. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018) (2018)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
19. Hellinger, M., Pauwels, A.: Language and sexism. In: Handbook of Language and Communication: Diversity and Change, ch. 21, pp. 651–684. De Gruyter Mouton (2008)
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
20. Hewitt, S., Tiropanis, T., Bokhove, C.: The problem of identifying misogynist language on Twitter (and other online social spaces). In: Proceedings of the 8th ACM Conference on Web Science. pp. 333–335 (2016)
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
21. Jawahar, G., Sagot, B., Seddah, D.: What does BERT learn about the structure of language? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 3651–3657. Association for Computational Linguistics, Florence, Italy (2019)
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
22. Jha, A., Mamidi, R.: When does a compliment become sexist? Analysis and classification of ambivalent sexism using Twitter data. In: Proceedings of the Second Workshop on NLP and Computational Social Science. pp. 7–16 (2017)
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
23. Jing, L.P., Huang, H.K., Shi, H.B.: Improved feature selection approach TFIDF in text mining. In: Proceedings. International Conference on Machine Learning and Cybernetics. vol. 2, pp. 944–946. IEEE (2002)
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
24. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (2014)
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
25. Koufakou, A., Pamungkas, E.W., Basile, V., Patti, V.: HurtBERT: Incorporating lexical features with BERT for the detection of abusive language. In: Proceedings of the Fourth Workshop on Online Abuse and Harms. pp. 34–43. Association for Computational Linguistics (2020)
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
26. Li, W., Gao, S., Zhou, H., Huang, Z., Zhang, K., Li, W.: The automatic text classification method based on BERT and feature union. In: 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS). pp. 774–777. IEEE (2019)
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
27. Li, X., Song, J., Liu, W.: Label-attentive hierarchical attention network for text classification. In: Proceedings of the 2020 5th International Conference on Big Data and Computing. pp. 90–96 (2020)
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
28. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
29. Manne, K.: Down girl: The logic of misogyny. Oxford University Press (2017)
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Montes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez-Carmona</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez-Mellado</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez Zafra</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lima</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza-de Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taule</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021), CEUR Workshop Proceedings</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Mulki</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The first Arabic misogyny identification shared task as a subtrack of HASOC @ FIRE2021</article-title>
          . https://sites.google.com/view/armi2021/ (
          <year>2021</year>
          ), [Online; accessed 01-06-2021]
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Nobata</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tetreault</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehdad</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Abusive language detection in online user content</article-title>
          .
          <source>In: Proceedings of the 25th International Conference on World Wide Web</source>
          . pp.
          <fpage>145</fpage>
          &#8211;
          <lpage>153</lpage>
          . International World Wide Web Conferences Steering Committee
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Nozza</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Volpetti</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fersini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Unintended bias in misogyny detection</article-title>
          .
          <source>In: IEEE/WIC/ACM International Conference on Web Intelligence</source>
          . pp.
          <fpage>149</fpage>
          &#8211;
          <lpage>155</lpage>
          . WI '19, Association for Computing Machinery, New York, NY, USA (
          <year>2019</year>
          ). https://doi.org/10.1145/3350546.3352512
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Pamungkas</surname>
            ,
            <given-names>E.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Misogyny detection in Twitter: a multilingual and cross-domain study</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>57</volume>
          (
          <issue>6</issue>
          ),
          <fpage>102360</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Pappas</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henderson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>GILE: A generalized input-label embedding for text classification</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>7</volume>
          ,
          <fpage>139</fpage>
          &#8211;
          <lpage>155</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Richardson-Self</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Woman-hating: On misogyny, sexism, and hate speech</article-title>
          .
          <source>Hypatia</source>
          <volume>33</volume>
          (
          <issue>2</issue>
          ),
          <fpage>256</fpage>
          &#8211;
          <lpage>272</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>