<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Learning Representations in Automatic Misogyny Identification: What Do We Gain and What Do We Miss?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elisabetta Fersini</string-name>
          <email>elisabetta.fersini@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Rosato</string-name>
          <email>l.rosato1@campus.unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Candelieri</string-name>
          <email>antonio.candelieri@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Archetti</string-name>
          <email>francesco.archetti@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enza Messina</string-name>
          <email>enza.messina@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Milano-Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper, we address the problem of automatic misogyny identification, focusing on understanding the representation capabilities of widely adopted embeddings and on the problem of unintended bias. The proposed framework, grounded on Sentence Embeddings and Multi-Objective Bayesian Optimization, has been validated on an Italian dataset. We highlight capabilities and weaknesses related to the use of pre-trained language models, as well as the contribution of Bayesian Optimization in mitigating the problem of biased predictions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Nowadays, although women, girls and teenagers
have a strong presence in online social
environments, they are heavily exposed to hateful
comments. In 2021, a survey by the Pew
Research Center showed that women are targeted
by severe types of online gender-based attacks:
women are more likely than men to report
having been sexually harassed online (16% vs. 5%)
or stalked (13% vs. 9%). These phenomena fall
under the umbrella of online misogyny,
which can be generally defined as hate, violence
or prejudice against women
        <xref ref-type="bibr" rid="ref11">(Ging and Siapera,
2018)</xref>
        .
      </p>
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 State of the Art</title>
      <p>
        In order to counter online misogyny, several
computational approaches have been presented in the
literature, ranging from natural language
processing models to machine learning classifiers,
with quite promising recognition performance. The
earliest investigation of computational models
for automatic misogyny identification was
presented in Anzovino et al. (2018), where the
authors proposed the adoption of several
linguistic cues and baseline classifiers to address
three main problems, i.e., misogyny
identification, misogynistic behaviour recognition and
target classification. After this seminal paper,
several approaches have been presented in the
literature, which can be distinguished according to the
feature representations considered for the textual
contents and the machine
learning models adopted as classifiers. Most of the
approaches experimented with a high-level
representation of the word and/or sentence
        <xref ref-type="bibr" rid="ref7 ref9 ref14 ref17">(García-Díaz
et al., 2021; Pamungkas et al., 2020; Farrell et
al., 2020; Lees et al., 2020)</xref>
        , coupled with
fine-tuning, while few of them adopted shallow
models or trained deep architectures from scratch
        <xref ref-type="bibr" rid="ref3 ref5 ref6 ref13 ref16">(Fabrizi, 2020; Ou and Li, 2020; da Silva and Roman,
2020; El Abassi and Nisioi, 2020; Koufakou et
al., 2020)</xref>
        .
      </p>
      <p>
        Recently, increasing interest has
focused on the problem of unintended bias
        <xref ref-type="bibr" rid="ref4">(Dixon et
al., 2018)</xref>
        . In particular, it is important to focus on
a given error induced by the training data, i.e., the
bias injected into the model by a set of identity terms
that are frequently associated with the misogynous
class. For example, the term women, if frequently
used in misogynous messages, would lead most
supervised classification models to
overgeneralize and to disproportionately associate
this identity term with the misogynous label. To this
purpose, only a few approaches have been dedicated
to the unintended bias problem for misogyny
identification
        <xref ref-type="bibr" rid="ref10 ref14 ref15 ref19">(Nozza et al., 2019; Lees et al., 2020;
Gencoglu, 2020; Zueva et al., 2020)</xref>
        , denoting a
research panorama that is still in its infancy. Although
the above-mentioned approaches represent a
fundamental contribution to the problem of automatic
misogyny identification in online social
environments, they do not address two main research
questions:
(RQ1) Do embeddings always succeed in
representing different misogyny-related problems, such
as the type of misogyny and the target?
(RQ2) Can classification models be constrained
to be less biased through the optimization of their
hyper-parameters, thus achieving good
generalization capabilities also on uncommon
expressions?
      </p>
      <p>In this paper, we address the above-mentioned
open issues through the following main contributions:
• we perform an analysis of the capabilities and
weaknesses of the widely used
state-of-the-art sentence encoders USE and BERT when
adopted for misogyny detection;
• we investigate how to reduce the bias of the
models by optimizing their hyper-parameters
through a multi-objective Bayesian
optimization strategy.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Proposed Framework</title>
      <p>In order to address the above mentioned research
questions, related to the understanding of
weaknesses and capabilities of pre-trained language
models for misogyny identification and the
reduction of unintended bias, we introduce the
framework reported in Figure 1.
</p>
      <sec id="sec-3-1">
        <title>3.1 Sentence Embeddings</title>
        <p>
          The proposed approach uses two pre-trained
language models to generate a contextual
representation of the data. The considered models are based
on the transformer architecture initially presented
in Vaswani et al. (2017). More specifically, the
first model is the “small” version of BERT,
uncased, consisting of 12 stacked encoders, 12
parallel self-attention heads and 768 units to represent text.
The model is pre-trained on 102 languages, has a
dictionary of 110,000 terms and provides a
768-dimensional representation of the text as output.
The second model is the multi-language version of
USE trained on 16 languages, which consists of 6
stacked encoders, 8 parallel self-attention heads and 512
units for the text representation. USE provides a
512-dimensional representation of the text,
computed as the average over the last encoder’s
embeddings of each token. The pre-trained BERT
and USE models have been fine-tuned according
to the available misogyny-related labels. In order
to reduce the dimension of the vector
representation given by the fine-tuned pre-trained models,
and to introduce sparsity to improve the
separability of the data, an Autoencoder is used, as
suggested in
          <xref ref-type="bibr" rid="ref12">(Glorot et al., 2011)</xref>
          . The architecture of
the Autoencoder is reported in Figure 2.
        </p>
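        <p>As an illustration, the following is a minimal sketch of this dimensionality-reduction step: a ReLU Autoencoder, in the spirit of (Glorot et al., 2011), compressing sentence embeddings into a sparse latent code. The hidden (384) and latent (128) sizes are illustrative assumptions, not the exact architecture of Figure 2, and the random input stands in for the fine-tuned embeddings.</p>
        <preformat>
# Minimal sketch (Python/Keras): sparse autoencoder over sentence embeddings.
# Assumptions: 768-d fine-tuned BERT embeddings as input; layer sizes are
# illustrative (the actual architecture is the one reported in Figure 2).
import numpy as np
from tensorflow.keras import layers, models

EMB_DIM, LATENT_DIM = 768, 128  # assumed dimensions

# ReLU activations promote sparse latent codes (Glorot et al., 2011).
encoder = models.Sequential([
    layers.Input(shape=(EMB_DIM,)),
    layers.Dense(384, activation="relu"),
    layers.Dense(LATENT_DIM, activation="relu", name="latent"),
])
decoder = models.Sequential([
    layers.Input(shape=(LATENT_DIM,)),
    layers.Dense(384, activation="relu"),
    layers.Dense(EMB_DIM, activation="linear"),
])
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(1024, EMB_DIM).astype("float32")  # placeholder embeddings
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)  # reconstruct input
Z = encoder.predict(X)  # sparse, lower-dimensional sentence representation
        </preformat>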
        <p>
          Once the latent representation of a sentence is
obtained by means of the Autoencoder, any
machine learning model could be adopted to
recognize misogynous contents. To this purpose,
Support Vector Machines (SVM) have been used,
searching for the optimal hyper-parameter
settings able to ensure the highest
recognition performance. However, searching for the
hyper-parameters that maximize a specific performance
metric is a computationally expensive black-box
optimization process. Due to its sample efficiency,
Bayesian Optimization (BO) has been adopted.
BO works sequentially: each classifier's
hyper-parameter configuration to evaluate is chosen by dealing with
the exploitation-exploration dilemma. To do this,
BO relies on two key components: a probabilistic
surrogate model approximating the performance
metric to optimize, depending on the SVM
classifiers evaluated so far, and an acquisition function
(a utility function suggesting the next
SVM hyper-parameters to evaluate). The
adoption of a probabilistic surrogate model,
specifically a Gaussian Process (GP) in this study, allows
one to estimate the expected value of the performance
metric (i.e., the GP's predictive mean) and the
associated uncertainty (i.e., the GP's predictive standard
deviation) for any given SVM hyper-parameter
configuration. These two estimates are combined
in the acquisition function, which implements
the exploitation-exploration trade-off mechanism,
where exploitation and exploration are associated
with the surrogate's predictive mean and standard
deviation, respectively. More formally, let
D_{1:n} = {(h^(i), υ^(i))}_{i=1,...,n}
be the set of n evaluated configurations, where h^(i) is a d-dimensional vector
whose component h_j^(i) ∈ H_j is the value of the
j-th hyper-parameter of the i-th SVM classifier,
and υ^(i) is the associated value of the target
performance measure. The overall search space H
is usually a subspace of the Cartesian product of
the hyper-parameters' ranges: H ⊆ H_1 × ... ×
H_j × ... × H_d. In this study the search space H is
spanned by d = 2 hyper-parameters whose values
can vary in the following ranges:
• h_1 ∈ H_1 := [10^−1, 10^5], that is the
regularization hyper-parameter C of the SVM
classifier (i.e., soft-margin SVM);
• h_2 ∈ H_2 := [10^−5, 10^1], that is the
hyper-parameter γ of the Radial Basis Function
kernel of the SVM classifier (i.e., k(x, x′) =
e^(−γ ||x − x′||²)).
In this study we consider two different cases (on
stratified 10-fold cross-validation), as sketched in the example below:
• tuning the SVM classifier's hyper-parameters
to maximize the Accuracy, irrespective of
any measure of bias;
• tuning the SVM classifier's hyper-parameters
to optimize an objective function aimed at
maximizing the Accuracy while minimizing a
bias-related metric.
        </p>
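        <p>A minimal sketch of the hyper-parameter search follows, using GP-based Bayesian Optimization as implemented in scikit-optimize (an illustrative choice: the paper does not name a specific BO library). The scalarized objective combining Accuracy and the bias-related metric is also an illustrative assumption, one simple way among others to realize the multi-objective tuning.</p>
        <preformat>
# Minimal sketch (Python): BO over the SVM hyper-parameters C and gamma.
# X, y are placeholders standing in for the latent representations and labels.
from skopt import gp_minimize
from skopt.space import Real
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=128, random_state=0)

search_space = [
    Real(1e-1, 1e5, prior="log-uniform", name="C"),      # h1 in H1
    Real(1e-5, 1e1, prior="log-uniform", name="gamma"),  # h2 in H2
]

def objective(h):
    C, gamma = h
    clf = SVC(C=C, gamma=gamma, kernel="rbf")
    # stratified 10-fold cross-validated Accuracy
    acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    # Unbiased case (assumption): also add the bias-related term of Eq. (1),
    # e.g. return -(acc + auc_final) / 2, with auc_final computed on the
    # synthetic templates. Only the Accuracy case is shown here.
    return -acc  # gp_minimize minimizes, so the Accuracy is negated

res = gp_minimize(objective, search_space, acq_func="EI",
                  n_calls=30, random_state=0)
print("best C=%.3g gamma=%.3g accuracy=%.3f"
      % (res.x[0], res.x[1], -res.fun))
        </preformat>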
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Measuring the Bias</title>
        <p>
          In this paper, we measure
the model bias by referring to the specific
definition of unintended bias presented in
          <xref ref-type="bibr" rid="ref4">(Dixon et al.,
2018)</xref>
          :
        </p>
      <p>A model contains unintended bias if it
performs better for comments
containing some particular identity terms than
for comments containing others.</p>
      <p>
        In order to measure the level of unintended bias
of a given model, identity terms (terms related to
the woman concept) and templates (pre-defined
skeletons used to create synthetic samples) are used
to generate sentences referring to women, which
however can be unreasonably classified as
misogynous with high scores. To this purpose,
identity terms and templates available for the AMI at
EVALITA 2020 challenge
        <xref ref-type="bibr" rid="ref8">(Fersini et al., 2020)</xref>
        have
been used. Identity terms have been listed using
a set of 37 concepts related to “woman”,
considering both their singular and plural forms for the
Italian language. Since the unintended bias of identity
terms cannot be measured on the original dataset,
due to class imbalance and highly different
identity-term contexts, a synthetic dataset is
generated by means of templates. Following
        <xref ref-type="bibr" rid="ref15">(Nozza
et al., 2019)</xref>
        , we defined several templates that are
filled with identity terms and with verbs and
adjectives, divided into negative (e.g., hate,
inferior) or positive (e.g., love, awesome) forms, to
convey misogyny or not. Table 1 reports
examples of templates. The generated synthetic dataset
comprises 3,923 instances, of which 50%
misogynous and 50% non-misogynous, where each
identity term appears in the same contexts.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Examples of templates and their labels.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Template Example</th>
              <th>Label</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>&lt;identity term&gt; devono essere protette (“must be protected”)</td><td>Non-Misogynous</td></tr>
            <tr><td>&lt;identity term&gt; devono essere torturate (“must be tortured”)</td><td>Misogynous</td></tr>
            <tr><td>adorare &lt;identity term&gt; (“to adore”)</td><td>Non-Misogynous</td></tr>
            <tr><td>umiliare &lt;identity term&gt; (“to humiliate”)</td><td>Misogynous</td></tr>
            <tr><td>&lt;identity term&gt; stimabile (“estimable”)</td><td>Non-Misogynous</td></tr>
            <tr><td>&lt;identity term&gt; rivoltante (“revolting”)</td><td>Misogynous</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Identity terms, templates and the synthetic dataset
are available at https://github.com/MIND-Lab/ItalianBias.</p>
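      <p>The generation step itself is straightforward; a toy sketch follows, with two illustrative identity terms and templates drawn from Table 1 (the full lists are those released in the repository above).</p>
      <preformat>
# Toy sketch (Python): filling templates with identity terms to build
# the synthetic dataset. Terms and templates are illustrative examples;
# the complete lists are in the MIND-Lab/ItalianBias repository.
identity_terms = ["donne", "mogli"]  # plural forms, e.g. "women", "wives"
templates = [
    ("{term} devono essere protette", 0),   # 0 = Non-Misogynous
    ("{term} devono essere torturate", 1),  # 1 = Misogynous
    ("adorare {term}", 0),
    ("umiliare {term}", 1),
]

# each identity term appears in exactly the same contexts
synthetic = [(template.format(term=term), label)
             for template, label in templates
             for term in identity_terms]

for text, label in synthetic:
    print(label, text)
      </preformat>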
      <p>
        In order to evaluate the
classification performance in terms of bias, an AUC-related
measure has been used (AUC_final). In what
follows, the higher the AUC_final, the lower
the bias of the model. In particular, a weighted
combination of the AUC estimated on the raw dataset,
AUC_raw (original tweets), and of three per-term
AUC-based scores computed on the synthetic
dataset (AUC_Subgroup, AUC_BPSN, AUC_BNSP)
is adopted
        <xref ref-type="bibr" rid="ref2">(Borkan et al., 2019)</xref>
        . Let s be an
identity term (e.g., “donna” and “moglie”) and
N be the total number of identity terms; the
AUC_final is defined as:
      </p>
      <p>AUC_final = (1/2) AUC_raw + (1/(2N)) [ Σ_s AUC_Subgroup(s) + Σ_s AUC_BPSN(s) + Σ_s AUC_BNSP(s) ]   (1)</p>
      <p>where:
• AUC_Subgroup(s): computes the AUC only on
the data within the subgroup containing a
given identity term s. This represents model
understanding and separability within the
subgroup itself. A low value means that the
model does not properly distinguish
misogynous and non-misogynous comments
containing the given identity term s.
• AUC_BPSN(s): Background Positive
Subgroup Negative (BPSN) estimates the AUC on
the misogynous examples from the
background and the non-misogynous examples
belonging to the subgroup. A low value means
that the model confuses non-misogynous
examples that mention the identity term
with misogynous examples that do not,
likely meaning that the model predicts
higher misogyny scores than it should for
non-misogynous examples mentioning the
identity term.
• AUC_BNSP(s): Background Negative
Subgroup Positive (BNSP) calculates the AUC on
the non-misogynous examples from the
background and the misogynous examples from
the subgroup. A low value means that the
model confuses misogynous examples that
mention the identity term with non-misogynous
examples that do not, likely meaning that the
model predicts lower misogyny scores than
it should for misogynous examples
mentioning the identity term.</p>
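        <p>A minimal sketch of these per-term scores follows, assuming a table with one row per synthetic example holding the mentioned identity term, the gold label (1 = misogynous) and the model's score; the column names are illustrative assumptions.</p>
        <preformat>
# Minimal sketch (Python): per-term bias metrics (Borkan et al., 2019).
# Assumed columns of df: 'term' (identity term), 'label' (1 = misogynous),
# 'score' (model's misogyny score).
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auc(df, term):
    sub = df[df.term == term]  # only examples mentioning the term
    return roc_auc_score(sub.label, sub.score)

def bpsn_auc(df, term):
    sub, bg = df[df.term == term], df[df.term != term]
    # background positives + subgroup negatives
    part = pd.concat([bg[bg.label == 1], sub[sub.label == 0]])
    return roc_auc_score(part.label, part.score)

def bnsp_auc(df, term):
    sub, bg = df[df.term == term], df[df.term != term]
    # background negatives + subgroup positives
    part = pd.concat([bg[bg.label == 0], sub[sub.label == 1]])
    return roc_auc_score(part.label, part.score)

def auc_final(auc_raw, df, terms):
    # weighted combination as in Eq. (1)
    s = sum(subgroup_auc(df, t) + bpsn_auc(df, t) + bnsp_auc(df, t)
            for t in terms)
    return 0.5 * auc_raw + s / (2 * len(terms))
        </preformat>
      </sec>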
    </sec>
    <sec id="sec-4">
      <title>4 Experimental Investigation</title>
      <p>
        In this section we report the experimental
investigation performed on the Italian version of
the Automatic Misogyny Identification (AMI) dataset
        <xref ref-type="bibr" rid="ref8">(Fersini et al., 2020)</xref>
        , comparing the results
obtained with the proposed framework with those
obtained by the baseline model (i.e., an SVM trained
on a TF-IDF representation). The AMI dataset is
composed of 5,000 tweets, labelled according to
“misogyny” (i.e., indicating whether a tweet is
misogynous or not), “misogyny category” (i.e.,
Stereotype&amp;Objectification, Dominance, Derailing,
Sexual Harassment&amp;Threats of Violence, Discredit)
and “target” (i.e., individual or generic).
      </p>
      <p>Regarding the first research question (RQ1), we
tuned the SVM classifier's hyper-parameters to
maximize only the performance measure related
to each label (i.e., Accuracy for the misogyny label,
F-Measure for the category and target labels). First of
all, we report in Table 2 the results comparing the
different models. It can be easily noted that,
although BERT and USE allow the SVM to achieve
better performance than TF-IDF, there is no substantial
difference between them, as they achieve similar results. Moreover,
while the recognition performance on the
misogyny label is satisfactory, the capability of
discriminating the misogyny category and the target
is still far from acceptable.</p>
      <p>In order to understand whether the low performance
is due to the embedding, we investigated the
class overlap induced by USE and BERT.
We report in Figure 3 a 2D representation of the
embeddings obtained by USE (similar results have
been obtained for BERT). We can immediately
see that, while the embeddings tuned for
recognizing misogyny are quite separable into
misogynous and non-misogynous tweets, for
the category and target embeddings there is an
overlap among the classes. This makes the
learned representations not yet suitable for being used
to recognize the specific form of misogyny and the
subject of misogynous comments.</p>
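      <p>A minimal sketch of this kind of inspection follows; t-SNE is used as an illustrative choice of projection, since the method behind Figure 3 is not specified, and the data are placeholders for the actual embeddings and labels.</p>
      <preformat>
# Minimal sketch (Python): 2D inspection of class overlap in the
# embedding space. X stands in for the sentence representations,
# y for the labels of interest (misogyny, category or target).
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE

X, y = make_classification(n_samples=500, n_features=128, random_state=0)

X2d = TSNE(n_components=2, random_state=0).fit_transform(X)
plt.scatter(X2d[:, 0], X2d[:, 1], c=y, s=5, cmap="coolwarm")
plt.title("2D projection of the sentence embeddings")
plt.show()
      </preformat>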
      <p>Regarding the second research question
(RQ2), we determined the optimal SVM
hyper-parameters to maximize both Accuracy and
AUC_final (i.e., the bias-related metric). In order
to guarantee that different sets of tokens are used for the
hyper-parameter search and the inference phase,
the synthetic samples have been split into training
and testing sets. We compare in Table 3 the AUC_final
values estimated for the biased and unbiased SVMs.
We can easily note that the Unbiased SVM
keeps the Accuracy constant while improving
the generalization capabilities of the
embeddings, as shown by the AUC_final values, with
slightly better results for USE. This means that
the SVM hyper-parameter optimization, with
respect to both performance measures, leads to
promising unbiased models, ensuring
good recognition capabilities both on common
expressions (typically used on Twitter) and on
uncommon comments (synthetic data). The
obtained results also suggest that an SVM trained
on the USE embedding is more amenable to adapting
its hyper-parameters to reduce bias during
training and inference.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusions and Future Work</title>
      <p>In this paper we have investigated the
capabilities and weaknesses of pre-trained language
models, as well as the problem of unintended bias,
when addressing automatic misogyny
identification for the Italian language. The proposed
investigation has highlighted that, while pre-trained
embeddings are able to distinguish misogynous
and non-misogynous comments, they still have
poor discrimination capabilities for the type
of misogyny and its target. Regarding the
unintended bias problem, it has been shown that a
hyper-parameter search guided by Bayesian
Optimization can lead to debiased models with good
generalization capabilities. As future
work, we will investigate explainable AI
techniques aimed at generating a feature score that is
directly proportional to the feature's effect on
inducing bias in the prediction model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Maria</given-names>
            <surname>Anzovino</surname>
          </string-name>
          , Elisabetta Fersini, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Automatic Identification and Classification of Misogynistic Language on Twitter</article-title>
          .
          <source>In International Conference on Applications of Natural Language to Information Systems</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Borkan</surname>
          </string-name>
          , Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and
          <string-name>
            <given-names>Lucy</given-names>
            <surname>Vasserman</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification</article-title>
          .
          <source>In Companion Proceedings of The 2019 World Wide Web Conference</source>
          , pages
          <fpage>491</fpage>
          -
          <lpage>500</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Adriano dos S. R.</given-names>
            <surname>da Silva</surname>
          </string-name>
          and Norton T. Roman.
          <year>2020</year>
          .
          <article-title>No Place For Hate Speech @ AMI: Convolutional Neural Network and Word Embedding for the Identification of Misogyny in Italian</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Lucas</given-names>
            <surname>Dixon</surname>
          </string-name>
          , John Li, Jeffrey Sorensen, Nithum Thain, and
          <string-name>
            <given-names>Lucy</given-names>
            <surname>Vasserman</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Measuring and Mitigating Unintended Bias in Text Classification</article-title>
          .
          <source>In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society</source>
          , pages
          <fpage>67</fpage>
          -
          <lpage>73</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Samer</given-names>
            <surname>El Abassi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sergiu</given-names>
            <surname>Nisioi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>MDD@AMI: Vanilla Classifiers for Misogyny Identification</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Fabrizi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>fabsam @ AMI: a Convolutional Neural Network approach</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Tracie</given-names>
            <surname>Farrell</surname>
          </string-name>
          , Oscar Araque, Miriam Fernandez, and
          <string-name>
            <given-names>Harith</given-names>
            <surname>Alani</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>On the use of Jargon and Word Embeddings to Explore Subculture within the Reddit's Manosphere</article-title>
          .
          <source>In 12th ACM Conference on Web Science</source>
          , pages
          <fpage>221</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Debora Nozza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>AMI@ EVALITA2020: Automatic Misogyny Identification</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA</source>
          <year>2020</year>
          ). CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          José Antonio García-Díaz, Mar Cánovas-García, Ricardo Colomo-Palacios, and Rafael Valencia-García.
          <year>2021</year>
          .
          <article-title>Detecting Misogyny in Spanish Tweets. An Approach based on Linguistics Features and Word embeddings</article-title>
          .
          <source>Future Generation Computer Systems</source>
          ,
          <volume>114</volume>
          :
          <fpage>506</fpage>
          -
          <lpage>518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Oguzhan</given-names>
            <surname>Gencoglu</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Cyberbullying detection with fairness constraints</article-title>
          .
          <source>IEEE Internet Computing</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <fpage>20</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Debbie</given-names>
            <surname>Ging</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eugenia</given-names>
            <surname>Siapera</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Special issue on online misogyny</article-title>
          .
          <source>Feminist Media Studies</source>
          ,
          <volume>18</volume>
          (
          <issue>4</issue>
          ):
          <fpage>515</fpage>
          -
          <lpage>524</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Glorot</surname>
          </string-name>
          , Antoine Bordes, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Deep Sparse Rectifier Neural Networks</article-title>
          .
          <source>In Proceedings of the fourteenth international conference on artificial intelligence and statistics</source>
          , pages
          <fpage>315</fpage>
          -
          <lpage>323</lpage>
          . JMLR Workshop and Conference Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Anna</given-names>
            <surname>Koufakou</surname>
          </string-name>
          , Endang Wahyu Pamungkas, Valerio Basile, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>HurtBERT: Incorporating Lexical Features with BERT for the Detection of Abusive Language</article-title>
          .
          <source>In Proceedings of the fourth workshop on online abuse and harms</source>
          , pages
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Alyssa</given-names>
            <surname>Lees</surname>
          </string-name>
          , Jeffrey Sorensen, and
          <string-name>
            <given-names>Ian</given-names>
            <surname>Kivlichan</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Jigsaw@ AMI and HaSpeeDe2: Fine-Tuning a Pre-Trained Comment-Domain BERT Model</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Debora</given-names>
            <surname>Nozza</surname>
          </string-name>
          , Claudia Volpetti, and
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Unintended Bias in Misogyny Detection</article-title>
          .
          <source>In IEEE/WIC/ACM International Conference on Web Intelligence</source>
          , pages
          <fpage>149</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Xiaozhi</given-names>
            <surname>Ou</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hongling</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>YNU OXZ @ HaSpeeDe 2 and AMI : XLM-RoBERTa with Ordered Neurons LSTM for Classification Task at EVALITA 2020</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ), Bologna, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Endang Wahyu</given-names>
            <surname>Pamungkas</surname>
          </string-name>
          , Valerio Basile, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Misogyny detection in Twitter: a multilingual and cross-domain study</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>57</volume>
          (
          <issue>6</issue>
          ):
          <fpage>102360</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention is all you need</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Nadezhda</given-names>
            <surname>Zueva</surname>
          </string-name>
          , Madina Kabirova, and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Kalaidin</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Reducing Unintended Identity Bias in Russian Hate Speech Detection</article-title>
          .
          <source>In Proceedings of the Fourth Workshop on Online Abuse and Harms</source>
          , pages
          <fpage>65</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>