<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference and Labs of the Evaluation Forum</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multilingual Sexism Identification Using Contrastive Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jason Angel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Segun Taofeek Aroyehun</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Gelbukh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC)</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Konstanz</institution>
          ,
          <addr-line>Konstanz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>We present our systems and findings for Exist2023 subtask 1, a shared task on multilingual sexism identification at CLEF 2023 [1]. Our system aims to accurately identify and evaluate the degree of sexism in social media content in a multilingual setting, considering its subjective nature. We successfully integrated two variations of contrastive learning as an intermediate step in a conventional language model fine-tuning pipeline. Our approach not only outperformed the fine-tuning-only method but also achieved competitive results compared to the top scores in the competition. This substantiates the simplicity and benefits of our approach to the task of sexism identification.</p>
      </abstract>
      <kwd-group>
        <kwd>Sexism identification</kwd>
        <kwd>contrastive learning</kwd>
        <kwd>learning with disagreement</kwd>
        <kwd>multilingual natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Sexism is a form of discrimination rooted in biased beliefs, stereotypes, and the oppression
of individuals, often targeting women due to their sex/gender. In today’s era, where social
networks wield significant influence, it is vital to acknowledge and combat sexism. This harmful
mindset perpetuates inequality, limits opportunities, and reinforces oppressive power dynamics,
hindering progress toward a fairer society.</p>
      <p>Nevertheless, the automatic and reliable identification of sexist statements poses significant
challenges due to their subjective nature. This research proposes an approach to
identifying sexism that takes into account varying opinions on whether a message can be considered
sexist or not. We conducted experiments using a multilingual language model on Spanish and
English messages and explored two variations of incorporating contrastive learning in a typical
NLP pipeline in order to cluster the “degree of sexism” present in a message.</p>
      <p>The document continues as follows: Section 2 outlines the distinctive characteristics of the
Exist2023 dataset, enabling the analysis of how perceptions of sexism can be influenced by
gender and age groups across two distinct languages. In Section 3, we offer a detailed description
of our experimental approach. Section 4 summarizes our results and offers an interpretation of
our findings. Section 5 provides a concise overview of related research on sexism identification.
Lastly, in Section 6 we conclude by highlighting our contributions and outlining avenues for
future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Sexism Identification Dataset</title>
      <p>The dataset provided by the Exist2023 initiative consists of nearly 10K tweets (around 5.3K for
Spanish and 4.7K for English) carefully selected (to mitigate terminology, temporal and author
biases) from more than 8M tweets posted between 1 September 2021 and 30 September 2022. The
dataset was split into train, dev, and test sets distributed roughly as 70%, 10%, and 20%,
respectively, for both languages.</p>
      <p>The labels for the tweets in Subtask 1 were categorized as "YES" or "NO" to indicate whether
they conveyed a sexist meaning. What sets this dataset apart is its thoughtful consideration of
the subjectivity inherent in identifying sexism. To accommodate this, the dataset follows the
learning with disagreements paradigm, where multiple annotators (six in this case) offer diverse
perspectives. Furthermore, to address potential "label bias" resulting from socio-demographic
differences among annotators, each annotator represents a unique socio-demographic profile,
including gender (MALE, FEMALE) and age group (18-22, 23-45, and 46+).</p>
      <p>Although there are no gold annotations, the majority vote over the label annotations indicates
the proportion of sexist content in the dataset, which is further used for evaluation
purposes as a "hard label". Table 1 combines samples from the train and dev splits to showcase
the distribution of labels per language in terms of majority votes as sexist, not-sexist, and
undetermined, i.e., when three annotators consider the tweet sexist and the other three consider
it non-sexist.</p>
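The majority-vote rule described above can be sketched as follows; this is an illustrative Python snippet, not code from the Exist2023 pipeline:

```python
from collections import Counter

def majority_vote(annotations):
    """Hard label from the six annotator judgements ("YES"/"NO").

    A 3-3 tie yields "UNDETERMINED", mirroring the dataset's
    undetermined category.
    """
    counts = Counter(annotations)
    yes, no = counts.get("YES", 0), counts.get("NO", 0)
    if yes > no:
        return "YES"
    if no > yes:
        return "NO"
    return "UNDETERMINED"
```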
    </sec>
    <sec id="sec-3">
      <title>3. System description</title>
      <p>
        We fine-tune Bernice [
        <xref ref-type="bibr" rid="ref2">2</xref>
], a multilingual RoBERTa language model that specializes in processing
language from the Twitter domain. This allowed us to handle two important aspects of the
Exist2023 dataset: the presence of English and Spanish samples, and the particularities of the
informal language used in social networks such as Twitter, including the processing of emojis
and hashtags.
      </p>
      <p>Our experiments differ from the typical language model fine-tuning pipeline in their addition
of a contrastive learning step, which learns an embedding space in which similar sample pairs stay
close to each other while dissimilar ones are far apart. In our contrastive learning approach, we
used a regression setting where the label for each message was the number of annotators that
answered "Yes" divided by six (the total number of annotators). In this manner, our method takes
into consideration the subjectivity and the diversity of views on the task in the paradigm of
learning from data with disagreements.</p>
      <p>We also leverage the different annotators' labels when making the final predictions: we
cast the sexism identification task as a regression task that predicts the fraction of annotators
that answered "YES". To derive the hard label, we use a rule where the model prediction has to
be greater than 0.5 to predict "YES", and "NO" otherwise.</p>
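As a sketch of the two rules above (hypothetical helper names, not the actual implementation), the regression target and the thresholding rule look like this:

```python
def soft_label(annotations):
    """Regression target: the fraction of the six annotators answering "YES"."""
    return sum(a == "YES" for a in annotations) / len(annotations)

def hard_label(prediction, threshold=0.5):
    """Map the regression output to a binary label: "YES" only above 0.5."""
    return "YES" if prediction > threshold else "NO"
```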
      <p>The following summarizes our submitted systems:
1. FT: fine-tuning of the language model for a maximum of 30 epochs with early stopping.
Listed on the official leaderboard as "CIC-SDS.KN_1".
2. Freeze_CL: we added contrastive learning before fine-tuning the model, freezing
the encoder to train only the classifier head. Listed on the official leaderboard as
"CIC-SDS.KN_2".
3. Unfreeze_CL: the same as the second run, except that the fine-tuning step updates all of
the model parameters. Listed on the official leaderboard as "CIC-SDS.KN_3".</p>
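The difference between Freeze_CL and Unfreeze_CL amounts to which parameters the fine-tuning step updates. A schematic sketch (the parameter names are illustrative, not Bernice's actual module names):

```python
def trainable_parameters(param_names, freeze_encoder=True):
    """Select the parameters updated during fine-tuning.

    Freeze_CL: only the classifier head is trained; Unfreeze_CL: all
    parameters are updated. Names prefixed "classifier." stand in for
    the head; everything else stands in for the encoder.
    """
    if not freeze_encoder:
        return list(param_names)  # Unfreeze_CL: update everything
    return [n for n in param_names if n.startswith("classifier.")]
```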
      <p>
        We train with the contrastive learning objective for 10 epochs with a learning rate of 5e-5
and a batch size of 32, mainly following the settings reported in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In the subsequent
fine-tuning step, we train for a maximum of 20 epochs (in order to have a setting comparable
with the FT setting of 30 epochs) with early stopping, a learning rate of 1e-5, and a batch size of
128. We use the AdamW optimizer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We use the transformers library [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to train our models
on an NVIDIA V100 GPU with 32GB of memory. We save the model with the lowest root mean
square error (RMSE) on the validation set during training. We then use the saved model to
make predictions on the unseen test set.
      </p>
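The checkpoint-selection logic (keep the model with the lowest validation RMSE, stopping early when it stops improving) can be sketched as below; the patience value is an illustrative assumption, and real training would compute RMSE from model predictions rather than take per-epoch scores as input:

```python
import math

def rmse(predictions, targets):
    """Root mean square error between predicted and target fractions."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    )

def select_best_epoch(val_rmse_per_epoch, patience=3):
    """Return (best_epoch, best_rmse), stopping once `patience` epochs
    pass without improvement on the validation set."""
    best_epoch, best = 0, float("inf")
    since_improvement = 0
    for epoch, score in enumerate(val_rmse_per_epoch):
        if score < best:
            best_epoch, best = epoch, score
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                break
    return best_epoch, best
```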
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>
        Our systems ranked among the top-3 teams according to the normalized ICM metric (Information
Contrast Measure). The ICM metric [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a similarity function that generalizes Pointwise Mutual
Information (PMI) to compute the similarity between a model’s output and the ground truth
categories. To calculate the normalized ICM, the "Minority class" baseline (which classifies all
instances as the minority class) is mapped to the lowest score (i.e., 0) and the "Gold standard" is
mapped to the highest score (i.e., 1).
      </p>
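Concretely, this normalization can be read as min-max scaling between the two anchor scores (an illustrative sketch, not the official evaluation code):

```python
def normalized_icm(raw_icm, minority_baseline_icm, gold_icm):
    """Min-max normalize a raw ICM score: the minority-class baseline maps
    to 0 and the gold standard to 1; scores below the baseline go negative."""
    return (raw_icm - minority_baseline_icm) / (gold_icm - minority_baseline_icm)
```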
      <p>Additionally, the sexism identification models provided two types of outputs: "Hard" labels
that classify samples as sexist or not-sexist, and "Soft" labels that specify a value between 0
and 1 measuring "the degree of sexism" involved in the sample. These labels were used
to evaluate the models under three schemes, described as follows:
• Hard-hard evaluation: the ICM similarity between the hard system output and the
hard ground truth
• Soft-soft evaluation: the ICM similarity between the soft system output and the soft
ground truth
• Hard-soft evaluation: the ICM similarity between the hard system output and the soft
ground truth</p>
      <p>A summary of our experiments is presented in Table 2, where we use the Hard-Hard, Soft-Soft,
and Hard-Soft evaluation schemes to compare our models with the baseline "majority_class",
which classifies all instances as the majority class, and with the best models submitted to
Exist2023 task 1 (publicly available in the original leaderboard). We refer to the latter as the
"best score", i.e., the score obtained by the best-performing model in each specific evaluation.
We also provide results for Spanish only, English only, and both Spanish and English.</p>
      <sec id="sec-4-1">
        <title>4.1. Analysis of results</title>
        <p>Our results clearly show that our systems with contrastive learning (Freeze_CL and Unfreeze_CL)
perform better than the fine-tuning-only model across all evaluation schemes and language
slices for Spanish and English, demonstrating that the addition of contrastive learning as
an intermediate step benefits the model's ability to correctly identify sexist content. Specifically,
between our two contrastive learning approaches, the "Freeze" model is slightly better than the
"Unfreeze" model, showing that the knowledge gained in the contrastive learning step is not
forgotten to a large extent when the previously learned parameters are updated. With respect to the
baseline "majority class", its weakness is evident and its predictions are non-informative, but it
suggests how complex the task is without proper modeling of the phenomenon. We also remark on the
outstanding performance obtained by the best models in the competition, which we refer to in
Table 2 as the "best score" for each individual evaluation; they obtained far superior scores
compared with our proposed models in some evaluation scenarios, such as the Spanish hard-hard
evaluation. We hypothesize that this effect may be attributed to the multilingual nature of our
model, which offers the advantage of utilizing a single model for multiple languages. However,
as we observed, a multilingual model may also show performance variation across languages.
We leave as future work the investigation of factors that are likely to explain the observed
variation.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Related work</title>
      <p>
        In recent years, NLP tasks promoting tolerance and respect, including Hate Speech detection [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
Stereotype identification [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and gender bias mitigation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], have gained significant popularity
and strong support within the NLP community. Among these tasks, the identification of sexism
has emerged as a distinct field of investigation, evolving from a subsection of hate
speech detection [10] primarily conducted in English to a standalone task studied in multiple
languages such as French [11], Chinese [12], and even lower-resourced languages like Romanian
[13]. However, apart from the previous Exist initiatives [14, 15], which primarily concentrated
on English and Spanish datasets, there has been limited exploration of modeling and analyzing
sexism phenomena from a multilingual perspective.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Sexism continues to be a significant societal concern, gaining increased attention as social media
platforms play an ever-growing role in our lives. The need to address and mitigate sexism
on these platforms has become paramount. In light of this, our study focused on developing
effective multilingual sexism identification systems using contrastive learning. Our findings
demonstrate the superiority of our proposed systems, which incorporate contrastive learning
with and without updating learned parameters, over the traditional fine-tuning approach.</p>
      <p>The results obtained from our experiments exceeded the performance of solely fine-tuned
models and proved to be highly competitive compared to the best scores achieved in the
competition. This outcome underscores the value of integrating contrastive learning
techniques into traditional pipelines to further advance the field of content moderation.
Moving forward, further exploration and refinement of contrastive learning approaches hold
the potential to enhance the accuracy and efficiency of sexism detection systems, leading to
more inclusive and equitable online spaces.</p>
      <sec id="sec-6-1">
        <title>6.1. Future work</title>
        <p>Further research in this field holds exciting prospects. Firstly, we intend to extend the evaluation
of our contrastive learning approach to additional sexism datasets and explore its applicability in
related tasks such as hate speech detection. Secondly, a more comprehensive analysis is needed
to understand how language models handle the inherent subjectivity of the task, considering
varying perspectives from annotators with diverse socio-demographic profiles. Lastly, while
our participation in the binary classification task of Exist2023 was fruitful, we are eager to
investigate the potential application of our approach in a multiclass setting. These avenues of
exploration promise to deepen our understanding and improve the effectiveness of multilingual
sexism identification systems.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The work was done with partial support from the Mexican Government through grant
A1-S-47854 of CONACYT, Mexico, and grants 20232138, 20232080, and 20231567 of the Secretaría de
Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank
CONACYT for the computing resources provided to them through the Plataforma de Aprendizaje
Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE,
Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD
Award.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Morante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , Overview of exist 2023:
          <article-title>sexism identification in social networks</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>593</fpage>
          -
          <lpage>599</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>DeLucia</surname>
          </string-name>
          , S. Wu,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aguirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <article-title>Bernice: A multilingual pre-trained encoder for Twitter</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>6191</fpage>
          -
          <lpage>6205</lpage>
          . URL: https://aclanthology.org/2022.emnlp-main.415
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Sedghamiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raval</surname>
          </string-name>
          , E. Santus,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alhanai</surname>
          </string-name>
          , M. Ghassemi, SupCL-Seq:
          <article-title>Supervised Contrastive Learning for downstream optimized sequence representations</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2021</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>3398</fpage>
          -
          <lpage>3403</lpage>
          . URL: https://aclanthology.org/2021.findings-emnlp.289. doi:10.18653/v1/2021.findings-emnlp.289.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Decoupled weight decay regularization</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2019</year>
          . URL: https://openreview.net/forum?id=Bkg6RiCqY7.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          , P. von Platen, C. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Le</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rush</surname>
          </string-name>
          , Transformers:
          <article-title>State-of-the-art natural language processing</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delgado</surname>
          </string-name>
          ,
          <article-title>Evaluating extreme hierarchical multi-label classification</article-title>
          ,
          <source>in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>5809</fpage>
          -
          <lpage>5819</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Poletto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <article-title>Resources and benchmark corpora for hate speech detection: a systematic review</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          <volume>55</volume>
          (
          <year>2021</year>
          )
          <fpage>477</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Frenda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paciello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>D'Errico</surname>
          </string-name>
          ,
          <article-title>Detecting racial stereotypes: An italian social media corpus where psychology meets nlp</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>60</volume>
          (
          <year>2023</year>
          )
          <fpage>103118</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gaut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          , M. ElSherief,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Belding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Mitigating gender bias in natural language processing: Literature review</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1630</fpage>
          -
          <lpage>1640</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. R. Pardo, P. Rosso, M. Sanguinetti, Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter, in: Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 54-63.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] P. Chiril, V. Moriceau, F. Benamara, A. Mari, G. Origgi, M. Coulomb-Gully, He said "who's gonna take care of your children when you are at acl?": Reported sexist acts are not sexist, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 4055-4066.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Jiang, X. Yang, Y. Liu, A. Zubiaga, Swsr: A chinese dataset and lexicon for online sexism detection, Online Social Networks and Media 27 (2022) 100182.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] A. Moldovan, K. Csürös, A.-M. Bucur, L. Bercuci, Users hate blondes: Detecting sexism in user comments on online romanian news, in: Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), 2022, pp. 230-230.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, L. Plaza, J. Gonzalo, P. Rosso, M. Comet, T. Donoso, Overview of exist 2021: sexism identification in social networks, Procesamiento del Lenguaje Natural 67 (2021) 195-207.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, L. Plaza, A. Mendieta-Aragón, G. Marco-Remón, M. Makeienko, M. Plaza, J. Gonzalo, D. Spina, P. Rosso, Overview of exist 2022: sexism identification in social networks, Procesamiento del Lenguaje Natural 69 (2022) 229-240.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>